On Wed, Oct 14, 2015 at 3:28 AM, Taylan Ulrich Bayırlı/Kammer <xxxxxx@gmail.com> wrote:
Kevin Wortman <xxxxxx@gmail.com> writes:

>     Then please propose an API based on one of those which solves our
>     problem, without having any of the mentioned issues. I haven't
>     been able to come up with a satisfactory one.
>
>
> Well, IIRC Alex Shinn has already proposed a procedure-based API and
> reiterated it several times.

> If you mean the proposal of changing the signature of hash functions to
> hash(object, bound, seed), I explained some issues with that in previous
> mails on the list.  (Please correct me if any of my explanations was
> wrong.)

The explanation as I understand it is simply that existing
implementations don't take seed arguments.  This seems
a non-issue to me since, as I explained, the lazy implementor
can simply provide a wrapper which ignores the seed.  Once
you add the seed argument internally, however, you get all
the benefits of security and multi-hashing which you didn't
have before.
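As an illustration, a minimal sketch of such a wrapper (the
names `old-string-hash` and `string-hash/seeded` are mine, not
from any implementation, and the hash body is a placeholder):

```scheme
;; Hypothetical existing hash procedure taking only (obj bound).
(define (old-string-hash str bound)
  ;; Placeholder body; a real implementation would mix the
  ;; characters of STR, not just its length.
  (modulo (string-length str) bound))

;; Adapter to the proposed (hash obj bound seed) signature:
;; the lazy implementor simply ignores the seed.
(define (string-hash/seeded obj bound seed)
  (old-string-hash obj bound))
```

Implementations that later want the security benefits can start
consulting the seed without changing the external signature again.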

> And the only complaint lodged against using a parameter object seems
> to be that referencing it costs one procedure call, which seems
> perfectly acceptable for an infrequent operation in a
> functional-leaning language.

> I would currently summarize my main complaints as:
>
> - Things will crash and burn when a hash table is operated on from two
>   dynamic extents with different values for the hash salt.

John Cowan previously suggested a parameter, and I
consider that idea broken for this very reason.  The seed
should be hidden from the user.  However, your proposal
of making every custom hash function author maintain
their own seed handling logic is similarly a recipe for bugs.
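To make the hazard concrete, a hypothetical sketch (the names
`hash-salt` and `salted-hash` are mine): when the salt lives in a
parameter object, the same key hashes differently in different
dynamic extents, so a table populated in one extent silently
fails lookups in another.

```scheme
;; The salt as a parameter object, per the suggestion criticized.
(define hash-salt (make-parameter 0))

;; A custom hash function that consults the parameter.
(define (salted-hash str bound)
  (modulo (+ (string-length str) (hash-salt)) bound))

;; A key inserted while (hash-salt) is 0 lands in one bucket; a
;; lookup under a different salt computes a different bucket and
;; misses:
;;
;;   (salted-hash "key" 16)            ; bucket under salt 0
;;   (parameterize ((hash-salt 7))
;;     (salted-hash "key" 16))         ; a different bucket
```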

> - Since the only real use-case is determinism for unit tests

No, unit tests are a minor use case.  Any tests involving
hash tables should be written to not depend on the order
of elements in the table.  The only exception is testing the
hash functions themselves, for which you can't do much
better than:

  (test <expected> (my-hash obj seed bound))

for several values.  Without an explicit seed you need
some way to specify the global value.  An environment
variable is a poor way to handle this, since it places
additional restrictions on the test-running framework
(e.g. there is currently no way in Snow to specify
environment variables to be set when running tests).

But the primary motivation for determinism is in scientific
computing, where you want resumable and/or verifiable
results.  One important case is MapReduce: when running
a long computation on thousands of machines at once,
hardware failure is the norm rather than the exception.
This can be addressed by running every computation
twice, and rerunning when the results differ.
As in tests, you can take effort to sort outputs and
explicitly remove dependence on hash order, but you
may be using third-party libraries which you can't change,
or may not be able to afford the extra computation.
This is where deterministic hashing can be useful.

And all of this is a minor point compared to the fact
that the current SRFI-126 proposal is ugly, makes
more work for everybody, and makes custom hashes
second class citizens.  I have no interest in this API.

-- 
Alex