Email list hosting service & mailing list manager

seeds Alex Shinn (08 Oct 2015 05:03 UTC)
Re: seeds taylanbayirli@xxxxxx (08 Oct 2015 07:55 UTC)
Re: seeds Arthur A. Gleckler (08 Oct 2015 17:46 UTC)
Re: seeds taylanbayirli@xxxxxx (08 Oct 2015 19:38 UTC)
Re: seeds Arthur A. Gleckler (08 Oct 2015 21:47 UTC)
Re: seeds taylanbayirli@xxxxxx (09 Oct 2015 08:09 UTC)
Re: seeds Kevin Wortman (09 Oct 2015 17:18 UTC)
Re: seeds taylanbayirli@xxxxxx (09 Oct 2015 18:33 UTC)
Re: seeds Kevin Wortman (13 Oct 2015 17:36 UTC)
Re: seeds taylanbayirli@xxxxxx (13 Oct 2015 18:28 UTC)
Re: seeds Alex Shinn (14 Oct 2015 01:59 UTC)
Re: seeds taylanbayirli@xxxxxx (14 Oct 2015 09:17 UTC)
Re: seeds taylanbayirli@xxxxxx (14 Oct 2015 10:06 UTC)
Re: seeds Alex Shinn (16 Oct 2015 01:23 UTC)
Re: seeds taylanbayirli@xxxxxx (16 Oct 2015 13:34 UTC)
Re: seeds Alex Shinn (16 Oct 2015 23:48 UTC)
Re: seeds taylanbayirli@xxxxxx (17 Oct 2015 12:08 UTC)
Re: seeds Alex Shinn (17 Oct 2015 13:12 UTC)
Re: seeds taylanbayirli@xxxxxx (17 Oct 2015 14:09 UTC)
What SRFIs are for John Cowan (17 Oct 2015 14:41 UTC)
Re: What SRFIs are for taylanbayirli@xxxxxx (17 Oct 2015 15:56 UTC)
Re: What SRFIs are for John Cowan (17 Oct 2015 16:55 UTC)
Re: What SRFIs are for taylanbayirli@xxxxxx (17 Oct 2015 18:08 UTC)
Re: What SRFIs are for John Cowan (17 Oct 2015 18:51 UTC)
Re: seeds John Cowan (15 Oct 2015 17:49 UTC)
Re: seeds Alex Shinn (09 Oct 2015 02:54 UTC)
Re: seeds taylanbayirli@xxxxxx (09 Oct 2015 07:59 UTC)
Re: seeds John Cowan (15 Oct 2015 17:51 UTC)
Re: seeds taylanbayirli@xxxxxx (15 Oct 2015 23:08 UTC)
Re: seeds John Cowan (16 Oct 2015 13:09 UTC)
Re: seeds taylanbayirli@xxxxxx (16 Oct 2015 14:01 UTC)

Re: seeds taylanbayirli@xxxxxx 14 Oct 2015 09:09 UTC

Alex Shinn <xxxxxx@gmail.com> writes:

> On Wed, Oct 14, 2015 at 3:28 AM, Taylan Ulrich Bayırlı/Kammer
> <xxxxxx@gmail.com> wrote:
>
>     If you mean the proposal of changing the signature of hash
>     functions to hash(object, bound, seed), I explained some issues
>     with that in previous mails on the list. (Please correct me if any
>     of my explanations was wrong.)
>
> The explanation as I understand is simply that existing
> implementations don't take seed arguments. This seems
> a non-issue to me since as I explained, the lazy implementor
> can simply provide a wrapper which ignores the seed. Once
> you add the seed argument internally, however, you get all
> the benefits of security and multi-hashing which you didn't
> have before.

It's not that existing implementations of hash functions don't take seed
arguments.  It's that existing implementations of hash tables don't pass
seed arguments to hash functions.

There is no lazy "surface-only" fix to that, AFAICT; it will require
changes to the core of an implementation's hash table implementation,
e.g. scm_hash_fn_get_handle in hashtab.c of libguile.

Compare that to my SRFI-126 implementation for Guile which is purely a
Scheme module.

>     > And the only complaint lodged against using a parameter object
>     > seems to be that referencing it costs one procedure call, which
>     > seems perfectly acceptable for an infrequent operation in a
>     > functional-leaning language.
>
>     I would currently summarize my main complaints as:
>
>     - Things will crash and burn when a hash table is operated on from
>     two dynamic extents with different values for the hash salt.
>
> John Cowan previously suggested a parameter, and I
> consider that idea broken for this very reason. The seed
> should be hidden from the user. However, your proposal
> of making every custom hash function author maintain
> their own seed handling logic is similarly a recipe for bugs.

If I'm not mistaken, most custom hash functions ultimately delegate the
true hashing calculation to one of the built-in hash functions.

For instance my application might have a record type to represent
certain objects in the real world (or at least outside of Scheme
semantics); one such real object could be represented in my program by
any number of instances of that record type; an instance only needs to
hold a unique identity value in an "id" field to be considered a Scheme
representation of the real object identified by that identity value.
(For instance, a <student> record type where the id field holds the
unique student ID.)

In that case I want equality on my records to be defined in terms of
their ID value and not their eq?/eqv? equality.  So I will define
(same-student? a b) as (equal? (student-id a) (student-id b)), and
(student-hash student bound) as (equal-hash (student-id student) bound).

In my experience this is the most common use-case for custom equality
and hashing.  I've never encountered any other use-case.  Of course my
experience is pretty limited.  I would be interested in learning of
other use-cases.  It's just that so far I'm assuming that any use-case
which involves using *truly* custom hashing is going to be fairly
obscure and involved, such that expecting the programmer to take care of
salting without any API help is acceptable.

In fact, I don't really see much of a difference in first place, between
passing a salt to a custom hash function, and expecting it to derive one
from SRFI_126_HASH_SEED.  The latter is merely a bit of annoyance to get
an integer from a string.  A user so security-unconscious as to skip
that is likely also going to plainly ignore the salt value passed to the
custom hash function.

Wouldn't you agree that the small theoretical benefit to be gained in
those very rare use-cases is no justification to expect every Scheme
implementation to make a change to the core of their implementation?

>     - Since the only real use-case is determinism for unit tests
>
> No, unit tests are a minor use case. Any tests involving
> hash tables should be written to not depend on the order
> of elements in the table. The only exception is testing the
> hash functions themselves, for which you can't do much
> better than:
>
> (test <expected> (my-hash obj seed bound))
>
> for several values. Without an explicit seed you need
> some way to specify the global value. An env variable
> is a poor way to handle this, since it places additional
> restrictions on the test running framework (e.g. there
> is currently no way in Snow to specify env vars to be
> set when running tests).
>
> But the primary motivation for determinism is in scientific
> computing, where you want resumable and/or verifiable
> results. One important case is mapreduce, where when
> running a long computation on thousands of machines
> at once, hardware failure is the norm rather than the
> exception. This can be addressed by running every
> computation twice, and rerunning when the results differ.
> As in tests, you can take effort to sort outputs and
> explicitly remove dependence on hash order, but you
> may be using third-party libraries which you can't change,
> or may not be able to afford the extra computation.
> This is where deterministic hashing can be useful.

From your explanation it sounds like SRFI_126_HASH_SEED is sufficient
for the described use-case, or am I missing something?

> And all of this is a minor point compared to the fact
> that the current SRFI-126 proposal is ugly, makes
> more work for everybody, and makes custom hashes
> second class citizens. I have no interest in this API.

"Ugly" on its own is purely subjective.  Please clarify.

I've explained that SRFI-126 makes little extra work (especially
compared to your proposal) for Scheme implementations, and showcased
this on one implementation so far.  There is a big difference between
being able to support an SRFI purely as a library, and through changes
to the core of an implementation.

I'm looking forward to hearing an explanation of how it is a problem to
get a salt from SRFI_126_HASH_SEED instead of receiving one as an
argument, in the truly-custom-hashing use-cases you have in mind which I
don't know of at all yet.

We can even specify (get-hash-salt) if you want, then the difference
will be truly trivial; a matter of wrapping your hash function body in

    (let ((salt (get-hash-salt)))
      ...)

and voila.

Custom hash functions for double-hashing implementations are still
"second-class" then in the sense that a user needs to provide two
procedures, but *even* then, the difference is from

    (define (foo-hash obj bound salt)
      ...)

to

    (define (make-foo-hash salt)
      (lambda (obj bound)
        ...))

    (define foo-hash1 (make-foo-hash x))
    (define foo-hash2 (make-foo-hash y))

and the second is backwards compatible with R6RS, compatible with all
existing Scheme implementations, compatible even with SRFI-69, and
otherwise familiar to programmers.

Taylan