seeds Alex Shinn (08 Oct 2015 05:03 UTC)
Re: seeds taylanbayirli@xxxxxx (08 Oct 2015 07:55 UTC)
Re: seeds Arthur A. Gleckler (08 Oct 2015 17:46 UTC)
Re: seeds taylanbayirli@xxxxxx (08 Oct 2015 19:38 UTC)
Re: seeds Arthur A. Gleckler (08 Oct 2015 21:47 UTC)
Re: seeds taylanbayirli@xxxxxx (09 Oct 2015 08:09 UTC)
Re: seeds Kevin Wortman (09 Oct 2015 17:18 UTC)
Re: seeds taylanbayirli@xxxxxx (09 Oct 2015 18:33 UTC)
Re: seeds Kevin Wortman (13 Oct 2015 17:36 UTC)
Re: seeds taylanbayirli@xxxxxx (13 Oct 2015 18:28 UTC)
Re: seeds Alex Shinn (14 Oct 2015 01:59 UTC)
Re: seeds taylanbayirli@xxxxxx (14 Oct 2015 09:17 UTC)
Re: seeds taylanbayirli@xxxxxx (14 Oct 2015 10:06 UTC)
Re: seeds Alex Shinn (16 Oct 2015 01:23 UTC)
Re: seeds taylanbayirli@xxxxxx (16 Oct 2015 13:34 UTC)
Re: seeds Alex Shinn (16 Oct 2015 23:48 UTC)
Re: seeds taylanbayirli@xxxxxx (17 Oct 2015 12:08 UTC)
Re: seeds Alex Shinn (17 Oct 2015 13:12 UTC)
Re: seeds taylanbayirli@xxxxxx (17 Oct 2015 14:09 UTC)
What SRFIs are for John Cowan (17 Oct 2015 14:41 UTC)
Re: What SRFIs are for taylanbayirli@xxxxxx (17 Oct 2015 15:56 UTC)
Re: What SRFIs are for John Cowan (17 Oct 2015 16:55 UTC)
Re: What SRFIs are for taylanbayirli@xxxxxx (17 Oct 2015 18:08 UTC)
Re: What SRFIs are for John Cowan (17 Oct 2015 18:51 UTC)
Re: seeds John Cowan (15 Oct 2015 17:49 UTC)
Re: seeds Alex Shinn (09 Oct 2015 02:54 UTC)
Re: seeds taylanbayirli@xxxxxx (09 Oct 2015 07:59 UTC)
Re: seeds John Cowan (15 Oct 2015 17:51 UTC)
Re: seeds taylanbayirli@xxxxxx (15 Oct 2015 23:08 UTC)
Re: seeds John Cowan (16 Oct 2015 13:09 UTC)
Re: seeds taylanbayirli@xxxxxx (16 Oct 2015 14:01 UTC)

Re: seeds taylanbayirli@xxxxxx 08 Oct 2015 07:55 UTC

Alex Shinn <xxxxxx@gmail.com> writes:

> The specification states:
>
> Implementations preferring a hashing strategy involving
> a pair of hash functions may automatically derive a pair
> of hash functions from a given single hash function.
>
> This is not possible in any meaningful way: hash functions
> without a seed argument are a black box. The most you
> could do is apply some deterministic function to the result,
> which would shift around the values, but all hash collisions
> would remain the same. You might be able to alter
> bucket collisions of objects with different hash values, but
> this is useless from a security perspective and dubious
> from a performance one.

Most hash tables will be eq?, eqv?, equal?, string, or symbol hash
tables.  The hash functions for these are known, so the implementation
can check whether one of those was passed.

Those who really want to use custom hash functions should make sure to
pass a pair of functions to make-hashtable.  Indeed, I should make it
clearer that an implementation *not* using a pair of hash functions
should simply ignore one in the pair when given two, so a programmer can
just always provide two, since that's the more general API.  Racket's
custom hash tables API also takes two hash functions; I don't know if
they're always used.

> The SRFI adds new features:
>
> 1. constructors can take 1 or 2 hash functions
> 2. a new environment variable to control initial seeds
> 3. make-*-hash functions
>
> All put additional burden on both the SRFI implementor,
> and the latter two on anyone who wants to write a new
> hash function (they must parse and use the env var, and
> must provide hash function generators).

1. Implementations that don't need two hash functions will just ignore
one.

2. That environment variable is the easiest way to make hash tables
deterministic when running a test suite, and a 'getenv' call to get its
value and derive a salt from it doesn't seem like a big burden to me.

3. Likely, providing make-*-hash should be easy.  Instead of doing

    (define (string-hash string)
      (let ((salt <default_salt>))
        <body>))

you do

    (define (make-string-hash salt)
      (lambda (string)
        <body>))

    (define string-hash (make-string-hash <default_salt>))

or am I missing something?  That should be a straightforward change to
an implementation that already does salting.  Even one which doesn't can
make make-string-hash a constant function, although we don't expect that
from a serious implementation.

> Users, on the other hand, are confused as to whether
> provide 1 or 2 functions. In hopes of performance they
> should provide 2, even though both may not be used, and
> the idiom for hash table construction becomes:
>
> (make-hash-table
> (cons (make-equal-hash (random-integer))
> (make-equal-hash (random-integer)))
> equal?)
>
> This is far too verbose, so presumably we need to add
> a utility, possibly:
>
> (make-hash-table (make-hash-pair make-equal-hash) equal?)
>
> and presumably another utility to get an integer from the
> env var.
>
> This is a clear example of piling feature on top of feature,
> in a way that creates more work for everybody.

Users will only ever do:

    (make-hash-table equal-hash equal?)

and an implementation that needs a pair of equal-hash functions will use
its default pair of equal-hash functions in this case.

> However, using a hash function signature of:
>
> (hash object salt bound)
>
> can remove the need for all of these features and is
> still more general (e.g. it supports tables that want 3
> or more functions).
>
> The extra work goes away. Implementors that care
> about security and/or want to experiment with multiple
> hash functions can do so, and user code just works.
>
> Most R6RS programs will work unmodified, with the
> exception of those that create custom hash functions,
> or manually apply hash functions. The argument that
> even those should continue to work is invalid because
> adding the optional bound argument already breaks them.

Indeed, the backwards compatibility issue isn't as bad I guess, since
custom hash functions are already an obscure use-case.

However, it puts a nontrivial burden on implementors, which is to
restructure their hash table code to manage salting in the new way, by
having a salt passed to hash functions everywhere.

Imagine an implementation that has its hash table implementation split
into two modules: hash functions, and hash tables, the former module
salting implicitly.  Now the implementor needs to restructure the code
to have the salts communicated from the hash table module into the hash
function module, instead of keeping it internal to the hash function
module.

That's a hypothetical issue; maybe there are no implementations out
there that would have that issue.  I'd still rather just take the safe
path.  The changes I made require only local, simple changes or
additions to an R6RS hashtables implementation, or a straightforward
SRFI-126 implementation based on an underlying SRFI-69 or native hash
tables implementation:

1. Extend make-hashtable to also accept a pair or procedures, use the
one in the car, ignore the cdr, and do what you were already doing.
This is a very trivial change.

2. Where the salt was previously initialized to a random value, first
check whether you should use SRFI_126_HASH_SEED as the seed, otherwise
just do what you were already doing.  I can imagine a small annoyance in
having to derive an integer from a string (environment variable), though
I guess adding the integer values of the characters in the string should
be enough.

(By the way, we probably want to specify that an empty string value
doesn't count as the variable being set.  There's error potential in
that some people might not realize the difference between "export FOO="
and "unset FOO".)

3. Where previously your hash functions referenced a global salt value,
or one defined statically in the function body, you now have to define a
constructor that returns hash functions closing over a given salt value.

I can see some annoyance in #3 when hash functions are defined in C.
When closures can't easily be created from C, the easiest option might
be to define hash functions that take the salt on every call, then in
the Scheme part of the library,

    (define (make-foo-hash salt)
      (lambda (foo)
        (internal-foo-hash foo salt)))

(Yes, that internal-foo-hash is like your proposal now, but the
implementation needn't go through the whole hash table implementation to
pass a salt to hash functions on every call.)

Actually I wonder whether removing #3 might be a better idea anyway.  As
explained, implementations will derive their pair of standard foo-hash
functions from being passed the canonical foo-hash function.  That means
we don't need make-foo-hash constructors for the standard hash functions
unless we really want to support hash tables with different hash
functions for the same type.  That seems like an obscure use-case.  Tell
me if I shouldn't drop it from the next draft, otherwise I think I will.

To clarify once more, I really want this SRFI to cover realistic
use-cases as simply as possible, and keep its adoption to a maximum.
It's Very Bad if a Scheme implementation requires intrusive changes to
its native hash table API to support this SRFI.

Taylan