seeds  Alex Shinn  (08 Oct 2015 05:03 UTC)
Re: seeds  taylanbayirli@xxxxxx  (08 Oct 2015 07:55 UTC)
Re: seeds  Arthur A. Gleckler  (08 Oct 2015 17:46 UTC)
Re: seeds  taylanbayirli@xxxxxx  (08 Oct 2015 19:38 UTC)
Re: seeds  Arthur A. Gleckler  (08 Oct 2015 21:47 UTC)
Re: seeds  taylanbayirli@xxxxxx  (09 Oct 2015 08:09 UTC)
Re: seeds  Kevin Wortman  (09 Oct 2015 17:18 UTC)
Re: seeds  taylanbayirli@xxxxxx  (09 Oct 2015 18:33 UTC)
Re: seeds  Kevin Wortman  (13 Oct 2015 17:36 UTC)
Re: seeds  taylanbayirli@xxxxxx  (13 Oct 2015 18:28 UTC)
Re: seeds  Alex Shinn  (14 Oct 2015 01:59 UTC)
Re: seeds  taylanbayirli@xxxxxx  (14 Oct 2015 09:17 UTC)
Re: seeds  taylanbayirli@xxxxxx  (14 Oct 2015 10:06 UTC)
Re: seeds  Alex Shinn  (16 Oct 2015 01:23 UTC)
Re: seeds  taylanbayirli@xxxxxx  (16 Oct 2015 13:34 UTC)
Re: seeds  Alex Shinn  (16 Oct 2015 23:48 UTC)
Re: seeds  taylanbayirli@xxxxxx  (17 Oct 2015 12:08 UTC)
Re: seeds  Alex Shinn  (17 Oct 2015 13:12 UTC)
Re: seeds  taylanbayirli@xxxxxx  (17 Oct 2015 14:09 UTC)
What SRFIs are for  John Cowan  (17 Oct 2015 14:41 UTC)
Re: What SRFIs are for  taylanbayirli@xxxxxx  (17 Oct 2015 15:56 UTC)
Re: What SRFIs are for  John Cowan  (17 Oct 2015 16:55 UTC)
Re: What SRFIs are for  taylanbayirli@xxxxxx  (17 Oct 2015 18:08 UTC)
Re: What SRFIs are for  John Cowan  (17 Oct 2015 18:51 UTC)
Re: seeds  John Cowan  (15 Oct 2015 17:49 UTC)
Re: seeds  Alex Shinn  (09 Oct 2015 02:54 UTC)
Re: seeds  taylanbayirli@xxxxxx  (09 Oct 2015 07:59 UTC)
Re: seeds  John Cowan  (15 Oct 2015 17:51 UTC)
Re: seeds  taylanbayirli@xxxxxx  (15 Oct 2015 23:08 UTC)
Re: seeds  John Cowan  (16 Oct 2015 13:09 UTC)
Re: seeds  taylanbayirli@xxxxxx  (16 Oct 2015 14:01 UTC)

Alex Shinn <xxxxxx@gmail.com> writes:

> The specification states:
>
>     Implementations preferring a hashing strategy involving
>     a pair of hash functions may automatically derive a pair
>     of hash functions from a given single hash function.
>
> This is not possible in any meaningful way: hash functions
> without a seed argument are a black box.  The most you
> could do is apply some deterministic function to the result,
> which would shift around the values, but all hash collisions
> would remain the same.  You might be able to alter
> bucket collisions of objects with different hash values, but
> this is useless from a security perspective and dubious
> from a performance one.

Most hash tables will be eq?, eqv?, equal?, string, or symbol hash
tables.  The hash functions for these are known, so the implementation
can check whether one of those was passed.  Those who really want to
use custom hash functions should make sure to pass a pair of functions
to make-hashtable.

Indeed, I should make it clearer that an implementation *not* using a
pair of hash functions should simply ignore one function in the pair
when given two, so a programmer can just always provide two, since
that's the more general API.

Racket's custom hash tables API also takes two hash functions; I don't
know whether they are always used.

> The SRFI adds new features:
>
> 1. constructors can take 1 or 2 hash functions
> 2. a new environment variable to control initial seeds
> 3. make-*-hash functions
>
> All put additional burden on both the SRFI implementor,
> and the latter two on anyone who wants to write a new
> hash function (they must parse and use the env var, and
> must provide hash function generators).

1. Implementations that don't need two hash functions will just ignore
   one of them.

2. That environment variable is the easiest way to make hash tables
   deterministic when running a test suite, and a 'getenv' call to get
   its value and derive a salt from it doesn't seem like a big burden
   to me.

3. Providing make-*-hash should likely be easy.  Instead of

       (define (string-hash string)
         (let ((salt <default_salt>))
           <body>))

   you do

       (define (make-string-hash salt)
         (lambda (string)
           <body>))

       (define string-hash (make-string-hash <default_salt>))

   or am I missing something?  That should be a straightforward change
   to an implementation that already does salting.  Even one that
   doesn't salt can make make-string-hash a constant function, although
   we don't expect that from a serious implementation.

> Users, on the other hand, are confused as to whether to
> provide 1 or 2 functions.  In hopes of performance they
> should provide 2, even though both may not be used, and
> the idiom for hash table construction becomes:
>
>     (make-hash-table
>      (cons (make-equal-hash (random-integer))
>            (make-equal-hash (random-integer)))
>      equal?)
>
> This is far too verbose, so presumably we need to add
> a utility, possibly:
>
>     (make-hash-table (make-hash-pair make-equal-hash) equal?)
>
> and presumably another utility to get an integer from the
> env var.
>
> This is a clear example of piling feature on top of feature,
> in a way that creates more work for everybody.

Users will only ever do:

    (make-hash-table equal-hash equal?)

and an implementation that needs a pair of equal-hash functions will
use its own default pair of equal-hash functions in this case.
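To make that concrete, here is a rough sketch (not taken from any real
implementation) of what a double-hashing implementation's constructor
might do internally with the hash function argument.  The %-prefixed
names are made up and stand for the implementation's internal,
independently salted hash functions:

    ;; Map the user-supplied hash argument to a pair of hash functions.
    (define (resolve-hash-spec hash-spec)
      (cond ((pair? hash-spec)           ; caller already passed a pair
             hash-spec)
            ((eq? hash-spec equal-hash)  ; recognized standard function:
             (cons %equal-hash-a %equal-hash-b))
            ((eq? hash-spec string-hash) ; use the internal salted pair
             (cons %string-hash-a %string-hash-b))
            ((eq? hash-spec symbol-hash)
             (cons %symbol-hash-a %symbol-hash-b))
            (else
             ;; Unknown custom hash function: reuse it for both slots,
             ;; i.e. effectively fall back to single hashing.
             (cons hash-spec hash-spec))))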
> However, using a hash function signature of:
>
>     (hash object salt bound)
>
> can remove the need for all of these features and is
> still more general (e.g. it supports tables that want 3
> or more functions).
>
> The extra work goes away.  Implementors that care
> about security and/or want to experiment with multiple
> hash functions can do so, and user code just works.
>
> Most R6RS programs will work unmodified, with the
> exception of those that create custom hash functions,
> or manually apply hash functions.  The argument that
> even those should continue to work is invalid because
> adding the optional bound argument already breaks them.

Indeed, the backwards compatibility issue isn't as bad as I thought,
since custom hash functions are already an obscure use case.

However, it puts a nontrivial burden on implementors: they have to
restructure their hash table code to manage salting in the new way, by
passing a salt to the hash functions everywhere.  Imagine an
implementation whose hash table support is split into two modules, hash
functions and hash tables, with the former salting implicitly.  The
implementor now needs to restructure the code so that salts are
communicated from the hash table module into the hash function module,
instead of being kept internal to the hash function module.  That's a
hypothetical issue; maybe there are no implementations out there that
would have it.  I'd still rather take the safe path.

The changes I made require only local, simple changes or additions to
an R6RS hashtables implementation, or to a straightforward SRFI-126
implementation based on an underlying SRFI-69 or native hash table
implementation:

1. Extend make-hashtable to also accept a pair of procedures, use the
   one in the car, ignore the cdr, and do what you were already doing.
   This is a very trivial change.

2. Where the salt was previously initialized to a random value, first
   check whether SRFI_126_HASH_SEED should be used as the seed, and
   otherwise just do what you were already doing.  I can imagine a
   small annoyance in having to derive an integer from a string (the
   environment variable), though I guess adding up the integer values
   of the characters in the string should be enough; see the sketch at
   the end of this message.  (By the way, we probably want to specify
   that an empty string value doesn't count as the variable being set.
   There's error potential in that some people might not realize the
   difference between "export FOO=" and "unset FOO".)

3. Where your hash functions previously referenced a global salt value,
   or one defined statically in the function body, you now have to
   define a constructor that returns hash functions closing over a
   given salt value.

I can see some annoyance in #3 when hash functions are defined in C.
When closures can't easily be created from C, the easiest option might
be to define hash functions that take the salt on every call, and then,
in the Scheme part of the library:

    (define (make-foo-hash salt)
      (lambda (foo)
        (internal-foo-hash foo salt)))

(Yes, that internal-foo-hash is like your proposal now, but the
implementation needn't thread a salt through the whole hash table
implementation to pass it to the hash functions on every call.)

Actually, I wonder whether removing #3 might be a better idea anyway.
As explained, implementations will derive their pair of standard
foo-hash functions from being passed the canonical foo-hash function.
That means we don't need make-foo-hash constructors for the standard
hash functions unless we really want to support hash tables with
different hash functions for the same type.  That seems like an obscure
use case.  Tell me if I shouldn't drop it from the next draft;
otherwise I think I will.
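To make point 2 concrete, here is a rough, untested sketch of the seed
handling, assuming the R7RS get-environment-variable procedure from
(scheme process-context).  The name random-salt is made up and stands
for whatever the implementation already does to pick a random salt; the
loop is just the "add up the character values" idea from above:

    ;; Use SRFI_126_HASH_SEED when set and non-empty, else a random salt.
    (define (initial-hash-salt)
      (let ((seed (get-environment-variable "SRFI_126_HASH_SEED")))
        (if (or (not seed) (string=? seed ""))
            (random-salt)
            (let loop ((i 0) (acc 0))
              (if (= i (string-length seed))
                  acc
                  (loop (+ i 1)
                        (+ acc (char->integer (string-ref seed i)))))))))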
To clarify once more: I really want this SRFI to cover realistic use
cases as simply as possible, and to keep its adoption as high as
possible.  It's Very Bad if a Scheme implementation requires intrusive
changes to its native hash table API to support this SRFI.

Taylan