On Wed, Sep 9, 2015 at 10:18 PM, John Cowan <xxxxxx@mercury.ccil.org> wrote:
Alex Shinn scripsit:

> Two dynamic perfect hash algorithms are Dietzfelbinger et al's
> algorithm [1] and the Cuckoo hash [2].  Both of these have
> additional restrictions on the load factor in order to get even
> _expected_ amortized constant time insertions.  This conflicts with
> implementation extensions to specify the load factor explicitly.

I would assume that a cuckoo-based framework (I haven't read D's paper)
would simply ignore such extensions.  But cuckoo hashing needs more
than one hash function in any case, so it doesn't fit the general
69/125/126/R6RS/CL hash framework.

1.  Since we give leeway to replace the hash function in special
cases, and since the function is never exposed, there's no reason
we can't replace it with 2 or more functions.

2. I realize now that the SRFI-114 hash functions are severely
underspecified for use in this SRFI - they need to produce values
suitable for any number of buckets, but the range is unspecified.
R6RS hash-tables are similarly underspecified, but SRFI-69 does
not have this problem because it uses an explicit `bound' argument.

If we don't add a bound argument, we should specify the range of
hash functions to be at least the largest imaginable number of
buckets in an application.  On 32-bit architectures this may in fact
exceed the max fixnum, though we could alleviate this by allowing
negative results as in Java.

In either case we have room for extra information which can be
used to derive multiple hash values from the initial result, making
cuckoo hashing a realistic option.

Another consideration is a "salt" argument to the hash functions
to be mixed in with the result, which is one way to avoid the
hash collision DoS attack.  This would also effectively allow an
arbitrary number of hash functions, which is useful for more
than just cuckoo hashing (double hashing).

-- 
Alex