Email list hosting service & mailing list manager

A proposed new approach to bounds vs. salt John Cowan (26 Dec 2015 02:12 UTC)
Re: A proposed new approach to bounds vs. salt taylanbayirli@xxxxxx (26 Dec 2015 14:00 UTC)
Re: A proposed new approach to bounds vs. salt Alex Shinn (30 Dec 2015 05:37 UTC)
Re: A proposed new approach to bounds vs. salt John Cowan (03 Jan 2016 22:26 UTC)
Re: A proposed new approach to bounds vs. salt Shiro Kawai (04 Jan 2016 02:08 UTC)
Re: A proposed new approach to bounds vs. salt Alex Shinn (07 Jan 2016 04:26 UTC)
Re: A proposed new approach to bounds vs. salt Kevin Wortman (15 Jan 2016 00:45 UTC)

A proposed new approach to bounds vs. salt John Cowan 26 Dec 2015 02:12 UTC

Well, I hope this doesn't get drowned in the flood of holiday-related
Scheme mail, but I've been thinking about this for a while and this is
my chance to get it written down.

This is about hash functions, which are important for both hash tables
and for independent use, which is why I'm cross-posting this to the lists
for both flavors of hash tables currently being discussed as well as to
the comparator list, since hash functions are an element of comparators.

To recap the existing situation with bounds: R6RS and SRFI 126 have no
notion of it, and SRFI 69 does not actually require that a hash function
accepts it, though it does require that the four standard hash functions
that a SRFI 69 implementation must provide must accept it and respect it.
The SRFI 69 *sample* implementation always provides it and errors out
if a hash function does not accept it and respect it, but this is not
true of all other SRFI 69 implementations.  Consequently, a reusable
hash function needs to accept a mandatory object argument and a second
optional argument.

I'm going to claim that bounds aren't really necessary for hash functions;
a hash function should simply strive to return a number in the range from
0 to the largest conceivable number of elements in a data structure,
which is what SRFI 69 says hash functions should do if they don't have
a bound argument.  Obviously, that's implementation-dependent, and
so I suggest that it be made available as the value of the function
(hash-default-bound).  This eliminates any need to pass it to an
individual hash function.

Salt is another matter: it's not provided for in any existing Scheme
standard, but many other languages provide it as a defense against
denial of service attacks.  However, those languages typically don't
expose their hash functions, nor do they permit tables with arbitrary
hash functions, so some innovation is necessary.  The hash function must
be aware of the salt when it starts working, so that the salt value gets
hashed by the function along with the object.

Now the simplest approach is simply to pass the salt as an argument to
the hash function, but it easily gets mixed up with passing the bounds.
I'm hopeful that as bounds go out, salt will come in, but I want to
avoid misinterpretation.  So I kicked around various ideas and came up
with this:

Bounds are always positive.  So let's make salt always negative, or
else not an exact integer at all.  When a hash function compliant to
SRFI 125/6/8 receives a positive exact integer as its second argument,
it should treat it as bounds, but is not obliged to do so unless the
programmer wants to make it compatible with the SRFI 69 reference
implementation.  But otherwise the second argument is interpreted as
salt.  If there is no second argument, then the bound is the value of
(hash-default-bound) and the salt is the value of (hash-default-salt),
which is also implementation-dependent but should be changed by the
implementation on each run.  For debugging purposes, the implementation
may also provide an implementation-dependent way to set the value
returned by (hash-default-salt), such as examining a particular
environment variable.

Comments?

--
John Cowan          http://www.ccil.org/~cowan        xxxxxx@ccil.org
Fundamental thinking is ha-ard.  Let's go ideology-shopping.
                        --Philosopher Barbie