On Wed, Sep 30, 2015 at 6:37 PM, Taylan Ulrich Bayırlı/Kammer <xxxxxx@gmail.com> wrote:

Hmm, we have a problem of backwards compatibility there first of all.
If the salt argument is mandatory, we break R6RS compatibility.  If it's
optional, that makes the point mostly moot.

It's trivial to write a portable wrapper around either R6RS or
SRFI-69 which ignores the salt.  If you want any better compatibility
than that I recommend using R6RS as is.

There's also the risk that programmers will be lazy and pass the hash
function the same salt every time, when they call them themselves, thus
doing more harm than good.

The hash-table implementation passes the salt.  This
is written just once.

> 2. It's more consistent. Otherwise the default hash
> functions and third-party hash functions all have
> different ways to initialize the salt, making it impossible
> to provide system-defined extensions such as
> initializing a (single) global salt from env vars.

I don't see the problem with an env var that affects both the built-in
hash functions and which users can obey in theirs?

What is the env var?  SRFI_126_HASH_SEED?  How
are you going to get both implementations and authors of
portable hash functions (e.g. string-ni-hash) to respect
this?  Will it be specified in the SRFI?

> 4. It's more secure.
> a. It allows a separate salt per table.
> b. It allows _changing_ the salt per table if any
> given table has too many collisions.

A separate salt per table could be provided in the hash table
constructor, or hash function constructor, if really desired.

The functions are fixed: string-hash, string-ci-hash, etc.
We don't have a make-string-ci-hash function.  If we did,
would we also recommend anyone who writes their own
function to also provide a generator?

Re. b., are there any known implementations doing that?  It sounds like
an interesting idea, but I'd rather not complicate the API for a
use-case whose usefulness isn't demonstrated.

The API is simpler and more elegant, given the
alternative which is for everyone who wants to
write their own hash function to manage their own
seeds.

> 5. It allows automatically extending any hash function
> to become a set of hash functions by varying the salt.
> In particular, this can be used for double hashing,
> cuckoo hashing, and bloom filters.

This too should be covered by hash function constructors if I'm not
mistaken, which is a less intrusive change to the API and can be added
post-hoc if really desired.

The idea is to allow implementations of SRFI-12[56] to
transparently make use of multiple hash functions, thus
making double hashing and cuckoo hashing possible.
Since SRFI-12[56] hash table constructors only take a
single hash function argument, this would otherwise not
be possible.

Bloom filters admittedly would want for a separate API,
which could take a generator or list of functions.  Do we
want to recommend everyone writing hash functions to
provide generators as well?  Do we want to provide
system-defined generators for eq?/eqv? hashes?  Oh
wait, we can't do that...

This proposal is all around more elegant, more secure,
and more powerful than any I've seen.  If I only cared
about backwards compatibility I'd stick with SRFI-69,
which is what all the chicken and chibi code I care about
uses.

Re: the bound argument, it's a promise that we don't
need values larger than the bound, allowing the hash
function to use fixnum-only arithmetic in most cases.  It
applies not just in special cases like disk-based tables, but
when the hash table consumes a large enough fraction
of memory to want a bignum number of buckets.  As
discussed on the SRFI-125 list, this can happen on
32-bit architectures with 3 tag bits when the table takes
around 1/4 of available memory.  It's realistic to want
a small server where the table takes up nearly 100%
of memory.

-- 
Alex