On Wed, Sep 30, 2015 at 6:37 PM, Taylan Ulrich Bayırlı/Kammer <xxxxxx@gmail.com> wrote:

Hmm, we have a problem of backwards compatibility there first of all.
If the salt argument is mandatory, we break R6RS compatibility. If it's
optional, that makes the point mostly moot.

It's trivial to write a portable wrapper around either R6RS or

SRFI-69 which ignores the salt. If you want any better compatibility

than that I recommend using R6RS as is.

There's also the risk that programmers will be lazy and pass the hash
function the same salt every time, when they call them themselves, thus
doing more harm than good.

The hash-table implementation passes the salt. This

is written just once.

> 2. It's more consistent. Otherwise the default hash
> functions and third-party hash functions all have
> different ways to initialize the salt, making it impossible
> to provide system-defined extensions such as
> initializing a (single) global salt from env vars.

I don't see the problem with an env var that affects both the built-in
hash functions and which users can obey in theirs?

What is the env var? SRFI_126_HASH_SEED? How

are you going to get both implementations and authors of

portable hash functions (e.g. string-ni-hash) to respect

this? Will it be specified in the SRFI?

> 4. It's more secure.
> a. It allows a separate salt per table.
> b. It allows _changing_ the salt per table if any
> given table has too many collisions.

A separate salt per table could be provided in the hash table
constructor, or hash function constructor, if really desired.

The functions are fixed: string-hash, string-ci-hash, etc.

We don't have a make-string-ci-hash function. If we did,

would we also recommend anyone who writes their own

function to also provide a generator?

Re. b., are there any known implementations doing that? It sounds like
an interesting idea, but I'd rather not complicate the API for a
use-case whose usefulness isn't demonstrated.

The API is simpler and more elegant, given the

alternative which is for everyone who wants to

write their own hash function to manage their own

seeds.

> 5. It allows automatically extending any hash function
> to become a set of hash functions by varying the salt.
> In particular, this can be used for double hashing,
> cuckoo hashing, and bloom filters.

This too should be covered by hash function constructors if I'm not
mistaken, which is a less intrusive change to the API and can be added
post-hoc if really desired.

The idea is to allow implementations of SRFI-12[56] to

transparently make use of multiple hash functions, thus

making double hashing and cuckoo hashing possible.

Since SRFI-12[56] hash table constructors only take a

single hash function argument, this would otherwise not

be possible.

Bloom filters admittedly would want for a separate API,

which could take a generator or list of functions. Do we

want to recommend everyone writing hash functions to

provide generators as well? Do we want to provide

system-defined generators for eq?/eqv? hashes? Oh

wait, we can't do that...

This proposal is all around more elegant, more secure,

and more powerful than any I've seen. If I only cared

about backwards compatibility I'd stick with SRFI-69,

which is what all the chicken and chibi code I care about

uses.

Re: the bound argument, it's a promise that we don't

need values larger than the bound, allowing the hash

function to use fixnum-only arithmetic in most cases. It

applies not just in special cases like disk-based tables, but

when the hash table consumes a large enough fraction

of memory to want a bignum number of buckets. As

discussed on the SRFI-125 list, this can happen on

32-bit architectures with 3 tag bits when the table takes

around 1/4 of available memory. It's realistic to want

a small server where the table takes up nearly 100%

of memory.

Alex