Email list hosting service & mailing list manager

hash function bounds summarized Alex Shinn (25 Nov 2015 23:16 UTC)
Re: hash function bounds summarized taylanbayirli@xxxxxx (26 Nov 2015 09:30 UTC)
Re: hash function bounds summarized John Cowan (27 Nov 2015 08:01 UTC)
Re: hash function bounds summarized Alex Shinn (30 Nov 2015 23:38 UTC)
Re: hash function bounds summarized John Cowan (30 Nov 2015 04:26 UTC)
Re: hash function bounds summarized Shiro Kawai (30 Nov 2015 04:52 UTC)
Re: hash function bounds summarized Alex Shinn (30 Nov 2015 23:45 UTC)
Re: hash function bounds summarized Alex Shinn (30 Nov 2015 23:23 UTC)

Re: hash function bounds summarized John Cowan 30 Nov 2015 04:26 UTC

Alex Shinn scripsit:

> This assumption is plainly wrong as it depends on the actual range
> of fixnums.

However, R6RS mandates at least 24-bit fixnums, and looking at FixnumInfo,
we see that all the widely used Schemes have at least 29-bit fixnums.
2^28-1 is a *lot* of buckets on a 32-bit machine.  There are a few
Schemes that have the same size fixnums on both 32-bit and 64-bit builds
(Elk, IronScheme, Larceny, Mosh, RScheme), but I imagine it wouldn't
be horrendously hard to change that, except perrhaps for IronScheme
because of its CLR dependency.

> What makes hash functions special is that they form a contract
> interacting with the hash table size limit, a contract which is exposed
> to third parties who can write their own hash functions.  The concern
> then is that we could inadvertently make it _impossible_ to write a
> hash table implementation which supported certain hash tables which
> would otherwise be possible, i.e. we could be forcing an artificial
> upper bound.

Well, yes.  If a user-written hash function never delivers a result
larger than 2^20, then hash tables that would naturally require more
than 2^20 buckets aren't going to work properly.  I don't see how
you can avoid this.

> but a majority of Schemes use 2 bits,

(Technically only a plurality:  14 out of 39 in 32-bit mode, and in 64-bit
only 8 out of 35.)

> But I'm serious about the seed arguments.

I am too, but I'm not convinced that an argument is the correct strategy.
I still incline to the idea of a global seed that is set randomly before
any user code executes.  This disallows hyper-paranoid implementations
with a distinct seed per hash table, but makes management of hash
functions outside of hash tables much simpler.

--
John Cowan          http://www.ccil.org/~cowan        xxxxxx@ccil.org
Híggledy-pìggledy / XML programmers
Try to escape those / I-eighteen-N woes;
Incontrovertibly / What we need more of is
Unicode weenies and / François Yergeaus.