Re: hash function bounds summarized
John Cowan 30 Nov 2015 04:26 UTC
Alex Shinn scripsit:
> This assumption is plainly wrong as it depends on the actual range
> of fixnums.
However, R6RS mandates at least 24-bit fixnums, and looking at FixnumInfo,
we see that all the widely used Schemes have at least 29-bit fixnums.
2^28-1 is a *lot* of buckets on a 32-bit machine. There are a few
Schemes that have the same size fixnums on both 32-bit and 64-bit builds
(Elk, IronScheme, Larceny, Mosh, RScheme), but I imagine it wouldn't
be horrendously hard to change that, except perrhaps for IronScheme
because of its CLR dependency.
> What makes hash functions special is that they form a contract
> interacting with the hash table size limit, a contract which is exposed
> to third parties who can write their own hash functions. The concern
> then is that we could inadvertently make it _impossible_ to write a
> hash table implementation which supported certain hash tables which
> would otherwise be possible, i.e. we could be forcing an artificial
> upper bound.
Well, yes. If a user-written hash function never delivers a result
larger than 2^20, then hash tables that would naturally require more
than 2^20 buckets aren't going to work properly. I don't see how
you can avoid this.
> but a majority of Schemes use 2 bits,
(Technically only a plurality: 14 out of 39 in 32-bit mode, and in 64-bit
only 8 out of 35.)
> But I'm serious about the seed arguments.
I am too, but I'm not convinced that an argument is the correct strategy.
I still incline to the idea of a global seed that is set randomly before
any user code executes. This disallows hyper-paranoid implementations
with a distinct seed per hash table, but makes management of hash
functions outside of hash tables much simpler.
--
John Cowan http://www.ccil.org/~cowan xxxxxx@ccil.org
Híggledy-pìggledy / XML programmers
Try to escape those / I-eighteen-N woes;
Incontrovertibly / What we need more of is
Unicode weenies and / François Yergeaus.