Re: should hash function upper bound be exclusive?

*bear* 10 Aug 2005 00:13 UTC

On Tue, 9 Aug 2005, Per Bothner wrote:
> Current the 'bound' for hash functions is exclusive: the returned
> hash h is such that (and (>= 0 h) (< h bound)). One might want the
> default bound to be such that hashes range over the positive
> fixnums. But in that case the bound would not be a fixnum: it would
> have to be one larger than the largest fixnum.
On the other hand, the largest fixnum is likely to be either a power
of two, or one less than a power of two; as hashes are conventionally
implemented (via modulus operations) these are poor choices for a hash
range boundary. Numerical theory indicates that a much better choice
is the highest prime fixnum representable, or at least a fixnum with
no small factors. Otherwise you run into rehash problems where the
system may be able to find no valid rehash value even though the table
is not full.
> A hash-table implementation might want to cache hash values, and
> might also want to be able to resize the table.
Definitely; when you have the capability of resizing the table, you
must be able to rehash everything in the table. Usually you can make
that much faster by caching the hashes. Caching the hashes also makes
it much faster (in the average case) to check and make sure that you
are looking at an actual *match* rather than something else that
hashed into the same bucket.
bear