Re: should hash function upper bound be exclusive?
Per Bothner (10 Aug 2005 04:52 UTC)
bear wrote:
> On the other hand, the largest fixnum is likely to be either a power
> of two, or one less than a power of two; as hashes are conventionally
> implemented (via modulus operations) these are poor choices for a hash
> range boundary. Numerical theory indicates that a much better choice
> is the highest prime fixnum representable, or at least a fixnum with
> no small factors.

I'm afraid I don't see your point. A "default bound" is the range of a generic hash value, which is larger than the index range of any particular hash table; that means you *will* be doing a modulo operation to convert the generic hash value to an index.

Yes, some people recommend that *the number of buckets* be a prime number, but I don't know how well-founded that is. If the hash function is properly "random", does it matter? This FAQ: http://burtleburtle.net/bob/hash/hashfaq.html says:

    Anything with a modulo prime at the end is probably bad; a decent
    hash function could use a power of two and wouldn't need to rely on
    the modulo prime to further mix anything.

> Otherwise you run into rehash problems where the
> system may be able to find no valid rehash value even though the table
> is not full.

If by "rehash" you mean open-address probing (secondary hashing, IIRC), then obviously the sequence of probes should be exhaustive, and one easy way to ensure that is to use a probe increment that is relatively prime to the number of buckets. However, if each bucket contains a chain (as in the reference implementation), that's not an issue. If by "rehash" you mean resizing (increasing the number of buckets), I still don't see the relevance.

> Definitely; when you have the capability of resizing the table, you
> must be able to rehash everything in the table. Usually you can make
> that much faster by caching the hashes. Caching the hashes also makes
> it much faster (in the average case) to check and make sure that you
> are looking at an actual *match* rather than something else that
> hashed into the same bucket.
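The hash-caching scheme bear describes can be sketched in a language-neutral way (Python here, since the mechanics don't depend on Scheme; the class and method names are illustrative, not taken from the reference implementation):

```python
# Sketch of a chained hash table that caches each entry's full,
# table-size-independent hash value. On lookup, the cached hash is
# compared before the (possibly expensive) key equality test; on
# resize, entries are re-bucketed without calling the hash function.

class Entry:
    def __init__(self, h, key, value):
        self.h = h          # cached table-size-independent hash
        self.key = key
        self.value = value

class Table:
    def __init__(self, nbuckets=8):
        self.buckets = [[] for _ in range(nbuckets)]
        self.count = 0

    def _index(self, h):
        # The only place the table size enters: one modulo per access.
        return h % len(self.buckets)

    def lookup(self, key):
        h = hash(key)
        for e in self.buckets[self._index(h)]:
            # Cheap cached-hash comparison first, full equality second.
            if e.h == h and e.key == key:
                return e.value
        return None

    def insert(self, key, value):
        h = hash(key)
        for e in self.buckets[self._index(h)]:
            if e.h == h and e.key == key:
                e.value = value
                return
        self.buckets[self._index(h)].append(Entry(h, key, value))
        self.count += 1
        if self.count > len(self.buckets):
            self._resize(2 * len(self.buckets))

    def _resize(self, n):
        # Re-bucket using cached hashes: no hash function calls here.
        old = [e for bucket in self.buckets for e in bucket]
        self.buckets = [[] for _ in range(n)]
        for e in old:
            self.buckets[self._index(e.h)].append(e)
```

Because `_index` is the only point where the bucket count matters, resizing is a pure re-bucketing pass over the cached hashes.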
If we can resize (and don't want to recalculate hashes when resizing), then the cached hash needs to be table-size-independent. It should use an implementation-tuned range, and a reasonable choice is the fixnum range (or the non-negative fixnum range). It also makes sense for this range to match the default bound used when the hash functions are called with only one parameter.
--
--Per Bothner
xxxxxx@bothner.com http://per.bothner.com/
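A minimal sketch of such a one-parameter hash with an implementation-tuned default bound (the constant 2^29, standing in for a hypothetical non-negative fixnum range, and the function name are illustrative assumptions, not part of the SRFI):

```python
# Illustrative default bound, standing in for an implementation's
# non-negative fixnum range (the value 2**29 is an assumption).
DEFAULT_BOUND = 2 ** 29

def string_hash(s, bound=DEFAULT_BOUND):
    # Called with one argument, the result falls in [0, DEFAULT_BOUND),
    # so a cached value is independent of any particular table's size.
    h = 0
    for ch in s:
        h = (h * 31 + ord(ch)) % bound
    return h
```

A table then derives an index from this value with a single modulo by its bucket count, exactly as in the discussion above.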