string hash function in SRFI-13

*shivers@xxxxxx*16 Dec 2000 03:03 UTCBrad Lucier pointed out that my reference implementation of string-hash (and char-set-hash) doesn't match the comments above it. Here's an explanation of the issue. The basic idea for the string-hash function (this is not part of the spec; it's just the particular implementation chosen for the reference implementation) is taken from Java. We treat the string cn...c0 as a radix-37 numeral (albeit one where the individual digits can be larger then 36), and use that for the hash value: h = (c0 + c1*37 + c2*37^2 + ...) mod bound where BOUND is the desired range of the hash function. Now, the expression in parens is going to be a a really big number for a long string; we'd like to compute H without using a bignum package. We *could* do this: h := 0 for i = n to 0 do h := (c[i] + h*37) mod bound That's a lot of mod operations, however. Instead, why not define MASK to be a string of 1-bits long enough to "cover" BOUND, that is mask := 2^(ceiling(lg bound)) - 1 and then just keep these low bits in the intermediate computation, finishing up with a final mod op: h := 0 for i = n to 0 do h := (c[i] + h*37) and mask h := h mod bound Great -- that's much more efficient. But it ain't what we originally set out to compute -- it isn't computing the h value I originally wrote down. That is what Brad spotted (which I had missed). The question is: does it matter? Is this more-quickly-computed value any less good as a hash value? I don't think so. I have not thought it through carefully, however. Anybody else have anything else to say? -Olin