SRFI 126 removes one prominent flaw in SRFI 69 (exposure of hash functions compatible with the eq? and eqv? procedures) but preserves an even more prominent flaw: the requirement that all hash functions accept a second optional argument. SRFI 126 also adds various features, some of them optional. In my opinion, those added features are considerably less important than the disadvantage imposed by requiring hash functions to accept a second argument.

SRFI 126 states:

    A hash function is a procedure that maps keys to exact integer
    objects.

That is what a hash function should be. That is not, however, what the hash functions specified by SRFI 126 actually are. Several paragraphs later, SRFI 126 tells the truth:

    A hash function is a procedure that must accept a key and a bound
    as an argument and should return a non-negative exact integer
    object.

That second argument "signifies that the function need not return an integer greater than bound, although it may do so."

In plain English, that means the second argument to a hash function is worse than useless. That second argument has the effect of converting every hash function h into a family of one-argument hash functions indexed by the second argument of h. There is no reason for clients or implementors of SRFI 126 to think (h x 0) has anything to do with (h x 1), and so on.

So far as I can see, this needless indexing of the hash function provides no benefits whatsoever. In particular, the specific reasons that some people have advanced for adding the second argument do not hold up once you realize the second argument can be (and often will be) ignored.

Clients of SRFI 126 will be required to use and to write hash functions that accept that useless second argument, even though implementations of SRFI 126 may never pass a second argument to any hash function. Indeed, the higher-quality implementations of SRFI 126 will be the ones least likely to pass a second argument to any hash function, just as Larceny's implementation of SRFI 69 never passes a second argument to any hash function.

In the other direction, implementations of SRFI 126 that do pass a second argument to some hash functions must take care to pass exactly the same second argument in every call to those hash functions, unless they're ready and willing to rehash every association in the table.

So requiring hash functions to accept a second argument increases the complexity for both implementations and clients without providing any identifiable benefit.

****************************************************************

Some unrelated issues:

On 10 September, Taylan asked:

    Do you think it would hinder adoption that the reader syntax is
    mandatory, or lead to other problems?

Yes, the mandatory reader syntax makes SRFI 126 much less likely to be supported by implementations. We have seen that happen with several previous SRFIs (unrelated to hash tables).

On 13 September, John Cowan wrote:

    In any case, nobody is going to randomize the order: it is in fact
    deterministic for any particular set of keys, it's just that the
    ordering you get is complicated enough that users treat it as
    unpredictable.

The timing of garbage collections is not very deterministic, and garbage collections can change the internal sizes and orderings of hash tables. That is especially likely for hash tables that implement weak keys or similar complexities.
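Returning to the second-argument issue for a moment, here is a minimal sketch, in portable Scheme, of the kind of hash function clients of SRFI 126 are obliged to write. The name string-hash/ignoring-bound is hypothetical and not part of any SRFI; the sketch only illustrates that the required bound argument can be accepted and then ignored, which the specification explicitly permits.

    ;; A client-written hash function in the SRFI 126 style: it accepts
    ;; an optional bound argument because the SRFI requires it to, and
    ;; then simply ignores that argument, as the SRFI allows.
    (define (string-hash/ignoring-bound key . maybe-bound)
      (let loop ((i 0) (h 0))
        (if (= i (string-length key))
            h
            (loop (+ i 1)
                  (+ (* h 31) (char->integer (string-ref key i)))))))

    ;; For this function, (string-hash/ignoring-bound "spam" 0),
    ;; (string-hash/ignoring-bound "spam" 1), and
    ;; (string-hash/ignoring-bound "spam") all return the same value,
    ;; because the bound is ignored -- and a conforming implementation
    ;; must cope with exactly that.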
On 30 September, Alex Shinn wrote:

    Re: the bound argument, it's a promise that we don't need values
    larger than the bound, allowing the hash function to use
    fixnum-only arithmetic in most cases.  It applies not just in
    special cases like disk-based tables, but when the hash table
    consumes a large enough fraction of memory to want a bignum number
    of buckets.  As discussed on the SRFI-125 list, this can happen on
    32-bit architectures with 3 tag bits when the table takes around
    1/4 of available memory.  It's realistic to want a small server
    where the table takes up nearly 100% of memory.

That reasoning fails for three or four separate reasons.

First, in SRFI 126 the bound argument is not a promise about anything. It's nothing more than an extra argument the hash procedure may use (if it so chooses) to select a different hash function than it would have used had a different second argument been passed, or no second argument at all.

Secondly, a 32-bit machine implies buckets occupy at least 8 bytes (to hold 32-bit representations of a key and a value). With 3-bit tags, the largest positive fixnum is likely to be 536870911. If you had that many 8-byte buckets on a 32-bit machine, they would occupy 4294967288 of the 4294967296 addressable bytes, leaving a grand total of 8 bytes for other purposes.

Thirdly, the number of distinct keys in a hash table is of the same order of magnitude as the number of buckets. Taking the storage occupied by those distinct keys into account reduces the number of buckets even further. There is no realistic scenario in which you'd need a number of buckets that exceeds the fixnum range that's available with 3-bit tags.

Fourthly, there's no real problem with hash functions that use bignum arithmetic. Even in Larceny, whose bignum arithmetic is notoriously slow, it's hard to find any use of hash tables for which the bignum arithmetic performed by hash functions would be significant compared to the overhead of the hash tables themselves.

On 16 October, Alex Shinn wrote:

    eq?/eqv? hash tables with rehashing on gc, as in larceny, may be
    doing more work than is needed for some common eq?/eqv? cases.  If
    the keys are only symbols, it would be reasonable to store a hash
    value directly in the symbol object, or even arrange for all
    symbols to be stationary in the gc.  To allow such optimizations,
    I think we should add:

        make-symbol-hash-table

Alex is assuming the implementation is remarkably stupid, but Larceny's eq?/eqv? hash tables are not that stupid. Larceny's symbols do contain their own hash values. Symbols, booleans, numbers, and various other things whose eq?/eqv? hash values aren't affected by a gc are not rehashed following a gc.

Making an API more complicated just because it is possible to imagine an implementation that implements the API stupidly is not good design. The more complex the API, the more likely it is to be implemented stupidly.

Will