tl;dr: Let's remove the bounds argument, specify the full range of fixnums (including negative), and also include a seed argument.

SRFI 69 provided optional and ill-specified bounds arguments to hash functions. R6RS removed these entirely, on the argument that you could never create a hash table needing more buckets than the largest non-negative fixnum.

This assumption is plainly wrong, as it depends on the actual range of fixnums. To be clear, any implementation will have various limits, which is fine. What makes hash functions special is that they form a contract interacting with the hash table size limit, a contract which is exposed to third parties who can write their own hash functions. The concern, then, is that we could inadvertently make it _impossible_ to write a hash table implementation which supported certain hash tables which would otherwise be possible, i.e. we could be forcing an artificial upper bound.

Hash tables are closely tied to the core Scheme runtime since we provide eq?-tables, so right away we can assume some freedom in tag size. Arguably 1-bit tags are not practical in many cases, but a majority of Schemes use 2 bits, so let's take that as a minimum, despite the fact that some non-toy implementations use 3 bits (contrary to my previous claim that these 3-bit implementations mattered). We can then calculate that, with open addressing and immediate objects for both keys and values, by the time a table's bucket count reaches the largest non-negative fixnum we've consumed all memory, leaving no room for the program, much less the runtime.
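To make that arithmetic concrete, here is a small sketch (in Python for convenience; the numbers are mine, assuming the 2-bit tags discussed above):

```python
# Memory consumed when a hash table's bucket count reaches the largest
# non-negative fixnum, assuming open addressing with one word each for
# key and value. Illustrative arithmetic only.

def memory_at_fixnum_limit(word_bits, tag_bits, words_per_entry):
    max_buckets = 2 ** (word_bits - tag_bits - 1)  # non-negative fixnums only
    bytes_per_word = word_bits // 8
    return max_buckets * words_per_entry * bytes_per_word

# 32-bit words, 2-bit tags, key + value per bucket: the entire 2^32
# address space, with no room left for the program or runtime.
assert memory_at_fixnum_limit(32, 2, 2) == 2 ** 32
```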

This is still not the end of the story. Hash functions have a wide variety of uses in and of themselves, outside of hash tables, and if we're specifying them at all we should do it right. Pretty much all of these uses are well addressed by my (separate) proposal to provide seeds as arguments to the hash functions. This lets us turn a hash function into a family of hash functions. For simple cases like distributed and disk-based hash tables, which want double or quadruple the addressing space, it lets us combine 2 or 4 hashes to get a larger value.
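A sketch of that combination trick (assuming some seeded hash procedure; `seeded_hash` below is a Python stand-in built on blake2b, not any Scheme API):

```python
import hashlib

# A stand-in seeded hash: 32-bit result, parameterized by an integer seed.
# This only models the proposed seeded-hash interface.
def seeded_hash(obj, seed):
    digest = hashlib.blake2b(repr(obj).encode(),
                             key=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest[:4], "little")

# Combine two members of the hash family into one 64-bit hash, doubling
# the addressing space, e.g. for a disk-based table.
def wide_hash(obj):
    return (seeded_hash(obj, 1) << 32) | seeded_hash(obj, 0)

assert 0 <= wide_hash("example") < 2 ** 64
```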

There is one other very simple case, though, which is the hash set. For this case we need only store keys, so we exceed the non-negative fixnum range with 1/2 of memory left to spare. Chaining hash functions for this is clumsy, but we can fix it again by removing the restriction that hash functions return non-negative values. Note this is consistent with Java hash functions, which also return negative values (though Java is crippled by the fact that the values (and all indices) are 32 bits, even on 64-bit systems).
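The hash-set arithmetic, spelled out (again my numbers, for a 32-bit system with 2-bit tags):

```python
# Keys only, one word per entry, on a 32-bit system with 2-bit tags.
word_bits, tag_bits = 32, 2
nonneg_fixnums = 2 ** (word_bits - tag_bits - 1)  # 2^29 indices
all_fixnums = 2 ** (word_bits - tag_bits)         # 2^30, sign bit reclaimed
bytes_per_key = word_bits // 8

# The non-negative fixnum limit is hit with half of memory to spare...
assert nonneg_fixnums * bytes_per_key == 2 ** 31
# ...while allowing negative hash values covers the full address space.
assert all_fixnums * bytes_per_key == 2 ** 32
```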

Similar to hash sets, we can consider optimistic homogeneous typing, i.e. if all values fit in a u8 or u16, we can store them in more compact uniform vectors. Again, allowing negative fixnums solves this problem.
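The same arithmetic applies to the uniform-vector case; e.g. with u8 values (a hypothetical layout, for illustration):

```python
# 32-bit system, 2-bit tags: a word-sized key plus a u8 value packed in a
# uniform vector gives 5 bytes per entry.
entries = 2 ** 29          # largest non-negative fixnum index
bytes_per_entry = 4 + 1    # 4-byte key + 1-byte value

# The index limit is reached well before memory runs out.
assert entries * bytes_per_entry < 2 ** 32
```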

64-bit systems and beyond actually make this less of a problem, since the number of bits used for tagging remains the same but the amount of space needed to store the keys increases. Going in the other direction, 16-bit systems _would_ have a hard limit well below the point where they ran out of memory, but that's already a specialized environment these days.
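The 64-bit version of the calculation shows why (my arithmetic, same 2-bit tag assumption):

```python
# 64-bit system, 2-bit tags: exhausting the non-negative fixnum indices
# with 8-byte keys alone would need the entire 2^64 address space, far
# beyond any physical machine, so the fixnum bound is no longer binding.
assert 2 ** (64 - 2 - 1) * 8 == 2 ** 64
```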

But I'm serious about the seed arguments.

--
Alex