tl;dr: Let's remove the bounds argument, specify the
full range of fixnums (including negative), and also
include a seed argument.
SRFI 69 provided optional and ill-specified bounds
arguments to hash functions. R6RS removed these
entirely, with the argument that you could never create
a hash table which needed more than non-negative
This assumption is plainly wrong as it depends on the
actual range of fixnums. To be clear, any implementation
will have various limits, which is fine. What makes
hash functions special is that they form a contract
interacting with the hash table size limit, a contract
which is exposed to third parties who can write their
own hash functions. The concern then is that we could
inadvertently make it _impossible_ to write a hash table
implementation which supported certain hash tables
which would otherwise be possible, i.e. we could be
forcing an artificial upper bound.
Hash tables are closely tied to the core Scheme runtime
since we provide eq?-tables, so right away we can
assume some freedom in tagging size. Arguably 1 bit
tags are not practical in many cases, but a majority of
Schemes use 2 bits, so let's take that as a minimum,
despite the fact that some non-toy implementations use
3 bits (contrary to my previous claim that these 3-bit
implementations mattered). We then calculate, using
open-addressing and immediate objects for both keys
and values that we've consumed all memory, leaving
no room for the program, much less the runtime.
This is still not the end of the story. Hash functions
have a wide variety of uses in and of themselves,
outside of hash tables, and if we're specifying them
at all we should do it right. Pretty much all of these
uses are well addressed by my (separate) proposal
to provide seeds as arguments to the hash functions.
This lets us turn a hash function into a family of hash
functions. For simple cases like distributed and
disk-based hash tables which want for double or
quadruple the addressing space, it lets us combine
2 or 4 hashes to get a larger value.
There is one very simple other case, though, which
is the hash set. For this case we need store only
keys, so we exceed the non-neg fixnum range with
1/2 of memory left to spare. Chaining hash functions
for this is clumsy, but we can fix this again by removing
the restriction that hash functions return non-negative
values. Note this is consistent with Java hash functions
which also return negative values (though Java is
crippled by the fact that the values (and all indices)
are 32 bits, even on 64 bit systems).
Similar to hash sets we can consider optimistic
homogeneous typing, i.e. if all values fit in a u8 or
u16, we can store them in more compact uniform
vectors. Again, using non-negative fixnums solves
64-bit systems and beyond actually make this less
of a problem, since the number of bits used for tagging
remain the same, but the amount of space needed to
store the keys increases. Going in the other direction,
16-bit systems _would_ have a hard limit well below
they ran out of memory, but that's already a specialized
environment these days.
But I'm serious about the seed arguments.