On Tue, Sep 15, 2015 at 8:31 PM, John Cowan <xxxxxx@mercury.ccil.org> wrote:
Alex Shinn scripsit:

> The only question is whether we should also include
> a bound, and if so how it should work:

You need to convince Will Clinger of this, since he convinced me that
the bound is nothing but extra work for an implementer of hash functions
and no help to the framework author.  He feels very strongly about it.

The important thing is to avoid bignum operations, otherwise
you need to cons just to retrieve from a hash table.

We can assume we have (rnrs arithmetic fixnums) as described
in the ConsentDocket, so we at least know what the max fixnum
is.  The question becomes: when, if ever, is this insufficient?

The worst case (in an implementation which cares about
performance) is 6 bits for tagging, plus 1 bit since we only return
positive hash values, or 7 wasted bits, leaving 25 bits for the
hash.  On a 32-bit platform we need 4 bytes (2 address bits) to
store a pointer, so assuming we only need to store keys (i.e. for
a hash-set), a fixnum-indexed table spans at most 27 of the 32
address bits, with 5 bits of addressing left over.  In other words,
on such a machine with the full 2^32 memory space, once the
table is using only 1/32 of available memory the hash values
need to be in the bignum range.

In the more common case we use only 3 bits of tagging (4 wasted
bits, so 28-bit hashes), but even then, once a hash table takes up
one fourth of the full memory we need values in the bignum range.
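A quick back-of-the-envelope check of both scenarios (a sketch in
Python, purely to verify the arithmetic above; the function name is
illustrative):

```python
def table_memory_fraction(tag_bits, word_bits=32, slot_bytes=4):
    """Fraction of the address space a key-only hash table occupies
    when its slot count first exceeds the largest positive fixnum."""
    # The tag bits plus one sign bit are unavailable for the hash value.
    hash_bits = word_bits - tag_bits - 1
    max_slots = 2 ** hash_bits           # largest table a fixnum hash can index
    table_bytes = max_slots * slot_bytes # one pointer-sized slot per key
    return table_bytes / 2 ** word_bits

# 6 tag bits: fixnum hashes run out once the table fills 1/32 of memory.
print(table_memory_fraction(6))
# 3 tag bits: they run out at 1/4 of memory.
print(table_memory_fraction(3))
```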

These are extreme situations but important to consider.  Java
is hampered by using ints instead of longs for array addressing.

The other question is how general should the hash functions be?
We're only discussing internal hash tables, but if we wanted to
use the same hashing for offsets into a file then bignums will
more commonly be needed.

By providing an explicit bound we can optimize for the common
cases, only resorting to bignum computations when there's no
other choice.
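To make the idea concrete, here is one way a bounded hash could
work (a hypothetical sketch in Python, not a proposed API; the name
and the multiplier are illustrative): the caller passes the table
size as the bound, and the hash function reduces modulo the bound
at each step, so intermediate values never leave the fixnum range.

```python
def string_hash(s, bound):
    """Illustrative bounded string hash.  Reducing modulo `bound` at
    every step keeps the accumulator below bound * 31 + 255, so in a
    Scheme implementation all intermediates stay fixnums whenever the
    bound itself is a (sufficiently small) fixnum."""
    h = 0
    for ch in s:
        h = (h * 31 + ord(ch)) % bound
    return h

# Without a bound the accumulator grows without limit (bignums);
# with bound = table size, every intermediate fits in a machine word.
print(string_hash("hello", 1 << 20))
```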

-- 
Alex