Re: hash functions should not be required to accept two arguments
taylanbayirli@xxxxxx 10 Nov 2015 09:58 UTC
William D Clinger <xxxxxx@ccs.neu.edu> writes:
> SRFI 126 removes one prominent flaw in SRFI 69 (exposure of
> hash functions compatible with the eq? and eqv? procedures)
> but preserves an even more prominent flaw: the requirement
> that all hash functions accept a second optional argument.
>
> SRFI 126 also adds various features, some of them optional.
> In my opinion, those added features are considerably less
> important than the disadvantage imposed by requiring hash
> functions to accept a second argument.
>
> SRFI 126 states:
>
> A hash function is a procedure that maps keys to exact
> integer objects.
>
> That is what a hash function should be. That is not, however,
> what the hash functions specified by SRFI 126 actually are.
> Several paragraphs later, SRFI 126 tells the truth:
>
> A hash function is a procedure that must accept a key and a
> bound as an argument and should return a non-negative exact
> integer object.
>
> That second argument "signifies that the function need not
> return an integer greater than bound, although it may do so."
>
> In plain English, that means the second argument to a hash is
> worse than useless. That second argument has the effect of
> converting every hash function h into a family of one-argument
> hash functions indexed by the second argument of h. There is
> no reason for clients or implementors of SRFI 126 to think
> (h x 0) has anything to do with (h x 1), and so on.
>
> So far as I can see, this needless indexing of the hash function
> provides no benefits whatsoever. In particular, the specific
> reasons that some people have advanced for adding the second
> argument do not hold up once you realize the second argument
> can be (and often will be) ignored.
>
> Clients of SRFI 126 will be required to use and to write hash
> functions that accept that useless second argument, even though
> implementations of SRFI 126 may never pass a second argument to
> any hash function. Indeed, the higher quality implementations
> of SRFI 126 will be the ones least likely to pass a second
> argument to any hash function, just as Larceny's implementation
> of SRFI 69 never passes a second argument to any hash function.
>
> In the other direction, implementations of SRFI 126 that do
> pass a second argument to some hash functions must take care
> to pass exactly the same second argument in every call to those
> hash functions, unless they're ready and willing to rehash every
> association in the table.
>
> So requiring hash functions to accept a second argument increases
> the complexity for both implementations and clients without
> providing any identifiable benefit.
Thank you for your comments!
I'm now leaning towards removing the bound argument from the spec, which
is more backwards compatible with R6RS anyway as John also points out.
I agree with Alex on that simple hash table operations should not
allocate though. I would hope for an implementation of e.g. R6RS
hashtables to only work with fixnums in its implementations of
equal-hash, string[-ci]-hash and symbol-hash.
That should be sufficient for the purposes of most programs anyway.
Correct me if I'm wrong. Racket seems to do it that way even in its
native hash table API which isn't constrained by R6RS compatibility.
(Its equal-hash-code function returns a fixnum.)
Very big equal?, string? and symbol? hash tables could perhaps be
accommodated for via additional hash functions. I would leave this out
of the spec for now though.
> Some unrelated issues:
>
> On 10 September, Taylan asked:
>
> Do you think it would hinder adoption that the reader syntax is
> mandatory, or lead to other problems?
>
> Yes, the mandatory reader syntax makes SRFI 126 much less likely
> to be supported by implementations. We have seen that happen with
> several previous SRFIs (unrelated to hash tables).
Indeed, the reader syntax is optional on the meanwhile.
Taylan