This e-mail responds to points that I find controversial and/or would
like more information on.
... bounds or no bounds to hash functions?
I don't get this argument. For the modular polynomial method to work,
it suffices for the mantissa and the modulus to be relatively prime.
This can be achieved easily by picking a prime modulus, but also by
picking a prime mantissa. I would have thought it more general, not
more restrictive, if the hash function gets to know the modulus
beforehand.
That said, I don't have much reason for requiring the bound parameter.
The most important reasons for it were (1) the observation that the hash
seems to almost always be converted to some range anyway, so (hash obj
bound) seems cleaner than (modulo (hash obj) bound), and (2) the bonus
of getting arbitrarily big hash ranges on request (with a fixed range,
it is easy to reduce it but not extend it).
As any non-bounded hash function h1 can be made into a bound-aware hash
function h2 by
(define (h2 obj bound) (modulo (h1 obj) bound))
one can't say that it makes it much more difficult to define bound-aware
hash functions than boundless ones. Am I missing something?
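For concreteness, here is a minimal sketch of a bound-aware modular
polynomial hash over strings. The name string-hash and the multiplier
37 are my own illustrative choices, not part of any SRFI; the multiplier
only needs to be relatively prime to the modulus (the bound), per the
argument above.

```scheme
;; Sketch: modular polynomial string hash that takes the bound
;; (modulus) as a parameter. The multiplier 37 is an arbitrary
;; illustrative choice; it should be relatively prime to bound.
(define (string-hash s bound)
  (let loop ((chars (string->list s)) (h 0))
    (if (null? chars)
        h
        (loop (cdr chars)
              (modulo (+ (* h 37) (char->integer (car chars)))
                      bound)))))
```

Reducing modulo bound at every step (rather than once at the end) keeps
the intermediate values small, which matters in implementations without
cheap bignums.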
... naming: table or hash-table?
(1) "table" does not tell what the data type is supposed to be.
(2) (related) a proper name for an indeterminate mapping type is "mapping".
(3) the proper way to abstract different implementations of an interface
is not to define the implementations mutually exclusively but to define
each implementation independently and write glue to unify them under the
same abstraction.
(4) association lists and search trees are not implementations of the
SRFI-69 API, because of failing complexity constraints.
In other words, I want the users to know they are using hash tables, and
to be able to count on actually getting hash tables.
... keyword parameters for make-hash-table?
In principle, I agree: it would be nice to have an extensible interface
for make-hash-table. However, as currently no SRFI defines a syntax for
keyword parameters, I will refrain from using them.
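For reference, optional positional parameters already give a workable
(if less self-documenting) interface using only standard rest-argument
syntax. The following is a hypothetical sketch: make-table and the
'table tag are placeholders for illustration, not the SRFI-69
constructor.

```scheme
;; Hypothetical sketch: optional positional parameters via a rest
;; argument, instead of (not-yet-standardized) keyword parameters.
;; Defaults: equivalence predicate equal?, no explicit hash function.
(define (make-table . args)
  (let ((equiv (if (pair? args) (car args) equal?))
        (hash  (if (and (pair? args) (pair? (cdr args)))
                   (cadr args)
                   #f)))
    ;; Stand-in for the real table record/constructor.
    (list 'table equiv hash)))
```

The drawback is that positional optionals cannot be extended or skipped
independently, which is exactly what keyword parameters would buy.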
... the set of provided hash functions?
Yes, hash functions are coupled with equality predicates. However, any
hash function for a coarser equality predicate is also acceptable for a
finer one. Now, in practice people don't use perfect hash functions,
that is, hash functions that are as discriminative as their
corresponding equality predicate. This is because such a hash function
is really just an isomorphism between the key data type and the subset
of integers used, and in such a case, it is better to use an ordinary
vector than a full-blown hash table.
That is the basic reason why it is probably not worthwhile to provide
generic hash functions of different coarseness: eq?-hash, eqv?-hash and
equal?-hash. Practically put, I can't think of any realistic situation
where the difference in distribution between eq?-hash and equal?-hash
would matter for performance.
However, for performance it _does_ make sense to provide type-specific
hash functions, to avoid the overhead of polymorphism.
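To illustrate the overhead in question, here is a hypothetical sketch
(the names generic-hash and integer-hash are mine, not SRFI-69's): a
generic hash function must dispatch on the key's type at every call,
whereas a type-specific one goes straight to the computation.

```scheme
;; Type-specific hash: no dispatch overhead.
(define (integer-hash n bound) (modulo n bound))

;; Generic hash: must test the type of obj on every call before it
;; can do any hashing. (Multiplier 37 is an arbitrary choice.)
(define (generic-hash obj bound)
  (cond ((integer? obj) (integer-hash obj bound))
        ((symbol? obj)  (generic-hash (symbol->string obj) bound))
        ((string? obj)
         ;; simple modular polynomial hash over the characters
         (let loop ((cs (string->list obj)) (h 0))
           (if (null? cs)
               h
               (loop (cdr cs)
                     (modulo (+ (* h 37) (char->integer (car cs)))
                             bound)))))
        (else (error "no hash function for this type" obj))))
```

In a tight lookup loop over keys of a known type, calling integer-hash
directly skips the cond chain entirely; that is the polymorphism
overhead the type-specific functions avoid.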
Lastly, eq?-hash would be nice to provide, because R5RS currently does
not provide a way to get the "identity" of an object (other than the
object itself); but that is probably not the business of SRFI-69 but of
a separate hashing SRFI.
Panu
--
personal contact: xxxxxx@iki.fi, +35841 5323835, +3589 85619369
work contact: xxxxxx@ling.helsinki.fi, +35850 3678003
kotisivu (henkkoht): http://www.iki.fi/atehwa/
homepage (technical): http://sange.fi/~atehwa/