Nice work! If accepted, this might increase portability a lot,
and I am eager to write (require "srfi-69") for the first time.
A few remarks on the gory details:
* You specify that hash-table-keys and
hash-table-values do not
necessarily produce the elements in
the same order. That sounds
highly dangerous to me, and it is not
clear to me what the gain is.
In fact, it would be reason enough for me to forget about -keys
and -values and fall back on -fold (like you do in the reference
implementation), to avoid introducing an error that could be very
hard to find. Are there any relevant hash table implementations
that are a lot faster and enumerate keys and values in differing
orders?
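For illustration, here is roughly what I would do instead (a sketch
only; I am assuming hash-table-fold takes the table, a procedure
receiving key, value and the accumulated value, and an initial value,
as in the reference implementation):

  ;; collect keys and values in a single traversal, so that the
  ;; two resulting lists correspond position by position
  (define (keys-and-values ht)
    (hash-table-fold ht
                     (lambda (key value acc)
                       (cons (cons key (car acc))
                             (cons value (cdr acc))))
                     (cons '() '())))

  ;; (car (keys-and-values ht)) => the keys
  ;; (cdr (keys-and-values ht)) => the values, in the same order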
* In view of possible confusion with SRFI-67 (compare procedures),
would you consider renaming "comparison" to something that refers
more to an equivalence relation than to a general comparison? One
could simply call the parameter /equal/, with default equal?.
* I also second Neil's proposal to abandon
the aliases.
Of course this is a matter of personal preference, but I can relate
my own experience with aliases: in Ruby, Matz introduced lots of
aliases in the built-in classes. At first that seemed quite useful,
but the more I used them the less I liked them. In the end, I always
pick one (e.g. either get or ref) and forget the other---when
somebody else uses the other, I have to consult the documentation to
make sure it is really the same thing, and it is not clear what is
gained by the aliases.
(Btw, I would go for ref, set!, and delete!)
* In the SRFIs I wrote in the past,
I tried hard to stay clear of identifiers that
are used in the major Scheme implementations.
This supports implementors
in integrating the new proposal into
their system without conflict, even
if the functionality could supersede
what they have already. This is
important because mature systems should
not change abruptly.
Of course, for your SRFI
this is really hard because most names
might be in use already---but anyway.
* You require "This operation should
have an (amortised) complexity
of O(1) with respect to the number of
associations in hash-table."
You might want to emphasize that this is conditional on a good hash
function being used.
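Just to make the point concrete (a hypothetical sketch; I am assuming
make-hash-table accepts an equivalence procedure and a hash function):

  ;; a perfectly legal but terrible hash function: every object
  ;; hashes to 0, so all associations share one bucket and every
  ;; lookup degenerates to a linear search---O(n), not O(1)
  (define (constant-hash obj . maybe-bound) 0)

  (define ht (make-hash-table equal? constant-hash))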
* Something more of an experimental
feature: My problem with hashing
in practice is that it is often surprisingly
difficult to come up with a good
hash function for a new data structure.
Would it be possible to provide
some help with this?
One way to do this is a set of procedures
and macros that construct new
hash functions from old ones: Whenever
several hash values are to be
combined, they are 'hashed together'
in a deterministic but distributing way.
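To make this concrete, a rough sketch of what such a combinator could
look like (not a worked-out proposal; the multiplier and the fixed
modulus are arbitrary choices for illustration):

  ;; 'hash together' two hash values deterministically, with some mixing
  (define (hash-combine h1 h2)
    (modulo (+ (* 31 h1) h2) 999983))

  ;; example: a hash function for pairs, built from hash functions
  ;; for the components (car-hash and cdr-hash are placeholders)
  (define (make-pair-hash car-hash cdr-hash)
    (lambda (p)
      (hash-combine (car-hash (car p))
                    (cdr-hash (cdr p)))))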
* You might want to rethink your design
for the interaction between /hash/
and /bound/. The problem is this:
A standard way to combine hash functions h_1, ..., h_n is to
evaluate a modular polynomial, i.e. h(x) = Sum[h_i(x) a^i : i in
{1..n}] mod m, where m is /bound/, and a is a number in {0..m-1}
such that 1, a, a^2, etc. run through all invertible residues mod m
(i.e., a is a primitive root mod m).
Now in your design the hash function
must work for all values of /bound/,
because /bound/ is chosen by the hash
table and passed to the hash
function. This is a problem, because
the modular polynomial method
only works for *some* values of m, namely
those for which a suitable a
exists. (The first failure is m = 8.)
If the modular polynomial method gets
used anyway, this is what happens: The
numbers m for which it works
are low in density (e.g. it works 14888
times for m in {1..10^5}), and if a
bad m is picked, then the hash distribution is certainly terrible,
because at least 50% of the possible hash values go unused. (The
mathematical reason is that {1, a, a^2, ...} is a proper subgroup of
(Z/mZ)^*, and is therefore at most (m-1)/2 in size.)
As you see, the fact that /bound/ is
an argument to the hash function
makes it difficult to combine hash functions,
because hash functions
must work for all values of /bound/---which
is considerably more difficult
than working for some value of /bound/.
For this reason, libraries for hash
tables often use a different approach:
The hash function always returns a result
in {0..hash-range - 1}, where
hash-range is a predefined constant,
usually a prime close to but not
exceeding the maximal immediate integer.
It is then up to hash-ref etc.
to map a hash value h in {0..hash-range
- 1} into the index range of the
table at hand---which can then be done
by (h mod n), n = table-size.
If the implementation only allocates
tables of size a power of two, this
modular reduction can even be done by
masking out the higher bits
of h.
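A sketch of the reduction step under this alternative design
(hash-range, hash->index etc. are names I made up for illustration;
bitwise-and is assumed to be available, e.g. from SRFI-33/60):

  ;; a fixed range known to all hash functions in advance
  (define hash-range 999983)

  ;; the user-supplied hash function returns h in {0..hash-range - 1};
  ;; the table maps it into its own index range
  (define (hash->index h table-size)
    (modulo h table-size))

  ;; if table sizes are always powers of two, the reduction is a mask
  (define (hash->index/pow2 h table-size)
    (bitwise-and h (- table-size 1)))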
* Currently, there is no way to portably
make a copy of a hash table:
You cannot query /equal?/ and /hash/, and there is no hash-table-copy
either (which, by the way, might be good to add).
* hash-table-count sounds like counting,
but you specify it is O(1).
How about hash-table-size then?
----
Dr. Sebastian Egner
Senior Scientist Channel Coding & Modulation
Philips Research Laboratories
Prof. Holstlaan 4 (WDC 1-051, 1st floor, room 51)
5656 AA Eindhoven
The Netherlands
tel: +31 40 27-43166 *** SINCE 10-Feb-2005 ***
fax: +31 40 27-44004
email: xxxxxx@philips.com
[Quoted message of 26-04-2005 00:10 to srfi-69@srfi.schemers.org,
subject "provided for compatibility":]
Regarding the "provided for compatibility"
aliases...
I think aliases like this are more appropriate in a library than in a
SRFI.
Currently, the SRFI draft would require all complying Scheme
implementations to support multiple names for the same operation, which
seems a little odd ("call/cc" notwithstanding).
I'd lean towards having the SRFI specify only one name per operation,
and having the Scheme implementations and user code comply with that.