Re: Still on David's issues; SRFI 44

Show/hide message thread
Still on David's issues; SRFI 44 Panu Kalliokoski (14 Aug 2005 19:12 UTC)
Re: Still on David's issues; SRFI 44 bear (17 Aug 2005 01:46 UTC)
Re: Still on David's issues; SRFI 44 Panu Kalliokoski (17 Aug 2005 05:09 UTC)
Re: Still on David's issues; SRFI 44 bear (17 Aug 2005 08:40 UTC)
Re: Still on David's issues; SRFI 44 David Van Horn (17 Aug 2005 14:15 UTC)
Re: Still on David's issues; SRFI 44 David Van Horn 17 Aug 2005 14:00 UTC
Panu Kalliokoski wrote:
> Whether SRFI 69 should be about general mappings or hash tables, has
> been IMO adequately addressed; if there are suggestions about how to
> make it clearer in the abstract that it _is_ about hash tables, they are
> welcome.

In light of your comments, I agree with you Panu.  This is not just a mapping
datastructure.  If my comments seemed off-base it is only because I am
confused as to what the precise purpose of this SRFI is.  Without knowing the
precise purpose, I cannot make concrete suggestions on the document.  Your
response clarifies some, and I appreciate your patience and work.  My
suggestions are below.  I have no intentions of being insulting.

I have the following suggestions for the abstract and rationale.  If you
comment on these I can provide concrete suggestions for the rest of the
document.  I have tried to characterize the purpose of this SRFI as making it
possible to write portable code that uses hash tables in the most common ways
while remaining efficient.  Do you agree with that characterization?

I think the simple and generic aim should be dropped.  The design of this SRFI
should be simple and generic only so far as it furthers the above aim.

---8<---

Abstract

;; outline the need for, and design of, the proposal.

Hash tables are mutable data structures meeting certain complexity
requirements that provide a mapping from some set of keys to some set of
values associated to those keys.  When a good hash function is used, key
lookup and destructive update must be performed in amortised constant time.

Widely recognised as a fundamental data structure, most Scheme systems provide
their functionality.  However, no Scheme standard exists for hash tables and
the functionality and interfaces provided by implementations varies widely,
making it difficult to write portable programs that use hash tables.

This SRFI specifies an API for hash tables designed so that portable programs
can be written which make efficient use of common hash table functionality.

Rationale

;; explain why the proposal should be incorporated as a standard feature in
;; Scheme implementations. If there are other standards which this proposal
;; will replace or with which it will compete, the rationale should explain
;; why the present proposal is a substantial improvement.

Hash tables are widely recognised as a fundamental data structure for many
kinds of computational tasks.  Almost every non-minimal Scheme implementation
provides some kind of hash table functionality, although there is no existing
standard.

Alas, although somewhat similar, these hash table APIs have many differences:
some trivial, like the naming of certain functions; some complex, like
revealing different aspects of the internal implementation to the user; some
coarse, like requiring keys to be of some specific type(s); some subtle, like
requiring the user to guess the size of the hash table in advance to get
optimal performance.  As a result, it is difficult to write portable programs
that use hash tables.

The primary aim of this SRFI is to establish a standard API for hash tables so
that portable programs can be written which make efficient use of common hash
table functionality.  The API resolves the discrepancies between the various
names and semantics for hash table operations provided by Scheme systems by
standardizing the names and behaviors of the most common operations.
Incorporating this SRFI as a standard feature of Scheme implementations makes
it possible to write efficient and portable programs that use hash tables.

---8<---

Suggestions on the text given the aim of making it so that portable programs
can be written which make efficient use of common hash table functionality:

There are several things in the document that *may* lead to more efficient use
depending on the implementation.  I believe all of these should be dropped,
leaving specified only those things which *must* lead to efficient use in all
conforming implementations.  Specifically:

Drop size-hint.  Size-hint may be ignored as currently specified, thus its
status as an optional argument to hash table constructors does nothing to make
efficient use of hash tables portable.  Allow implementations to extend the
parameters arbitrarily to make implementation specific improvements.

Drop type specific hash tables.  If they are simply shorthands as stated, they
do nothing to make efficient use of hash tables portable.

If you'd like to suggest how particular implementations can improve
efficiency, you could add the following:  Implementations may improve
efficiency by specializing the hash table implementation when given the
equivalence procedure =, string=?...

I don't see any reason to complicate the API, in what I would say is an ad-hoc
manner, for non-portable gains in potential efficiency.

If you decide not to do this, you should add a rationale for including type
specific hash tables, and then a rationale for including this particular set
of types.  What is meant by "shorthand" is underspecified.  What are the
equivalence procedures used?  What happens when you add a string key to symbol
table?  There should be some way of determining if a hash-table value was
constructed with one of these constructors, eg symbol-hash-table? or
hash-table-type (as in Gauche).

I think you should state the following in the beginning of the specification
section.  This seems implied by the current text, but this is more explicit:

---8<---

An implementation that does not provide lookup and destructive update in
amortised constant time (when a good hash function is used) does not conform
to this SRFI.

An implementation that does not provide good hash function definitions for
hash, {string,string-ci,symbol}-hash, and hash-by-identity does not conform to
this SRFI.

---8<---

 From the document:

    If some key occurs multiple times in alist, it is unspecified which of the
    corresponding values will end up in hash-table.

Why not specify that they are added in the order they appear in the alist,
such as MzScheme does.  If this is not the case, portable programs can't rely
on this function unless it is never the case that a key appears twice in the
alist.  Is there something gained from this being unspecified that outweighs
the loss in portability?  There is no external syntax for hash tables so it
seems to read and write hash tables portably one needs a well specified
alist->hash-table and hash-table->alist.

> But because David seems to be very concerned about the neglection of
> SRFI 44, I probably have to tell why I'm not heeding its call.

I really have very little preference as to whether this SRFI follows the
conventions of SRFI 44, or if the proposed hash table datatype fits into the
subtype hierarchy of SRFI 44 collections.  I'm asking only for a rationale why
these choices were made.  Much of your email is an irrelevant critique of SRFI
44, but you do provide the basis of a rationale for your choices that should
be incorporated into the document.  I believe what I've quoted below is
relevant and can be summarized and included.

> What would SRFI 44 API of hash tables look like?
>
> Hash tables would have two supertypes: map and collection.  Collection
> operations would forget about the keys.  No indexed collection could be
> a subtype of a hash table, and no keyed collection could be a subtype of
> an indexed collection.
>
> There would be an asymmetry between items in a set and keys in a map:
> even though these two are very similar, the former are treated as
> "values" while the latter are treated as "keys".  Consequently,
> operations for these have different names.
>
> Hash tables could have a value equivalence function which would be never
> used, but the user could not supply a hash function.  This implies that
> appropriate hash functions could be guaranteed only up to some specific
> coarseness (as with SRFI 69 defaults), and beyond that, hash tables
> could not guarantee their complexity constraints.  In fact, every time
> the user supplied an unrecognised key equivalence function, the
> implementation would have to use a constant hash function (like (lambda
> (x) 0) for example) to guarantee correctness, destroying all the value
> of hash tables.
>
> Hash tables would lack an operation that folds with both keys and
> values, and hash-table->alist.  Generally, _all_ access to the actual
> associations of the hash table would be indirect.
>
> hash-table-update! would not be there, forcing people to look up the key
> twice when they update the value.
>
> Hash tables should be able to implement -delete-from! for a bag datatype
> that does not yet exist.
>
> Despite all this, I've tried to craft SRFI 69 so that it does not
> _prevent_ a SRFI 44 style API to hash tables.

David