Re: Major issues - Simplelists

Re: Major issues Bradd W. Szonye 25 Oct 2003 20:47 UTC
Bradd wrote:
>> That's why I suggested that you use dictionaries as "mapping"
>> collections rather than "pure index" collections. For one thing, your
>> sole example of a dictionary -- the Scheme alist -- is a mapping, not
>> an index.

xxxxxx@freenetproject.org wrote:
> This is clear, thank you. I believe its obvious we wish to support
> mappings, as its the more general concept.  Now that thats cleared up,
> what issues remain in the SRFI which impede that goal?

Going through the procedure list:

COLLECTION-FOLD-LEFT/RIGHT

For dictionaries, enumerates the range-domain pairs. In other words, the
"single collection value" accepted by the fold-function should be a
(cons range-value domain-value).

COLLECTION-FOLD-KEYS-LEFT/RIGHT

All of the collections defined so far can support this, because they all
have some form of "indexing." This is true even for bags -- the
*-contains? procedure is effectively a self-index. The fold-function
must accept two arguments in addition to the seeds:

    (fold-function key element seed ...)

For all collections, the key is the value used to find an element in the
collection:

    Collection  Key
    set, bag    same as the element
    sequence    position number
    dictionary  range value

    Collection  Sample element      Fold-function called with
    set, bag    "hello"             (f "hello" "hello" seeds ...)
    sequence    "hello"             (f 0 "hello" seeds ...)
    dictionary  ("hello" "goodbye") (f "hello" ("hello" "goodbye") seeds ...)

*-COUNT, *-KEY-COUNT
*->LIST, *-KEYS->LIST

These should be semantically identical to collection-fold[-keys] with an
appropriate fold-function.

For example, *-count for dictionaries accepts a range-domain pair, and
*-key-count is defined for all collections. For bags and sets, the two
functions are effectively aliases for each other. For sequences,
*-key-count will always return 1 or 0 (i.e., does that position exist in
the sequence).

Likewise, *->LIST returns a list of range-domain pairs (an alist!) for
dictionaries. *-KEYS->LIST is actually not very useful for bags, sets,
and dictionaries (you already get everything you need with *->LIST), but
it's very useful for sequences. It may also be useful for future
collection types.

*-EQUIVALENCE-FUNCTION, *-KEY-EQUIVALENCE-FUNCTION
*-ORDERING-FUNCTION, *-KEY-ORDERING-FUNCTION

Again, it's possible to define the key relationship functions for all
collections so far. For bags and sets, they're identical to the element
relationship functions. For sequences, they are = and <.

For dictionaries, the element relationship functions should compare
range-domain pairs, and the key relationship functions should compare
range values only.

*-GET

For dictionaries, gets an arbitrary range-domain pair.

*=

Much easier to define now: Create the union of all collection elements
(annotated with their source collections) and partition it with elt=.
The collections are equivalent if each partition contains an equal
number of elements from each collection. Example:

    Let
    b be a bag of pairs that contains ((a 1) (b 2) (c 3))
    d be a dictionary that contains ((a 1) (b 2) (b 3) (c 3))
    elt= be (and (eqv? (car e1) (car e2)) (eqv? (car e1) (car e2)))

    The partition is
    ([b:(a 1) d:(a 1)] [b:(b 2) d:(b 2)] [d:(b 3)] [b:(c 3) d:(c 3)])

    The first, second, and fourth elements of the partition contain
    equal numbers of elements from b and d, but the third element does
    not. The two collections are not equal.

*-CONTAINS?, *-ADD

If these accept elements as arguments, they're also well-defined for
dictionaries. In fact, dictionaries would seem to support the entire bag
interface, so long as you remember that all "values" are really
range-domain pairs.

Likewise, bags can support the entire dictionary interface, so long as
you remember that the "keys" to a bag are identical to the elements of
the bag.

Therefore, I recommend making dictionary a subtype of bag and adding
*-contains-key? to the bag interface. In most cases, the two "contains?"
functions are equivalent to:

    (*-contains? cv elt) => (not (= 0 (*-count cv elt)))
    (*-contains-key? cv elt) => (not (= 0 (*-key-count cv elt)))

You can also add *-contains-key? to the set interface, or make set a
subtype of bag. Or make bag a subtype of set. Honestly, classification
of collections is *very* difficult. Collections are "value" types, and
those don't translate well to class-based OO systems. They work a bit
better with a prototype-based OO system, and even better in a generic
traits- or interface-based system.

*-INSERT-LEFT/RIGHT

These need a failure thunk (or a way to raise an exception). Insertion
may not be possible, not even at the ends, e.g., if the container is
inflexible or has an ordering requirement. Basically, *all* mutators
need failure thunks (or some other exception-throwing convention),
because generic mutators can always run into problems with violated
requirements.

*-REF pos, *-GET key

These are really the same function. Both of them turn a range value into
a domain value. Typically, collections use the *-REF convention for
sequential data and the *-GET convention for dictionary-like data.

OK solution: Make them aliases for the same basic function.

Better solution: Make *-get the generic range->domain function. For
dictionaries, it takes a range value and returns a domain value. For
sequences, it's the same as *-ref. For bags, it takes an element value
and returns that element. Define *-ref only for sequences and ordered
collections. An ordered dictionary can then use *-get to look up keys
and *-ref to look up the nth pair in the collection.

Consider adding *-get-any, which returns all elements that match the
key. Unlike *-get, which should return *domain values* for dictionaries
(because that's how it works in existing dictionary-like collections),
*-get-any should return *elements* (range-domain pairs) for
dictionaries.

*-DELETE, *-REMOVE

These are tricky. Depending on the collection type, you may need a way
to eliminate elements by position, by key, or by whole elements.

For bags and sets, "key" and "value" are the same thing. For
dictionaries, they're not. Usually, you'll want to delete mappings by
key (the range value), but occasionally you'll want to delete by whole
mappings (e.g., for doing dictionary differences with remove-all).

I'm not sure how to resolve this one yet. I think DELETE should
eliminate positions, and REMOVE should eliminate keys. Need a good name
for "eliminate based on the whole element value" for dictionary
differences.

*-PUT

This is probably OK as is, but you may want to consider generalizing it
to work for all collections, since they all have keys and values.

*-UPDATE

I definitely recommend extending this to all collections, or at least to
sequences.

I think that covers all of the dictionary/index issues (plus a few other
minor issues that occurred to me while I was writing it.)
--
Bradd W. Szonye
http://www.szonye.com/bradd