Interface view of dictionaries bear 25 Oct 2003 19:27 UTC


On Sat, 25 Oct 2003, Bradd W. Szonye wrote:

>Bradd wrote:
>>> Agreed. I keep waffling between two different concepts for the
>>> dictionary: is it a collection of mappings, or is it a bag with an
>>> index?
>
>xxxxxx@freenetproject.org wrote:
>> A dictionary is a datastructure which stores values that can be
>> accessed using a key.  Anything more specific than that assumes a
>> specific implementation type.

>There are two common kinds of dictionary ADTs:

>1. A collection of mappings. This is how you define a dictionary in set
>   theory. In this kind of dictionary, the keys are conceptually a part
>   of a collection element (but they need not be represented that way
>   internally). Scheme alists and C++ maps use this concept.

>2. A collection with an index. This is more like a sequence, or a bag
>   with an index. The keys are conceptually external to the elements
>   (but they need not be represented that way internally). PLT Scheme
>   hash tables use this concept.

I'm going to step back from the theory questions and fold questions,
and tell you what I find useful as a programmer.  I've implemented
most of this for hashtables, binary trees, and alists already, and
filling it out with the performance predicates would be the logical
next step.  I don't think of a "collection of mappings" versus a
"collection with an index" - I think of Ordered and Unordered
dictionaries.

I don't know whether this is really good design; but I think it's
fairly clear, and in practice my hashtable and btree libraries are
working quite nicely and easy to work with. To be honest, "fold" isn't
a very meaningful word to me, and I wouldn't use it in naming
functions.  I think in terms of iterators, ordering functions, and
reversed iterators, not in terms of folding operations.  likewise
"reversible" is a word that can mean any of dozens of things; what I'm
asking is whether an operation is sublinear time or constant time and
I'd use those keywords in naming performance predicates.

An iterator is part of the basic dictionary functionality; it's a
function that, when given a key, returns some "next" key that's valid
for that dictionary.  For unordered dictionaries, the sequence is
arbitrary.  For ordered dictionaries, it will depend on the key
ordering predicate that was given the dictionary creation function.
Reversed iterators traverse the keys in the opposite order, and as
such are only found in ordered dictionaries.  The "fold" operation is
most easily understood in terms of iterators, but iterators are not as
easily understood in terms of fold.  (or maybe that's just me.... I
already said it's not a very meaningful word to me.  When I see "fold"
in a program I'm probably going to be thinking about compressing hash
tables or something.)

My preferred style is that there are "interfaces" -- sets of functions
which are valid on a given structure.  So my take on it would be to
have all dictionaries provide the "dictionary" interface -- which is
this:

    (Forgive me for not relating these function names to the proposal;
    there aren't even near analogues to my libs in some cases.)

(create-*)
(*-insert! dict datum key),

     ;; fetch is a key lookup.
(*-fetch dict key),
(*-fetch-deleting! dict key)

     ;; get just gets one or more elements without caring which.
(*-get-one dict)
(*-get-one-deleting! dict)
(*-get-n dict num)
(*-get-n-deleting! dict num)

     ;; the following two are for functions of the signature
     ;; (callback dict keyval dataval)
(*-foreach dict callback)
     ;; a copy is made of the dict to protect it from side effects
(*-foreach! dict callback)
     ;; side effects happen to the dictionary.

(*-size dict) ;; returns number of datum/key pairs stored

(get-iterator dict)   ;; returns an iterator function that
                      ;; takes a key arg and returns some "next" key,
                      ;; or takes '() and returns some (arbitrarily
                      ;; chosen for unordered dictionaries) "first" key.

(*-nextkey dict) ;; same as function returned from get-iterator

;; the following is a test to see whether the "Ordered dictionary"
;; interface is also provided. The test for ordered dictionaries could
;; be part of the interface for all dictionaries.

(ordered-dict? dict)

     ;; the following eight items are performance predicates
(sublinear-insert? dict)
(constant-insert? dict)
     ;; get means the get-one and get-n operations, which don't care
     ;; about keys.
(sublinear-get? dict)
(constant-get? dict)
     ;; lookup means the fetch operations, which take a key.
(sublinear-lookup? dict)
(constant-lookup? dict)
     ;; these refer to the performance of the iterator function
(sublinear-next? dict)
(constant-next? dict)

Some dictionaries would also provide the "ordered dictionary"
interface -- which includes

(create-* <key ordering function>)
(*-fetchfirst dict)  ;; returns 1 entry
(*-fetchfirst dict number) ;; returns number entries
(*-fetchfirst-deleting! dict)
(*-fetchfirst-deleting! dict number)
(*-fetchlast dict)
(*-fetchlast dict number)
(*-fetchlast-deleting! dict)
(*-fetchlast-deleting! dict number)
(*-fetchrange dict key1 key2)
(*-fetchrange-deleting! dict key1 key2)
(*-fetch-next-n dict key1 number)
(*-fetch-next-n-deleting! dict key1 number)
(*-fetch-prev-n dict key1 number)
(*-fetch-prev-n-deleting! dict key1 number)
(*-firstkey dict)
(*-nextkey dict key) ;; same as returned from the get-iterator func above
(*-lastkey dict)     ;; same as returned from the get-reversed-iterator func
(get-reversed-iterator dict)
(keys-ordered? dict key1 key2)

  ;; the following items are performance predicates.
(sublinear-first? dict)
(constant-first? dict)
(sublinear-last? dict)
(constant-last? dict)
   ;; these refer to the performance of the reversed
   ;; iterator.
(sublinear-prev? dict)
(constant-prev? dict)

Associated with each interface is a predicate which accepts an
argument of any type and returns #t if it provides that interface.
So, for the sake of examples, we would have

(define htable (create-hashtable hfunc))
(define alyst (create-alist))
(define pqueue (create-priority-queue < ))
(define btree (create-binary-tree < ))
(define olist (create-ordered-alist < ))

(dictionary? htable) => #t
(dictionary? pqueue) => #t
(dictionary? btree) => #t
(dictionary? olist) => #t
(dictionary? alyst) => #t

(ordered-dict? htable) => #f
(ordered-dict? alyst) => #f
(ordered-dict? olist) => #t
(ordered-dict? btree) => #t
(ordered-dict? pqueue) => #t

Finally there's considerations of efficiency: Additional predicates
tell which general classes of operations are constant, sublinear, or
linear.

(sublinear-first? pqueue) => #t
(sublinear-first? olist) => #t
(sublinear-first? btree) => #t

(sublinear-next? btree) =>#t
(sublinear-next? pqueue) =>#f
(constant-next? btree) =>#f

(sublinear-prev? pqueue) =>#f
(sublinear-prev? btree) =>#t
(constant-prev? btree) =>#f

(constant-first? btree) => #f
(constant-first? olist) => #t
(constant-first? pqueue) => #f
  ;; pqueue has constant fetchfirst and get (which are actually the
  ;; same operation) but it cannot claim constant-first, because
  ;; its get-deleting! and first-deleting! operations take logarithmic
  ;; time.

(sublinear-last? pqueue) => #f
(sublinear-last? olist) => #f
(sublinear-last? btree) => #t

(sublinear-lookup? htable) => #t
(sublinear-lookup? btree) => #t
(sublinear-lookup? alyst) => #f
(sublinear-lookup? olist) => #f

(constant-lookup? htable) => #t
(constant-lookup? btree) => #f

(sublinear-insert? olist) => #f
(sublinear-insert? alyst) => #t
(sublinear-insert? btree) => #t
(sublinear-insert? htable) => #t

(constant-insert? alyst) => #t
(constant-insert? btree) => #f
(constant-insert? htable) => #t

;; the next two are implied because sublinear-first
;; or sublinear-last is true of them...

(sublinear-get? btree) => #t
(sublinear-get? htable) => #t

;; but that condition, while sufficient, is not necessary,
;; as alyst demonstrates.

(sublinear-get? alyst) => #t
(constant-get? alyst) => #t