Dear list,
The issue of how to define the default
ordering for vectors has not
yet been settled. I would like to pick
it up again.
--- Issue ---
The question is what the default ordering
of vectors is supposed to be
for this SRFI. There are two candidates:
'lexicographic ordering' (LEX)
and 'first by length then lexicographically'
(LENGTH-LEX).
The choice affects VECTOR-COMPARE, DEFAULT-COMPARE,
and the
way orders are interpreted on an abstract
level. Moreover, there are
implications for future SRFIs (e.g.
the drafts 66 and 74) that define vector-
like data structures and could provide
a default ordering consistent
with this SRFI. A similar choice (LEX
or SIZE-LEX) is to be made for
the default ordering of comparison-based
set- and dictionary-like data
structures.
--- Summary of the discussion so far
---
The issue was brought up by Per Bothner
who wrote in
http://srfi.schemers.org/srfi-67/mail-archive/msg00031.html
> The default compare for vectors
is unusual, and more critically it is
> incompatible with the default compare
for lists and strings.
> The latter are both lexicographic
compare. I couldn't find a
> rationale for this difference.
>
> Note that this difference makes
it difficult to have a unified
> "sequence hierarchy"
as in Common Lisp. The problem is that
> ordering is defined in terms of
*implementation* types rather
> than semantics.
>
> I suggest using lexicographic compare
for all "sequence" types.
Mike Sperber added in
http://srfi.schemers.org/srfi-67/mail-archive/msg00031.html
that he found the distinction between
list-compare and
vector-compare confusing.
The discussion continued around whether
lists, strings, and vectors
should be interpreted as subtypes of
an abstract sequence type or
as essentially independent types. The
authors of this SRFI adopted
the latter point of view with the idea
that the basic operations of vector
are constant-time SIZE and REF suggesting
the LENGTH-LEX order
and LEX.
Moreover, uniform vectors (in the sense
of several SRFI) and strings
are cited to derive that vectors should
be ordered LEX. Per Bothner
wrote in http://srfi.schemers.org/srfi-67/mail-archive/msg00038.html
> Now consider strings. Is
a string a vector? If you have uniform
> vectors, then it seems very strange
to not view a string as a uniform
> vector. It follows that the
ordering for strings should be the same
> as for uniform vectors.
>
> Strings use lexicographic order.
> Hence uniform vectors should use
lexicographic order.
> Hence vectors should use lexicographic
order.
Jens Axel Søgaard replies with the orderings
of polynomials,
ordered first by degree and then lexicographically,
and points
to the potentially higher efficiency
of LENGTH-LEX over LEX
for types with constant-time SIZE operation.
In
http://srfi.schemers.org/srfi-67/mail-archive/msg00041.html
he writes
> Suppose now that we have the following
list:
>
> (list #10000(0 ... 0 1)
> #10001(0
... 0 2)
> #10002(0
... 0 3)
> #10003(0
... 0 4))
>
> In this scenerio the order used
in the srfi is better
> efficiencywise than the lexicographical
order -- and not
> only by a constant factor.
>
> It is also possible to find a list
where the srfi-ordering
> is slower than the lexicograpic,
but the slow down
> will be bounded by a small constant
factor. E.g.
>
> (list #10000(0 ... 0 1)
> #10000(0
... 0 2)
> #10000(0
... 0 3)
> #10000(0
... 0 4))
Donovan Kolbly expressed his concerns
about the cognitive load
of remembering that vectors and strings
are ordered differently in
case vectors are not ordered LEX because
he sometimes
converts strings into vectors of chars
and back.
Mike Sperber disputes the arguments
that the LENGTH-LEX ordering
of strings is more efficient and "natural"
by its relation to the basic
operations on vectors (i.e. constant
time SIZE and REF).
--- analysis ---
Looking at the discussion again, I see
only three major arguments:
(1) Conceptually vectors and lists are
just sequences, and these are
conventionally ordered LEX by default.
(2) LENGTH-LEX is more natural (and
efficient) for sequences that
support a constant-time SIZE operation.
(3) Conceptually strings are "vectors
of chars" and strings are conventionally
ordered LEX by default, so vectors should
be ordered LEX as well in order
to reduce confusion.
From my point of view, (2) is the most
fundamental, in a mathematical
sense. By providing different orders
for vectors and lists it is also
possible to have the two most important
liftings of total orders to
sequences readily available.
Argument (1) is important as well, primarily
because apparently many
people are not comfortable with the
existence of two default orderings
for sequences. This may indeed lead
to surprises in case the ordering
matters and the user does not care what
is specified in the SRFI. I do
not share the expectations, however,
that this will be a serious problem.
For me, argument (3) is weak. The conventional
ordering of strings is
mostly historic because it lends itself
well to cellulose-based
implementations of dictionary data structures
for sequences over
small-alphabets (e.g. western languages).
This is by no means
fundamental in the mathematical sense.
Japanese dictionaries,
for example, often order characters
first by the number of strokes
and then by other means; for strings
variant of LENGTH-LEX can
be found as well. So to summarize, I
am fully convinced that strings
should be ordered LEX by default, but
that does not imply vectors and
any other sequence container type should
also be ordered LEX.
Given my interpretation of the situation,
I would settle for the following:
* Lists are ordered LEX by default.
* Vectors are ordered LENGTH-LEX by
default.
* Strings are ordered LEX with case-sensitive
character comparison by default.
* The default ordering for lists and
vectors donate their name also as
prototype orderings (order "-as-list"
or "-as-vector") for other container types
that can be interpreted as sequences.
* Other vector-like data types should
define all orderings that are most
natural for them. These need not necessarily
be consistent with vectors,
lists, or strings. In case several default
orderings are natural, these can
be named differently by appending "-as-<<ordering>>".
In the cases of
SRFI 66 and 74 where the ordering is
of minor importance, I would
recommend to use LENGTH-LEX to be consistent
with this SRFI.
This happens to be the current state
of the SRFI specification.
Sebastian.