Per Bothner wrote:
> Do you have any examples or use cases or algorithms
that are simpler
> when using this "natural order"? What prior art is
there for
> using this order?
Built-in lexicographical orders on vectors
have hurt me more than once
for ordering actual 'vectors', i.e.
elements of vector spaces (abelian group
with a field acting on it) of different
dimension. By using the lex order you
effectively fix an embedding of the
lower-dimensional vector spaces into
the higher---and that is often the wrong
thing to do.
Concerning prior art: Are you really
asking me why anybody anywhere would
like to order sequences by length and
then lexicographically, instead just
lexicographically, and where you can
find published references to programs
that do that?
Actually, I do not intend to settle
this question by digging up 'prior art' and
looking where the bigger pile is. Let
us focus on which problem we would
like to solve, and how the solutions
fit with the structures that are present in
Scheme. The advantage of ordering LIST
and VECTOR the same is that
it is easier to remember. The advantage
of ordering them differently is that
you get the choice---based on the basic
operations.
> > If I understand correctly, your argument
is that LIST and VECTOR should
> > both be interpreted as mere implementations of some abstract
SEQUENCE
> > data type,
>
> I'm saying that some Scheme implementations (at least Kawa) and at
> least one SRFI (44) does this. I'm also pointing out that Common
Lisp
> does this. I.e. this is not a far-fetched concept.
SRFI 44 defines a lot and implements
nothing. I think SRFI 44 represents a good deal of
good work. Yet it is still to be seen
that this is really the way data structures should be
organized in Scheme. Comparing containers,
for example, is a problem: SRFI 44 defines
nothing for that but its design excludes
several possible approaches already.
Concerning Kawa and Common Lisp, the
problem is the same: The object system brings
structure to the world of data types---but
it also imposes a certain point of view.
This point of is certainly plausible,
but it is not the only one, and its benefits are overrated.
My point of view is that the user should
be enabled to construct exactly the data structures
required for an application from prefab
building blocks---instead of finding a fully organized
world already. Initially the organized
world is nice, but sooner or later you hit brick walls,
and the more complex it gets the more
that hurts.
Now, if you have sequence data that
is naturally ordered lexicographically than you can
use LISTs or VECTORs ordered as lists
to represent the data. If you have sequences
that are naturally ordered by length
and then lexicographically you can use VECTORs
or LISTs ordered as vectors. And in
both cases it is likely that the concrete default
representation has something to do with
the application already. (Which is the result of
the 'natural ordering based on the available
primitive operations' thing.)
> It is true that if an ordering is defined for sequence,
> then it would be bad to define inconsistent orderings for different
> sub-types of sequence. I'm making a weaker claim: if you define
an
> operation F on type A and on type B that are both sub-types of a common
> super-type S then good object-oriented design suggests you should
> define the operation F to have common properties (axioms) for both
> A and B.
And this what this SRFI is doing: Vector,
List, Number, String, Symbol, Char, etc. are all
subtypes of the overall Scheme object
type---and they all provide a compare procedure
that satisfies certain axioms. Furthermore
the orderings of number types are required
to be compatible with the tower of numeric
data types in R5RS, even though the tower
does not really define a subtype hierarchy
in the strong sense, because that makes
sense mathematically. Finally, default-compare
refines pair-compare. And that is as far
as the subtype thing goes in this SRFI.
In particular, LIST and VECTOR are *not* seen
as two concrete realizations of a common
container supertype.
Now your point of view is that Vector
and List should be interpreted as subtypes
of a Sequence type---because SRFI
44, Kawa, and Common Lisp do that way
and it is "not a far-fetched concept."
(Quite natural even, I would say.)
But I do not find this concept back
in the definition of Scheme as it is now,
and there are alternative approaches
that give you more choice.
Don't get me wrong, I am in fact a fan
of the Sequence approach. I even experimented
with my own Sequence data type as an
add-on for Scheme, but not as an abstract
type (or interface) to unify LIST and
VECTOR. The sequences I experimented with
were self-optimizing (amortized) data
structures that eventually adapt to their use
case (in particular LIFO, FIFO, or random-access)
and are reasonably efficient
(i.e. O(n log n) with small constants)
over an extremely large scale of size, including
external memory algorithms. In the end,
the implementation went out of hand for
a hobby project and I had to stop. Now
these sequences do not fit SRFI 44, nor
Kawa, nor Common LISP and they coexist
with LIST and VECTOR.
> Even if you view vectors and lists as both sequences,
it doesn't
> follow that on operation F must be defined in terms of sequences.
> However, it's usually a good idea.
>
> Moreover, some schemes provide "uniform vectors", and there
are SRFIs
> that define them. What is the natural order for a uniform vector?
> By your argument, and I think "common sense" (admittedly
unreliable)
> it should be the same order as for normal vectors. Certainly
if you
> believe uniform vectors "inherit" from an "abstract
vector" type.
>
> Now consider strings. Is a string a vector? If you have
uniform
> vectors, then it seems very strange to not view a string as a uniform
> vector. It follows that the ordering for strings should be the
same
> as for uniform vectors.
>
> Strings use lexicographic order.
> Hence uniform vectors should use lexicographic order.
> Hence vectors should use lexicographic order.
My whole point is that the order to
be defined for a data type does not follow
from the data type or its supertypes
but is part of what the data type means.
Now if the uniform vectors represent
elements of vector spaces than you might
want a different ordering than if they
represent strings.
Btw., strings are one the most exceptional
things around. The lex. ordering is
historic and not at all 'God-given'.
Japanese dictionaries contain several
alternative orderings: By stroke number,
by radical (prototypes of parts of the
actual characters, most are also characters
in themselves), or by pronounciation.
"The natural order" of strings
is a matter of taste. In Germany the treatment of
Umlauts has changed during the past
30 years: Today ä is inserted where
'ae' would be. But I have lots of dictionaries
at home that insert ä where
'a' would be. Using Unicode scalar values
ä has a high scalar value
so it would be inserted after 'z'. Ordering
strings by length>lex can also make
sense in some applications.
So I have difficulties following you
argumentation above.
My point of view is:
Yes, strings use lex. order because
that makes sense as a general default.
No, that does not imply that uniform
vectors should also use lex. order.
They can better provide one or more
orders that make sense for
their intended applications. And no,
that still does not imply that the built-in
VECTOR type should be ordered in any
particular way for the sake of
uniformity.
Historically, "uniform vectors"
do not allow you to specify the ordering you
want to use---but they should. Lukily,
in the majority of the cases it is sufficient
to define any ordering at all. If u8vector
or anything provides a default order,
I would make it length>lex because
that is the most natural and efficient for
this type, unless you tell me that you
always use them in hashing in which
case I would make the default order
by hashcode, then length>lex. ;-)
Sebastian.