|
neighbor deletion; param ordering; < & <=; fresh lists; cell recycling
shivers@xxxxxx
(21 Jul 2002 03:44 UTC)
|
|
Re: neighbor deletion; param ordering; < & <=; fresh lists; cell recycling
David Feuer
(21 Jul 2002 05:48 UTC)
|
|
cell recycling; reference implementation; param ordering; fresh lists shivers@xxxxxx (21 Jul 2002 17:17 UTC)
|
|
Re: cell recycling; reference implementation; param ordering; fresh lists
Francisco Solsona
(22 Jul 2002 15:14 UTC)
|
I have altered the description of LIST-DELETE-NEIGHBOR-DUPS! and LIST-MERGE!
to include this text, which I think will clear up the cons-cell allocation
issue:
On the other hand, the procedures
LIST-DELETE-NEIGHBOR-DUPS!
LIST-MERGE!
make only a single, iterative, linear-time pass over their argument lists,
using SET-CDR!s to rearrange the cells of the lists into the final result
-- they work "in place." Hence, any cons cell appearing in the result must
have originally appeared in an input. The intent of this
iterative-algorithm commitment is to allow the programmer to be sure that
if, for example, LIST-MERGE! is asked to merge two ten-million-element
lists, the operation will complete without performing some extremely
(possibly twenty-million) deep recursion.
I have changed the SRFI accordingly. The new draft is at
http://www.cc.gatech.edu/~shivers/srfi/srfi-32.txt
and will propagate out to the standard url
http://srfi.schemers.org/srfi-32/srfi-32.html
in the hands of esteemed editor Solsona in the near future.
From: David Feuer <xxxxxx@techhouse.org>
I'm still wondering where the reference implementation is.
It is on my disk, not released. I need to clean up one of the routines before
I release it, and it's just not as important to me right now as the design
discussion, so that is what is getting my attention.
[On the subject of parameter order between the < arg & the data arg:]
Do you provide functions with the same _names_ as existing
implementations, or just the same interfaces?
Hmm, there is a small amount of clash, but less than I would have supposed.
Check the other-implementations summary. But even if the *names* don't clash,
the *convention* is completely inverted from current practice, which is asking
for confusion.
Let's hear from some other voices.
Another little thought: You could provide _both_, with the different
versions marked by asterisks or something. That could be a bit confusing
though, and even lead to a bit of bloat.
Ech. No, let's *settle* on a *standard* *convention*, and let implementations
deal with backwards compatibility for existing code if we depart from their
specific API.
What I was thinking was that in functional-style code, if you sort an
array, and if the array was in fact already sorted, you get a little win
by using the old array rather than the new one (and letting the new one
get trashed almost immediately in the first generation) rather than having
2 identical arrays sitting around wasting memory, and missing the
opportunity for a first-generation collection. But I'm no expert at GC,
so this may not actually be significant. Of course you can always check
if the array is sorted first, but that would take more time.
We're trying too hard here, being too clever.
I was thinking of this in the "side-effecting world". I was guessing
(without any serious thought on the topic) that it could be faster (at
least on some implementations) to do a total-copy sort than a
copy-then-sort. Don't have a clue if this is really the case.
OK, here's the skinny. If you want to sort a list and guaranteed produce a
fresh copy, here is how I would choose an underlying algorithm:
- Best option, I think: Convert the list into a vector, sort that with an
in-place vector-sorting algorithm, then convert it back into a fresh
list. This wins for reasons of memory locality -- chasing pointers
through memory is slower than doing address arithmetic. The conversion
costs you linear time, so it doesn't affect your O(n lg n) run time, and
you need a linear amount of temp storage, but you're committed to
allocating a linear amount of memory, anyway. Need to be stable? OK,
allocate *two* vectors and use my opportunistic vector merge sort:
linear-time best case, O(n lg n) worst case, stable, good memory
locality, and you only changed the constant factor on your temp memory
requirements from a 1 to a 2.
- Don't want to use vectors / don't have vectors? You can roll a
version of my opportunistic destructive list merge sort algorithm
so that it copies the input list on the fly, as it goes, and then
uses destructive merge ops on the intermediate sorted lists. This
would give you linear-time best case, O(n lg n) worst case, stable,
and (here's the part you want) every cons cell allocated appears
in the result -- no "wasted" consing of temporary lists.
- Really want to be functional, no side effects? OK, use my opportunistic
list merge sort. As above, but no SET-CDR!s at all... hence you end
up allocating some intermediate lists that get thrown away.
Now, wanting to sort a list and produce a guaranteed fresh copy is probably
not the common case. You can do a fine job of this in three lines of code
using the tools provided by this lib, if you choose plans A or C above. I don't
think it is really probably worth tying up "API space" supporting it any more
than that.
OK, thank you for making me walk through that design analysis.
By the way, I keep referring to "my opportunistic merge-sort algorithm." Vas
ist das? It's an algorithm I invented about three years ago (which is what
led to the production of this SRFI). It is a surpisingly simple algorithm with
the properties I described above. It's been circulated privately, and I've
submitted a manuscript describing the algorithm with proofs establishing the
run-time bounds for publication, but it has not yet been published... so
that's why you have almost certainly not heard of it.
It is part of the reference implementation library, though.
-Olin