Subject: cell recycling; reference implementation; param ordering; fresh lists
From: shivers@xxxxxx
Date: 21 Jul 2002 17:17 UTC
I have altered the description of LIST-DELETE-NEIGHBOR-DUPS! and
LIST-MERGE! to include this text, which I think will clear up the
cons-cell allocation issue:

    On the other hand, the procedures LIST-DELETE-NEIGHBOR-DUPS! and
    LIST-MERGE! make only a single, iterative, linear-time pass over
    their argument lists, using SET-CDR!s to rearrange the cells of
    the lists into the final result -- they work "in place." Hence,
    any cons cell appearing in the result must have originally
    appeared in an input. The intent of this iterative-algorithm
    commitment is to allow the programmer to be sure that if, for
    example, LIST-MERGE! is asked to merge two ten-million-element
    lists, the operation will complete without performing some
    extremely deep (possibly twenty-million-deep) recursion.

I have changed the SRFI accordingly. The new draft is at

    http://www.cc.gatech.edu/~shivers/srfi/srfi-32.txt

and will propagate out to the standard URL

    http://srfi.schemers.org/srfi-32/srfi-32.html

in the hands of esteemed editor Solsona in the near future.
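To make the "in place" commitment concrete, here is a minimal sketch of
the kind of single-pass, SET-CDR!-based merge that text describes. The
name SKETCH-MERGE! and all the details are illustrative; this is not
the reference implementation.

    ;; A sketch, not the reference implementation: an iterative merge
    ;; that allocates nothing.  Every cell in the result is a cell of
    ;; A or B, rewired with SET-CDR!; ties go to A, so it is stable.
    (define (sketch-merge! elt< a b)
      (cond ((null? a) b)
            ((null? b) a)
            (else
             (let ((head (if (elt< (car b) (car a)) b a)))
               (if (eq? head b)
                   (set! b (cdr b))
                   (set! a (cdr a)))
               ;; Walk both lists with a trailing pointer PREV,
               ;; splicing the smaller head cell onto the result.
               (let lp ((prev head) (a a) (b b))
                 (cond ((null? a) (set-cdr! prev b))
                       ((null? b) (set-cdr! prev a))
                       ((elt< (car b) (car a))
                        (set-cdr! prev b)
                        (lp b a (cdr b)))
                       (else
                        (set-cdr! prev a)
                        (lp a (cdr a) b))))
               head))))

Merging (1 3 5) and (2 4 6) this way yields (1 2 3 4 5 6) built
entirely from the six original cells, in one iterative, linear-time
pass -- no deep recursion, no fresh conses.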
David Feuer <xxxxxx@techhouse.org> wrote:
> I'm still wondering where the reference implementation is.

It is on my disk, not released. I need to clean up one of the routines
before I release it, and it's just not as important to me right now as
the design discussion, so that is what is getting my attention.

[On the subject of parameter order between the < arg & the data arg:]

> Do you provide functions with the same _names_ as existing
> implementations, or just the same interfaces?

Hmm, there is a small amount of clash, but less than I would have
supposed. Check the other-implementations summary. But even if the
*names* don't clash, the *convention* is completely inverted from
current practice, which is asking for confusion. Let's hear from some
other voices.

> Another little thought: You could provide _both_, with the different
> versions marked by asterisks or something. That could be a bit
> confusing though, and even lead to a bit of bloat.

Ech. No, let's *settle* on a *standard* *convention*, and let
implementations deal with backwards compatibility for existing code if
we depart from their specific API.

> What I was thinking was that in functional-style code, if you sort an
> array, and the array was in fact already sorted, you get a little win
> by using the old array rather than the new one (and letting the new
> one get trashed almost immediately in the first generation), rather
> than having two identical arrays sitting around wasting memory and
> missing the opportunity for a first-generation collection. But I'm no
> expert at GC, so this may not actually be significant. Of course you
> can always check first whether the array is sorted, but that would
> take more time.

We're trying too hard here; we're being too clever.

> I was thinking of this in the "side-effecting world". I was guessing
> (without any serious thought on the topic) that it could be faster
> (at least on some implementations) to do a total-copy sort than a
> copy-then-sort. I don't have a clue whether this is really the case.

OK, here's the skinny. If you want to sort a list and produce a
guaranteed-fresh copy, here is how I would choose an underlying
algorithm:

- Best option, I think: Convert the list into a vector, sort that with
  an in-place vector-sorting algorithm, then convert it back into a
  fresh list. This wins for reasons of memory locality -- chasing
  pointers through memory is slower than doing address arithmetic. The
  conversion costs you linear time, so it doesn't affect your O(n lg n)
  run time, and you need a linear amount of temp storage -- but you
  were committed to allocating a linear amount of memory anyway. Need
  to be stable? OK, allocate *two* vectors and use my opportunistic
  vector merge sort: linear-time best case, O(n lg n) worst case,
  stable, good memory locality, and you have only changed the constant
  factor on your temp-memory requirements from 1 to 2.

- Don't want to use vectors / don't have vectors? You can roll a
  version of my opportunistic destructive list merge sort so that it
  copies the input list on the fly, as it goes, and then uses
  destructive merge ops on the intermediate sorted lists. This would
  give you linear-time best case, O(n lg n) worst case, stability, and
  (here's the part you want) a guarantee that every cons cell allocated
  appears in the result -- no "wasted" consing of temporary lists.

- Really want to be purely functional, no side effects? OK, use my
  opportunistic list merge sort. As above, but with no SET-CDR!s at
  all... hence you end up allocating some intermediate lists that get
  thrown away.

Now, wanting to sort a list and produce a guaranteed-fresh copy is
probably not the common case. You can do a fine job of it in three
lines of code using the tools provided by this lib, if you choose plan
A or C above (a sketch of such a three-liner appears at the end of this
message). I don't think it is really worth tying up "API space"
supporting it any more than that.

OK, thank you for making me walk through that design analysis.

By the way, I keep referring to "my opportunistic merge-sort
algorithm." Vas ist das? It's an algorithm I invented about three years
ago (which is what led to the production of this SRFI). It is a
surprisingly simple algorithm with the properties I described above. It
has been circulated privately, and I've submitted for publication a
manuscript describing the algorithm, with proofs establishing the
run-time bounds, but it has not yet been published... so that's why you
have almost certainly not heard of it. It is part of the reference
implementation library, though.
    -Olin
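For concreteness, here is roughly what that three-line fresh-copy sort
might look like. LIST->VECTOR and VECTOR->LIST are standard Scheme;
VECTOR-SORT! stands in for this lib's in-place vector sort (comparison
argument first, per the draft's convention), and the name
LIST-SORT-FRESH is purely illustrative.

    ;; Plan A as a three-liner.  The result is a guaranteed-fresh list:
    ;; LIST->VECTOR and VECTOR->LIST both copy, and the sorting itself
    ;; happens in place on the intermediate vector.
    (define (list-sort-fresh elt< lst)
      (let ((v (list->vector lst)))   ; linear-time copy into a vector
        (vector-sort! elt< v)         ; in-place sort; good locality
        (vector->list v)))            ; linear-time copy to a fresh list

E.g., (list-sort-fresh < (list 9 1 8 2)) => (1 2 8 9), sharing no cells
with its input.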