Aubrey Jaffer wrote:
> | Date: Tue, 07 Nov 2006 13:55:54 +0100
> | From: Jens Axel Søgaard <xxxxxx@soegaard.net>
> |
> | I forgot to add the following.
> |
> | It would be a good idea to reformulate the rationale in its
> | own terms, so it can be understood without knowing anything
> | about srfi-32.
>
> [I am redirecting this to D. Horn because it is an editorial
> question.]
>
> Are you suggesting that SRFI-95 list the 28 procedures of SRFI-32?

No.

> What might make more sense is to remove mention of SRFI-32 altogether;
> SRFI-95 is derived from SLIB.

That was what I was attempting to suggest. To me a rationale explains
the reasoning behind the choices made. The rationales for srfi-32 and
srfi-95 aren't the same. I was trying to argue that both rationales
are valid. That's why I think the wording of the srfi-95 rationale
should be changed. For example, in:

> | These 28 arose for several reasons:
> |
> |   * List and Vector versions were separated, even though
> |     Scheme has manifest types which eliminate ambiguity as to
> |     the caller's intentions.

The "even though" implies that the choices Shivers made were wrong.
But given the srfi-32 rationale, they make perfect sense. Of course,
under the srfi-95 rationale, different choices make sense.

Your formulation of what general-purpose libraries are about:

> General purpose software libraries are about simplicity and ease of
> use, not theoretical perfection in algorithm design.
>
> A library should be specified so that its routines will perform
> moderately well for moderately sized inputs in the vast majority of
> applications. Having many variants has disadvantages:
>
>   * When there are only a few paths through the code, the code gets
>     thoroughly tested and its behavior well understood. When there
>     are many paths, most of the code is not well tested and not well
>     understood.
>
>   * To optimally choose algorithms requires nearly as much
>     understanding as to write them. Most users don't.
>   * A module with too many functions and voluminous documentation
>     scares off the typical user looking to just sort a 50-element
>     list; who then goes searching for any old sort algorithm to
>     reinvent the wheel.
>
>   * If some of the default sorts are unstable, then users will be
>     surprised that sorting their data twice results in different
>     orders; or that vector and list sorts return different orders.

explains much better why the choices were made than the current
wording, which compares the choices of srfi-95 to those of srfi-32.
The above is the main point of this posting.

Aubrey Jaffer wrote:
> | Date: Tue, 07 Nov 2006 13:49:06 +0100
> | From: Jens Axel Søgaard <xxxxxx@soegaard.net>
...
> | These 28 arose for several reasons:
> |
> |   * List and Vector versions were separated, even though
> |     Scheme has manifest types which eliminate ambiguity as to
> |     the caller's intentions.
> |
> | Separating list and vector functions is a tradition. In R5RS we
> | have, for example, list-ref, vector-ref, and string-ref instead of
> | a general ref. One benefit of this choice is that compilers get a
> | fair chance to generate efficient code.
>
> LIST-REF and VECTOR-REF are primitive operations which compilers can
> inline. It is conceivable that some really advanced compiler could
> inline merge, but merge is already restricted to lists!
>
> Sort is a more complicated operation than merge. The cost of the
> single type predicate conditional to detect array versus list is
> going to be significant only when sort is called with a trivially
> short argument. It is reasonable to expect that library routines
> will be slower than inline code on these degenerate cases.

I am not disagreeing. I am merely saying that there is more than one
valid view. The list-ref, vector-ref, ... way of constructing names
has, probably due to the lack of a module system, been in widespread
use. That's why I don't like the "even though".
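To make the cost of that type dispatch concrete, here is a minimal
sketch (my own hypothetical names, not the SRFI-95 reference
implementation; list-sort and vector-sort stand for any stable
O(n log n) sorts):

```scheme
;; Hypothetical sketch: a single entry point using the manifest type
;; of its argument.  The only extra cost over separate entry points
;; is one type predicate per call.
(define (generic-sort seq less?)
  (if (vector? seq)
      (vector-sort seq less?)
      (list-sort seq less?)))   ; anything else is assumed a list
```

As the quoted text says, that single predicate can only matter when
the argument is trivially short.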
As a minor point, in

  (let ([v (vector-sort <a-large-vector>)])
    <loop with (vector-ref v)>)

the compiler has a better chance to figure out at compile time that v
is bound to a vector than if a general sort is used.

> |   * Four varieties of algorithms were provided (quick, heap,
> |     insert, merge) even though quick, heap, and insert sorts
> |     have no significant advantage over merge-sort.
> |
> | First: The choice between these algorithms was in the "How
> | to"-part of SRFI-32. If a user doesn't care which algorithm is
> | used, he can use list-sort! or vector-sort!.
> |
> | Second: What does "no significant advantage" mean?
>
> It means that the asymptotics or functional properties are no better.

I like that formulation better. The "no significant advantage" will
be true most of the time, but for some people, in special programs,
constant factors do matter.

> | I was under the impression that the "hidden constants" of the
> | O(n log(n)) of vector-quick-sort were smaller than those of
> | vector-merge-sort.
>
> Is that from reading SRFI-32?

No. It was just an impression.

> SRFI-95 does not mandate merge-sort. It mandates that the sort
> algorithm be stable and that asymptotic time for SORT be bounded by
> O(n log(n)).

Again, I was trying to argue that to some people it will be an
advantage to be able to choose the specific algorithm. With your
rationale (general-purpose sorting) it is quite reasonable to choose
a single sorting algorithm. With the srfi-32 rationale, where
completeness is part of the rationale, it makes sense to include all
standard sorting algorithms. That's why I think this section should
be reworded:

  * Four varieties of algorithms were provided (quick, heap,
    insert, merge) even though quick, heap, and insert sorts
    have no significant advantage over merge-sort.

Based on the srfi-95 rationale, the wording implies that the srfi-32
choices were wrong.

> | Third: The different algorithms have different memory requirements.
> | If very large vectors are sorted, it makes sense to choose a
> | slower algorithm that allocates very little extra memory. Some
> | versions of vector-merge-sort! allocate a helper array the size
> | of the original.
>
> http://en.wikipedia.org/wiki/Sorting_algorithm has a table of
> properties for sort algorithms. "In-place merge sort" is shown as
> stable, O(n log(n)) time, and using no additional memory.
>
> SRFI-95's requirement that the KEY argument be called at most once
> per element necessitates allocation of O(n) storage for SORT when a
> KEY argument is supplied. The current reference implementation
> always does; I will clone its routine to handle the no-KEY case
> efficiently.

Okay.

> | Also: Is merge-sort the fastest algorithm if the elements are
> | "almost sorted"?
> |
> |   * Sort and stable-sort varieties were included, even though
> |     an unstable sort has no performance advantage.
> |
> | As I see it, the reason they are in SRFI-32 is not efficiency but
> | completeness. Nevertheless, if vector-quick-sort! (which is
> | unstable) is faster than vector-merge-sort! (which is stable) then
>
> "Faster" _is_ about efficiency, isn't it?

Yes - I was contradicting myself. Completeness allows efficiency in
situations where you know enough about the data to be able to choose
the best among several algorithms.

--
Jens Axel Søgaard
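P.S. To make the KEY point concrete, here is a sketch (mine, not the
reference implementation) of the decorate-sort-undecorate technique
that the call-KEY-at-most-once requirement forces; the intermediate
list of (key . element) pairs is where the O(n) storage goes:

```scheme
;; A sketch only: SORT here stands for any stable sort taking
;; (sequence less?), as SRFI-95 mandates.
(define (sort-by-key lst less? key)
  (map cdr                                      ; undecorate
       (sort (map (lambda (x) (cons (key x) x)) ; decorate: one KEY
                  lst)                          ; call per element
             (lambda (p q) (less? (car p) (car q))))))
```

Because the underlying sort is stable, elements with equal keys also
keep their original order after the keys are stripped.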