Re: Wording of the rationale Aubrey Jaffer 12 Nov 2006 23:40 UTC
| Date: Tue, 07 Nov 2006 13:49:06 +0100 | From: =?ISO-8859-1?Q?Jens_Axel_S=F8gaard?= <email@example.com> | | I am little unhappy about the wording of the rationale: | | RATIONALE | | When SRFI 32 Sort Libraries was withdrawn, it had 28 | procedures for what should have been a clean, simple | interface. | | In my view SRFI-32 *is* clean and simple. Comparing the number 28 | to the number of procedures defined in SRFI-95 is unfair, since | SRFI-95 covers the "What"-part but not the "How to"-part of | SRFI-32. | | These 28 arose for several reasons: | | * List and Vector versions were separated, even though | Scheme has manifest types which eliminate ambiguity as to | the caller's intentions. | | Separating list and vector functions are tradition. In R5RS we | have, for example list-ref, vector-ref and string-ref instead of | general ref. One benefit of this choice is that compilers get a | fair chance to generate efficient code. LIST-REF and VECTOR-REF are primitive operations which compilers can inline. It is conceivable that some really advanced compiler could inline merge, but merge is alread restricted to lists! Sort is a more complicated operation than merge. The cost of the single type predicate conditional to detect array versus list is going to be significant only when sort is called with a trivially short argument. It is reasonable to expect that library routines will be slower than inline code on these degenerate cases. | * Four varieties of algorithms were provided (quick, heap, | insert, merge) even though quick, heap, and insert sorts | have no significant advantage over merge-sort. | | First: The choice between these algorithms were in the "How | to"-part of SRFI-32. If a user doesn't care, which algorithm is | used, he can use list-sort!, vector-sort!. | | Second: What does "no significant advantage" mean? It means that the asymptotics or functional properties are no better. | I were of the impression, that the "hidden constants" of O(n log | (n)) of vector-quick-sort were smaller than that of | vector-merge-sort. Is that from reading SRFI-32? SRFI-95 does not mandate merge-sort. It mandates that the sort algorithm be stable and that asymptotic time for SORT be bounded by O(n log(n)). | Third: The different algorithms have different memory requirements. | If very large vectors are sorted, it makes sense to choose a slower | algorithm that allocates very little extra memory. Some versions | of vector-merge-sort! allocate a helper array the size of the | original. http://en.wikipedia.org/wiki/Sorting_algorithm has a table of properties for sort algorithms. "In-place merge sort" is shown as stable, O(n log(n)) time, and using no additional memory. SRFI-95's requirement that the KEY argument be called at most once per element necessitates allocation of O(n) storage for SORT when a KEY argument is supplied. The current reference implementation always does; I will clone its routine to handle the no-KEY case efficiently. | Also: Is merge-sort the fastest algorithm, if the elements are | "almost sorted"? | | * Sort and stable-sort varieties were included, even though | an unstable sort has no performance advantage. | | As I see it, the reason they are in SRFI-32 is not efficiency but | completeness. Nevertheless, if vector-quick-sort! (which is | unstable) is faster than vector-merge-sort! (which is stable) then "Faster" _is_ about efficiency, isn't it? | it makes sense to let vector-sort! default to vector-quick-sort! | and let vector-stable-sort! default to vector-merge-sort! in the | name of efficiency. General purpose software libraries are about simplicity and ease of use, not theoretical perfection in algorithm design. A library should be specified so that its routines will perform moderately well for moderately sized inputs in the vast majority of applications. Having many variants has disadvantages: * When there are only a few paths through the code, the code gets thoroughly tested and its behavior well understood. When there are many paths, most of the code is not well tested and not well understood. * To optimally choose algorithms requires nearly as much understanding as to write them. Most users don't. * A module with too many functions and voluminous documentation scares off the typical user looking to just sort a 50-element list; who then goes searching for any old sort algorithm to reinvent the wheel. * If some of the default sorts are unstable, then users will be surprised that sorting their data twice results in different orders; or that vector and list sorts return different orders.