Re: Wording of the rationale Jens Axel Søgaard 14 Nov 2006 13:55 UTC

Aubrey Jaffer wrote:

 > | Date: Tue, 07 Nov 2006 13:55:54 +0100
 > | From: Jens Axel Søgaard <xxxxxx@soegaard.net>
 > |
 > | I forgot to add the following.
 > |
 > | It would be a good idea to reformulate the rationale in its
 > | own terms, so it can be understood without knowing anything
 > | about srfi-32.

 > [I am redirecting this to D. Horn because it is an editorial
 > question.]

 > Are you suggesting that SRFI-95 list the 28 procedures of SRFI-32?

No.

 > What might make more sense is to remove mention of SRFI-32 altogether;
 > SRFI-95 is derived from SLIB.

That was what I was attempting to suggest.

To me a rationale explains the reasoning behind the choices made.
The rationales for srfi-32 and srfi-95 aren't the same. I was
trying to argue that both rationales are valid.

That's why I think the wording of the srfi-95 rationale should
be changed. For example, in:

 >  |      These 28 arose for several reasons:
 >  |
 >  |         * List and Vector versions were separated, even though
 >  |         Scheme has manifest types which eliminate ambiguity as to
 >  |         the caller's intentions.

The "even though" implies that the choices Shivers made were wrong.
But given the srfi-32 rationale, they make perfect sense. Of course
under the srfi-95 rationale, different choices make sense.

Your formulation of what general purpose libraries are about:

 > General purpose software libraries are about simplicity and ease of
 > use, not theoretical perfection in algorithm design.
 >
 > A library should be specified so that its routines will perform
 > moderately well for moderately sized inputs in the vast majority of
 > applications.  Having many variants has disadvantages:
 >
 >  * When there are only a few paths through the code, the code gets
 >    thoroughly tested and its behavior well understood.  When there are
 >    many paths, most of the code is not well tested and not well
 >    understood.
 >
 >  * To optimally choose algorithms requires nearly as much
 >    understanding as to write them.  Most users don't.
 >
 >  * A module with too many functions and voluminous documentation
 >    scares off the typical user looking to just sort a 50-element list;
 >    who then goes searching for any old sort algorithm to reinvent the
 >    wheel.
 >
 >  * If some of the default sorts are unstable, then users will be
 >    surprised that sorting their data twice results in different
 >    orders; or that vector and list sorts return different orders.

explains much better why the choices were made than the current
wording, which compares the choices of srfi-95 to those of srfi-32.

The above is the main point of this posting.

Aubrey Jaffer wrote:
>  | Date: Tue, 07 Nov 2006 13:49:06 +0100
>  | From: Jens Axel Søgaard <xxxxxx@soegaard.net>
...
>  |      These 28 arose for several reasons:
>  |
>  |         * List and Vector versions were separated, even though
>  |         Scheme has manifest types which eliminate ambiguity as to
>  |         the caller's intentions.
>  |
 >  | Separating list and vector functions is a tradition.  In R5RS we
>  | have, for example list-ref, vector-ref and string-ref instead of
>  | general ref.  One benefit of this choice is that compilers get a
>  | fair chance to generate efficient code.
>
> LIST-REF and VECTOR-REF are primitive operations which compilers can
> inline.  It is conceivable that some really advanced compiler could
> inline merge, but merge is already restricted to lists!
>
> Sort is a more complicated operation than merge.  The cost of the
> single type predicate conditional to detect array versus list is going
> to be significant only when sort is called with a trivially short
> argument.  It is reasonable to expect that library routines will be
> slower than inline code on these degenerate cases.

I am not disagreeing. I am merely saying that there is more than one
valid view. The list-ref, vector-ref, ... way of constructing names
has, probably due to the lack of a module system, been in widespread
use. That's why I don't like the "even though".

As a minor point, in

    (let ([v (vector-sort <a-large-vector>)])
      <loop with (vector-ref v)>)

the compiler has a better chance of figuring out at compile time
that v is bound to a vector than if a general sort is used.
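
To spell that out with a sketch (only a sketch: I write vector-sort
with the comparison procedure first, median-of-vector is a name I
made up, and sort is SRFI-95's generic sort; the exact argument order
is not the point):

    ;; With the generic sort, the type of v follows the type of seq.
    ;; The programmer may know seq is a vector here, but the compiler
    ;; usually cannot see that at the call site, so the vector-ref
    ;; below needs a run-time type dispatch.
    (define (median-of-vector seq)
      (let ((v (sort seq <)))
        (vector-ref v (quotient (vector-length v) 2))))

    ;; With the vector-specific sort, v is a vector by the procedure's
    ;; contract, so vector-ref and vector-length can be open-coded.
    (define (median-of-vector* vec)
      (let ((v (vector-sort < vec)))
        (vector-ref v (quotient (vector-length v) 2))))

    ;; (median-of-vector* (vector 3 1 4 1 5))  =>  3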

>  |         * Four varieties of algorithms were provided (quick, heap,
>  |         insert, merge) even though quick, heap, and insert sorts
>  |         have no significant advantage over merge-sort.
>  |
 >  | First: The choice between these algorithms was in the "How
 >  | to" part of SRFI-32.  If a user doesn't care which algorithm is
>  | used, he can use list-sort!, vector-sort!.
>  |
>  | Second: What does "no significant advantage" mean?
>
> It means that the asymptotics or functional properties are no better.

I like that formulation better.

The "no significant advantage" will be true most of the time, but
for some people for special programs, constant factors do matter.

 >  | I was of the impression that the "hidden constants" of the
 >  | O(n log(n)) of vector-quick-sort were smaller than those of
 >  | vector-merge-sort.
>
> Is that from reading SRFI-32?

No. It was just an impression.

> SRFI-95 does not mandate merge-sort.  It mandates that the sort
> algorithm be stable and that asymptotic time for SORT be bounded by
> O(n log(n)).

Again, I was trying to argue that to some people it will be an
advantage to be able to choose the specific algorithm. With your
rationale (general purpose sorting) it is quite reasonable to
choose a single sorting algorithm. With the srfi-32 rationale,
where completeness is part of the goal, it makes sense
to include all the standard sorting algorithms.

That's why I think this section should be reworded:

     * Four varieties of algorithms were provided (quick,
       heap, insert, merge) even though quick, heap, and
       insert sorts have no significant advantage over
       merge-sort.

Read against the srfi-95 rationale, the wording implies that
the srfi-32 choices were wrong.

>  | Third: The different algorithms have different memory requirements.
>  | If very large vectors are sorted, it makes sense to choose a slower
>  | algorithm that allocates very little extra memory.  Some versions
>  | of vector-merge-sort! allocate a helper array the size of the
>  | original.
>
> http://en.wikipedia.org/wiki/Sorting_algorithm has a table of
> properties for sort algorithms.  "In-place merge sort" is shown as
> stable, O(n log(n)) time, and using no additional memory.
>
> SRFI-95's requirement that the KEY argument be called at most once per
> element necessitates allocation of O(n) storage for SORT when a KEY
> argument is supplied.  The current reference implementation always
> does; I will clone its routine to handle the no-KEY case efficiently.

Okay.
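
Just to make sure I read that right: a decorate-sort-undecorate
sketch of the kind I have in mind (my own sketch, not the reference
implementation; sort-with-key is a made-up name, sort is SRFI-95's)
shows where the O(n) storage comes from when KEY may be called at
most once per element:

    ;; Pair each element with its key up front, so key is called
    ;; exactly once per element; the list of pairs is the O(n) extra
    ;; storage.  Then sort the pairs and drop the keys again.
    (define (sort-with-key lst less? key)
      (let* ((decorated (map (lambda (x) (cons (key x) x)) lst))
             (sorted    (sort decorated
                              (lambda (a b) (less? (car a) (car b))))))
        (map cdr sorted)))                      ; undecorate

    ;; (sort-with-key '("scheme" "ml" "lisp") < string-length)
    ;;   =>  ("ml" "lisp" "scheme")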

>  | Also: Is merge-sort the fastest algorithm, if the elements are
>  | "almost sorted"?
>  |
>  |        * Sort and stable-sort varieties were included, even though
>  |        an unstable sort has no performance advantage.
>  |
>  | As I see it, the reason they are in SRFI-32 is not efficiency but
>  | completeness.  Nevertheless, if vector-quick-sort! (which is
>  | unstable) is faster than vector-merge-sort! (which is stable) then
>
> "Faster" _is_ about efficiency, isn't it?

Yes - I was contradicting myself.

Completeness allows efficiency in situations where you know enough
about the data to be able to choose the best of several
algorithms.
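
For instance (a toy sketch of my own, not taken from either SRFI): if
I know a vector is already almost sorted, a straight insertion sort,
which runs in time proportional to n plus the number of inversions,
can beat a general O(n log(n)) sort:

    ;; Straight insertion sort; cheap when the input has few
    ;; inversions, quadratic in the worst case.
    (define (insertion-sort! v less?)
      (let ((n (vector-length v)))
        (do ((i 1 (+ i 1)))
            ((>= i n) v)
          (let ((x (vector-ref v i)))
            (let loop ((j (- i 1)))
              (cond ((and (>= j 0) (less? x (vector-ref v j)))
                     (vector-set! v (+ j 1) (vector-ref v j))
                     (loop (- j 1)))
                    (else (vector-set! v (+ j 1) x))))))))

    ;; (insertion-sort! (vector 1 2 4 3 5 6 7 8) <)
    ;;   =>  #(1 2 3 4 5 6 7 8)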

--
Jens Axel Søgaard