Email list hosting service & mailing list manager

Wording of the rationale Jens Axel Søgaard (07 Nov 2006 12:56 UTC)
Re: Wording of the rationale Jens Axel Søgaard (07 Nov 2006 21:24 UTC)

Re: Wording of the rationale Jens Axel Søgaard 07 Nov 2006 21:24 UTC

Jens Axel Søgaard skrev:

Quoting the rational of srfi-95:
>        * Four varieties of algorithms were provided (quick, heap,
>          insert, merge) even though quick, heap, and insert sorts have
>          no significant advantage over merge-sort.

Commenting:
> Second: What does "no significant advantage" mean? I were of the
> impression, that the "hidden constants" of O(n log (n)) of
> vector-quick-sort were smaller than that of vector-merge-sort.

...

> Also: Is merge-sort the fastest algorithm, if the elements are "almost
> sorted"?

I tried a few experiments, and discovered that heap-sort beats the other
algorithms when the elements are in almost reverse order. (The most
surprising about this, is that I didn't figured it out before running
the test).

That is: If you know something about the data to be sorted, you are
in a position, where you *want* to choose between the various
algorithms.

/Jens Axel Søgaard

   A sorted, long vector:  (vector 1 2 3 ... 1000000) with <
   ----------------------------------------------------------
   vector-merge-sort! and vector-insert-sort! roughly the same time
   vector-heap-sort! *much* slower

   vector-merge-sort!
   cpu time: 62 real time: 63 gc time: 0
   vector-insert-sort!
   cpu time: 62 real time: 63 gc time: 0
   vector-heap-sort!
   cpu time: 2343 real time: 2359 gc time: 0

   A reverse-sorted, long vector:  (vector 10000 ... 3 2 1) with <
   -------------------------------------------------------------
   vector-heap-sort! *very* fast
   vector-merge-sort! and vector-insert-sort! roughly the same time

   vector-merge-sort!
   cpu time: 2203 real time: 2219 gc time: 0
   vector-insert-sort!
   cpu time: 2047 real time: 2046 gc time: 0
   vector-heap-sort!
   cpu time: 16 real time: 16 gc time: 0

My test program and results follow:

(require (lib "32.ss" "srfi"))

; interval : integer integer -> (list integer)
;   (interval n m) => (list n n+1 ... m)
(define (interval n m)
   (do ((i m (- i 1))
        (xs '() (cons i xs)))
     ((< i n) xs)))

; copy the input vector, and time the sorting only
(define (test orig-v sorter)
   (let (; copy v so we can sort the same vector several times
         (v (list->vector
             (vector->list orig-v))))
     ; make sure the garbage of one test doesn't affect the next
     (collect-garbage)
     (collect-garbage)
     ; time the sorting only
     (time (sorter < v))))

'SORTED-VECTOR-TEST
(define n 1000000)
(define v (list->vector (interval 1 n)))
'vector-merge-sort!
(test v vector-merge-sort!)
'vector-insert-sort!
(test v vector-insert-sort!)
'vector-heap-sort!
(test v vector-heap-sort!)

'REVERSE-SORTED-VECTOR-TEST
(define n 10000)
(define v (list->vector (reverse! (interval 1 n))))
'vector-merge-sort!
(test v vector-merge-sort!)
'vector-insert-sort!
(test v vector-insert-sort!)
'vector-heap-sort!
(test v vector-heap-sort!)

'RANDOM-VECTOR-TEST
(define n 10000)
(define v (list->vector (map (lambda (n) (random 1000000))
                              (interval 1 n))))
'vector-merge-sort!
(test v vector-merge-sort!)
'vector-insert-sort!
(test v vector-insert-sort!)
'vector-heap-sort!
(test v vector-heap-sort!)

The results were (a different run than the above):

Welcome to DrScheme, version 359.100-svn6nov2006.
Language: Pretty Big (includes MrEd and Advanced Student).

SORTED-VECTOR-TEST
vector-merge-sort!
cpu time: 63 real time: 62 gc time: 0
vector-insert-sort!
cpu time: 62 real time: 62 gc time: 0
vector-heap-sort!
cpu time: 2359 real time: 2359 gc time: 0

REVERSE-SORTED-VECTOR-TEST
vector-merge-sort!
cpu time: 2110 real time: 2110 gc time: 0
vector-insert-sort!
cpu time: 2000 real time: 2000 gc time: 0
vector-heap-sort!
cpu time: 15 real time: 15 gc time: 0

RANDOM-VECTOR-TEST
vector-merge-sort!
cpu time: 16 real time: 16 gc time: 0
vector-insert-sort!
cpu time: 1031 real time: 1062 gc time: 0
vector-heap-sort!
cpu time: 31 real time: 31 gc time: 0