3-way compares and short-circuit sorting Alex Shinn 05 Aug 2002 09:09 UTC

Perhaps it doesn't belong in this srfi, and it doesn't seem very
Scheme-ish, but I was wondering if anyone had given thought to C-style
comparisons that yield positive, negative or zero on comparison (as in
strcmp)?  This lets you write efficient short-circuit tests such as:

  int pred (obj a, obj b) {
    return ((cheap_func(a) - cheap_func(b))
             || (expensive_func(a) - expensive_func(b)))
            < 0;

where it is rare for the initial comparison to be zero.  As it is now,
you have to either sort twice or build a custom sort pred.  Building new
procedures is what Scheme is all about, so for the latter method I use
the following utility:

(define (chain-sort-predicate key1 pred<1 pred=1 key2 pred<2 . rest)
  (define (make-sort-predicate key pred)
    (if (eq? key identity)
      (lambda (a b) (pred (key a) (key b)))))
  (let ((secondary
         (if (null? rest)
           (make-sort-predicate key2 pred<2)
           (apply chain-sort-predicate (append (list key2 pred<2) rest)))))
    (lambda (a b)
      (let ((a1 (key1 a)) (b1 (key1 b)))
        (or (pred<1 a1 b1)
            (and (pred=1 a1 b1)
                 (secondary a b)))))))

which, without the sort-keys and variable-arity could just be

(define (chain-sort-predicate pred<1 pred=1 pred<2)
  (lambda (a b)
    (or (pred<1 a b)
        (and (pred=1 a b)
             (pred<2 a b)))))

which lets you do

   (sort data (chain-sort-predicate pred<1 pred=1 pred<2))

but the problem arises when pred<1 and pred=1 are not so cheap.  Usually
the steps needed to determine their value are the same, so it is a waste
to make the computation twice.

The solution in other languages is to define a general purpose
comparator, cmp in Python, <=> in Perl, which returns this trinary
value.  [This is also useful in general for OO-programming because you
can define just this method and you get <, >, =, et al for free on a
given class.]  Then you can sort with something-like

  (sort data (lambda (a b) (nzero-or (cmp1 a b) (cmp2 a b))))

and maybe define utilities to make that cleaner.

Anyway, just some rambling, thought I'd mention it...