gratuitous optimization and benchmarking [was Re: Request for Clarification on Rationale] Taylor R. Campbell 07 Apr 2006 05:32 UTC
Any compiler can optimize a lambda argument to CWV(*), but the key is
that the lambda does not require a run-time, heap-allocated closure.
Joo ChurlSoo's tests here are very skewed in this way, because the
procedure argument to mu procedures must be heap-allocated in the
general case, whereas the arguments to CWV can *always* be optimized
(and stack-allocated, as any other continuations are) even in the
general case.

((*) I find it very bizarre that there is a significant performance
difference between CWV & RECEIVE in Gauche, but that's a topic for a
different conversation.)

Soo's benchmarks do not test anything beyond procedure call vs
procedure return, the latter of which often has the overhead for
multiple return values anyway in order to optimize the path for the
single-value case in interpreters.  All of the systems that he tested
except Chez and Scheme48 use completely unoptimized VALUES & CWV
implementations, furthermore.

Now let's try a slightly different set of benchmarks.  This new code
elides the superfluous construction of a list, but builds many
closures, which are almost always needed in code that uses multiple
return values, so this benchmark more closely measures the performance
of real code (cough cough cough cough).  I also added another order of
magnitude to the number of iterations in the T tests, because ten
million went by too fast for accurate measurement.  In addition, I ran
the GC before starting the timer in all of the benchmarks.  I'd have
run each test ten or so times and taken the average time, but I got
bored waiting for Scheme48 and Gauche.  I got roughly consistent
answers from T, anyway.

All benchmarks were run on a 333 MHz UltraSPARC 5, although the
Scheme systems were all compiled as 32-bit programs.  No other users
on the machine, &c. &c., unless someone snuck on while I wasn't
looking.  I forget how much RAM it has.

        T 3.1 (19)      Scheme48 0.57   Gauche 0.8.6
m        36.91s         184.78s         210.86s
v        10.33s         151.11s          83.33s
mm       51.15s         258.25s         361.96s
vv       12.8s          189.54s         105.47s

Each of these tests calls the multiple-value-returning-entity with a
continuation that computes the sum of the iteration number and the
three values returned by the entity.  The procedures m, v, mm, and vv
are identical to the similarly named procedures in Soo's original
code.

I'd have tried more Scheme systems, but I don't have access to full
Chez, I can't build RScheme on this machine or figure out how to use
it on another, and unfortunately I can't find any other Scheme that
uses a saner representation of multiple values than a tagged list or
equivalent.  (Eli Barzilay informs me that MzScheme implements
multiple values better than a tagged list, but I wanted to sleep
tonight, so I didn't try to build it and run the tests with it.)

I have attached the source code, soo.scm and soo.t, to this message.

This goes to show that anyone can make benchmarks that match the
results they want.  I, of course, claim that *my* benchmarks are more
accurate to the truth than the other ones posted here, but YMMV.
Although I, of course, maintain that a proper system for multiple
return values is more efficient than MU & NU, the best conclusion to
draw is probably the absence of one, and I recommend omitting any
mention of performance from the rationale.