gratuitous optimization and benchmarking [was Re: Request for Clarification on Rationale] Taylor R. Campbell 07 Apr 2006 05:32 UTC
Any compiler can optimize a lambda argument to CWV(*), but the key is that the lambda does not require a run-time, heap-allocated closure. Joo ChurlSoo's tests here are very skewed in this way, because the procedure argument to mu procedures must be heap-allocated in the general case, whereas the arguments to CWV can *always* be optimized (and stack-allocated, as any other continuations are) even in the general case. ((*) I find it very bizarre that there is a significant performance difference between CWV & RECEIVE in Gauche, but that's a topic for a different conversation.) Soo's benchmarks do not test anything beyond procedure call vs procedure return, the latter of which often has the overhead for multiple return values anyway in order to optimize the path for the single-value case in interpreters. All of the systems that he tested except Chez and Scheme48 use completely unoptimized VALUES & CWV implementations, furthermore. Now let's try a slightly different set of benchmarks. This new code elides the superfluous construction of a list, but builds many closures, which are almost always needed in code that uses multiple return values, so this benchmark more closely measures the performance of real code (cough cough cough cough). I also added another order of magnitude to the number of iterations in the T tests, because ten million went by too fast for accurate measurement. In addition, I ran the GC before starting the timer in all of the benchmarks. I'd have run each test ten or so times and taken the average time, but I got bored waiting for Scheme48 and Gauche. I got roughly consistent answers from T, anyway. All benchmarks were run on a 333 MHz UltraSPARC 5, although the Scheme systems were all compiled as 32-bit programs. No other users on the machine, &c. &c., unless someone snuck on while I wasn't looking. I forget how much RAM it has. T 3.1 (19) Scheme48 0.57 Gauche 0.8.6 m 36.91s 184.78s 210.86s v 10.33s 151.11s 83.33s mm 51.15s 258.25s 361.96s vv 12.8s 189.54s 105.47s Each of these tests calls the multiple-value-returning-entity with a continuation that computes the sum of the iteration number and the three values returned by the entity. The procedures m, v, mm, and vv are identical to the similarly named procedures in Soo's original code. I'd have tried more Scheme systems, but I don't have access to full Chez, I can't build RScheme on this machine or figure out how to use it on another, and unfortunately I can't find any other Scheme that uses a saner representation of multiple values than a tagged list or equivalent. (Eli Barzilay informs me that MzScheme implements multiple values better than a tagged list, but I wanted to sleep tonight, so I didn't try to build it and run the tests with it.) I have attached the source code, soo.scm and soo.t, to this message. This goes to show that anyone can make benchmarks that match the results they want. I, of course, claim that *my* benchmarks are more accurate to the truth than the other ones posted here, but YMMV. Although I, of course, maintain that a proper system for multiple return values is more efficient than MU & NU, the best conclusion to draw is probably the absence of one, and I recommend omitting any mention of performance from the rationale.