Re: gratuitous optimization and benchmarking Joo ChurlSoo (07 Apr 2006 10:15 UTC)
|
Re: gratuitous optimization and benchmarking
Taylor R. Campbell
(07 Apr 2006 16:54 UTC)
|
* From: "Taylor R. Campbell" <xxxxxx@mumble.net> | Any compiler can optimize a lambda argument to CWV(*), but the key is | that the lambda does not require a run-time, heap-allocated closure. | Joo ChurlSoo's tests here are very skewed in this way, because the | procedure argument to mu procedures must be heap-allocated in the | general case, whereas the arguments to CWV can *always* be optimized | (and stack-allocated, as any other continuations are) even in the | general case. | ((*) I find it very bizarre that there is a significant performance | difference between CWV & RECEIVE in Gauche, but that's a topic for a | different conversation.) | Soo's benchmarks do not test anything beyond procedure call vs | procedure return, the latter of which often has the overhead for | multiple return values anyway in order to optimize the path for the | single-value case in interpreters. All of the systems that he tested | except Chez and Scheme48 use completely unoptimized VALUES & CWV | implementations, furthermore. | Now let's try a slightly different set of benchmarks. This new code | elides the superfluous construction of a list, but builds many | closures, which are almost always needed in code that uses multiple | return values, so this benchmark more closely measures the performance | of real code (cough cough cough cough). I also added another order of | magnitude to the number of iterations in the T tests, because ten | million went by too fast for accurate measurement. In addition, I ran | the GC before starting the timer in all of the benchmarks. I'd have | run each test ten or so times and taken the average time, but I got | bored waiting for Scheme48 and Gauche. I got roughly consistent | answers from T, anyway. | All benchmarks were run on a 333 MHz UltraSPARC 5, although the | Scheme systems were all compiled as 32-bit programs. No other users | on the machine, &c. &c., unless someone snuck on while I wasn't | looking. I forget how much RAM it has. T 3.1 (19) Scheme48 0.57 Gauche 0.8.6 | m 36.91s 184.78s 210.86s | v 10.33s 151.11s 83.33s | mm 51.15s 258.25s 361.96s | vv 12.8s 189.54s 105.47s | Each of these tests calls the multiple-value-returning-entity with a | continuation that computes the sum of the iteration number and the | three values returned by the entity. The procedures m, v, mm, and vv | are identical to the similarly named procedures in Soo's original | code. | I'd have tried more Scheme systems, but I don't have access to full | Chez, I can't build RScheme on this machine or figure out how to use | it on another, and unfortunately I can't find any other Scheme that | uses a saner representation of multiple values than a tagged list or | equivalent. (Eli Barzilay informs me that MzScheme implements | multiple values better than a tagged list, but I wanted to sleep | tonight, so I didn't try to build it and run the tests with it.) | I have attached the source code, soo.scm and soo.t, to this message. | This goes to show that anyone can make benchmarks that match the | results they want. I, of course, claim that *my* benchmarks are more | accurate to the truth than the other ones posted here, but YMMV. | Although I, of course, maintain that a proper system for multiple | return values is more efficient than MU & NU, the best conclusion to | draw is probably the absence of one, and I recommend omitting any | mention of performance from the rationale. On REPL, the same tests are performed after the following are loaded. I have neither full Chez nor Gauche. (Intel 586, Debian or Windows) (define-syntax mu (syntax-rules () ((mu argument ...) (lambda (f) (f argument ...))))) (define-syntax receive (syntax-rules () ((receive formals expression body ...) (call-with-values (lambda () expression) (lambda formals body ...))))) (define m (mu 1 2 3)) (define v (lambda () (values 1 2 3))) (define mm (lambda (x) (if (< x 0) (mu 1 2 3) (if (= x 0) (mu 4 5 6) (mu 7 8 9))))) (define vv (lambda (x) (if (< x 0) (values 1 2 3) (if (= x 0) (values 4 5 6) (values 7 8 9))))) (define-syntax dotimes (syntax-rules () ((dotimes (i count) body0 body1 ...) (let ((%COUNT count)) (do ((i 0 (+ i 1))) ((= i %COUNT)) body0 body1 ...))))) (define (test-m) (dotimes (i 10000000) (m (lambda (x y z) (+ i x y z))))) (define (test-v) (dotimes (i 10000000) (receive (x y z) (v) (+ i x y z)))) (define (test-mm) (dotimes (i 10000000) ((mm i) (lambda (x y z) (+ i x y z))))) (define (test-vv) (dotimes (i 10000000) (receive (x y z) (vv i) (+ i x y z)))) 1. Welcome to Scheme 48 1.3 (made by soo on Mon Jul 11 09:51:36 2005) Copyright (c) 1993-2005 by Richard Kelsey and Jonathan Rees. Please report bugs to scheme-48xxxxxx@s48.org. Get more information at http://www.s48.org/. Type ,? (comma question-mark) for help. > ,time (test-m) Run time: 37.96 seconds; Elapsed time: 40.22 seconds #t > ,time (test-v) Run time: 51.44 seconds; Elapsed time: 55.49 seconds #t > ,time (test-mm) Run time: 57.45 seconds; Elapsed time: 59.66 seconds #t > ,time (test-vv) Run time: 66.79 seconds; Elapsed time: 69.29 seconds #t 2. Welcome to MzScheme version 299.100, Copyright (c) 2004-2005 PLT Scheme, Inc. > (time (test-m)) cpu time: 30314 real time: 32607 gc time: 6495 > (time (test-v)) cpu time: 53206 real time: 58714 gc time: 20065 > (time (test-mm)) cpu time: 41039 real time: 43092 gc time: 6681 > (time (test-vv)) cpu time: 62870 real time: 65364 gc time: 20040 3. Version 1, Build 66 - linux-unix-gnu-x86 (c)2000-2004 Felix L. Winkelmann Using hygienic macros ; loading /usr/local/share/chicken/chicken-highlevel-macros.scm ... #;13> (time (test-m)) 127.11 seconds elapsed 71.13 seconds in (major) GC 1 mutations 50 minor GCs 4169 major GCs #;14> (time (test-v)) 163.51 seconds elapsed 92.9 seconds in (major) GC 1 mutations 55 minor GCs 5306 major GCs #;15> (time (test-mm)) 211.26 seconds elapsed 126.54 seconds in (major) GC 1 mutations 37 minor GCs 7387 major GCs #;16> (time (test-vv)) 225.75 seconds elapsed 131.63 seconds in (major) GC 1 mutations 23 minor GCs 7687 major GCs I wonder why my results are different from yours. -- Joo ChurlSoo