|
Re: gratuitous optimization and benchmarking Joo ChurlSoo (07 Apr 2006 10:15 UTC)
|
|
Re: gratuitous optimization and benchmarking
Taylor R. Campbell
(07 Apr 2006 16:54 UTC)
|
* From: "Taylor R. Campbell" <xxxxxx@mumble.net>
| Any compiler can optimize a lambda argument to CWV(*), but the key is
| that the lambda does not require a run-time, heap-allocated closure.
| Joo ChurlSoo's tests here are very skewed in this way, because the
| procedure argument to mu procedures must be heap-allocated in the
| general case, whereas the arguments to CWV can *always* be optimized
| (and stack-allocated, as any other continuations are) even in the
| general case.
| ((*) I find it very bizarre that there is a significant performance
| difference between CWV & RECEIVE in Gauche, but that's a topic for a
| different conversation.)
| Soo's benchmarks do not test anything beyond procedure call vs
| procedure return, the latter of which often has the overhead for
| multiple return values anyway in order to optimize the path for the
| single-value case in interpreters. All of the systems that he tested
| except Chez and Scheme48 use completely unoptimized VALUES & CWV
| implementations, furthermore.
| Now let's try a slightly different set of benchmarks. This new code
| elides the superfluous construction of a list, but builds many
| closures, which are almost always needed in code that uses multiple
| return values, so this benchmark more closely measures the performance
| of real code (cough cough cough cough). I also added another order of
| magnitude to the number of iterations in the T tests, because ten
| million went by too fast for accurate measurement. In addition, I ran
| the GC before starting the timer in all of the benchmarks. I'd have
| run each test ten or so times and taken the average time, but I got
| bored waiting for Scheme48 and Gauche. I got roughly consistent
| answers from T, anyway.
| All benchmarks were run on a 333 MHz UltraSPARC 5, although the
| Scheme systems were all compiled as 32-bit programs. No other users
| on the machine, &c. &c., unless someone snuck on while I wasn't
| looking. I forget how much RAM it has.
T 3.1 (19) Scheme48 0.57 Gauche 0.8.6
| m 36.91s 184.78s 210.86s
| v 10.33s 151.11s 83.33s
| mm 51.15s 258.25s 361.96s
| vv 12.8s 189.54s 105.47s
| Each of these tests calls the multiple-value-returning-entity with a
| continuation that computes the sum of the iteration number and the
| three values returned by the entity. The procedures m, v, mm, and vv
| are identical to the similarly named procedures in Soo's original
| code.
| I'd have tried more Scheme systems, but I don't have access to full
| Chez, I can't build RScheme on this machine or figure out how to use
| it on another, and unfortunately I can't find any other Scheme that
| uses a saner representation of multiple values than a tagged list or
| equivalent. (Eli Barzilay informs me that MzScheme implements
| multiple values better than a tagged list, but I wanted to sleep
| tonight, so I didn't try to build it and run the tests with it.)
| I have attached the source code, soo.scm and soo.t, to this message.
| This goes to show that anyone can make benchmarks that match the
| results they want. I, of course, claim that *my* benchmarks are more
| accurate to the truth than the other ones posted here, but YMMV.
| Although I, of course, maintain that a proper system for multiple
| return values is more efficient than MU & NU, the best conclusion to
| draw is probably the absence of one, and I recommend omitting any
| mention of performance from the rationale.
On REPL, the same tests are performed after the following are loaded.
I have neither full Chez nor Gauche. (Intel 586, Debian or Windows)
(define-syntax mu
(syntax-rules ()
((mu argument ...) (lambda (f) (f argument ...)))))
(define-syntax receive
(syntax-rules ()
((receive formals expression body ...)
(call-with-values (lambda () expression)
(lambda formals body ...)))))
(define m (mu 1 2 3))
(define v (lambda () (values 1 2 3)))
(define mm (lambda (x)
(if (< x 0)
(mu 1 2 3)
(if (= x 0)
(mu 4 5 6)
(mu 7 8 9)))))
(define vv (lambda (x)
(if (< x 0)
(values 1 2 3)
(if (= x 0)
(values 4 5 6)
(values 7 8 9)))))
(define-syntax dotimes
(syntax-rules ()
((dotimes (i count) body0 body1 ...)
(let ((%COUNT count))
(do ((i 0 (+ i 1)))
((= i %COUNT))
body0
body1
...)))))
(define (test-m)
(dotimes (i 10000000)
(m (lambda (x y z)
(+ i x y z)))))
(define (test-v)
(dotimes (i 10000000)
(receive (x y z) (v)
(+ i x y z))))
(define (test-mm)
(dotimes (i 10000000)
((mm i) (lambda (x y z)
(+ i x y z)))))
(define (test-vv)
(dotimes (i 10000000)
(receive (x y z) (vv i)
(+ i x y z))))
1.
Welcome to Scheme 48 1.3 (made by soo on Mon Jul 11 09:51:36 2005)
Copyright (c) 1993-2005 by Richard Kelsey and Jonathan Rees.
Please report bugs to xxxxxx@s48.org.
Get more information at http://www.s48.org/.
Type ,? (comma question-mark) for help.
> ,time (test-m)
Run time: 37.96 seconds; Elapsed time: 40.22 seconds
#t
> ,time (test-v)
Run time: 51.44 seconds; Elapsed time: 55.49 seconds
#t
> ,time (test-mm)
Run time: 57.45 seconds; Elapsed time: 59.66 seconds
#t
> ,time (test-vv)
Run time: 66.79 seconds; Elapsed time: 69.29 seconds
#t
2.
Welcome to MzScheme version 299.100, Copyright (c) 2004-2005 PLT Scheme, Inc.
> (time (test-m))
cpu time: 30314 real time: 32607 gc time: 6495
> (time (test-v))
cpu time: 53206 real time: 58714 gc time: 20065
> (time (test-mm))
cpu time: 41039 real time: 43092 gc time: 6681
> (time (test-vv))
cpu time: 62870 real time: 65364 gc time: 20040
3.
Version 1, Build 66 - linux-unix-gnu-x86
(c)2000-2004 Felix L. Winkelmann
Using hygienic macros
; loading /usr/local/share/chicken/chicken-highlevel-macros.scm ...
#;13> (time (test-m))
127.11 seconds elapsed
71.13 seconds in (major) GC
1 mutations
50 minor GCs
4169 major GCs
#;14> (time (test-v))
163.51 seconds elapsed
92.9 seconds in (major) GC
1 mutations
55 minor GCs
5306 major GCs
#;15> (time (test-mm))
211.26 seconds elapsed
126.54 seconds in (major) GC
1 mutations
37 minor GCs
7387 major GCs
#;16> (time (test-vv))
225.75 seconds elapsed
131.63 seconds in (major) GC
1 mutations
23 minor GCs
7687 major GCs
I wonder why my results are different from yours.
--
Joo ChurlSoo