Re: Proposal to reduce the number of arguments to array-for-each, array-every, and array-any
Bradley Lucier 06 Dec 2025 16:13 UTC
On 12/5/25 22:11, Bradley Lucier wrote:
> On 12/5/25 21:43, Bradley Lucier wrote:
>> Here are some times:
>
> I forgot to say that the arrays are 1000 x 1000 f64 arrays:
I added two more routines to the test, which should indicate just about
how fast things can possibly go in Gambit:
(define (dot-product-4 a b)
  (let ((N (interval-volume (array-domain a)))
        (a-body (array-body a))
        (b-body (array-body b)))
    (do ((i 0 (fx+ i 1))
         (sum 0. (fl+ sum (fl* (f64vector-ref a-body i)
                               (f64vector-ref b-body i)))))
        ((fx= i N) sum))))

(define (dot-product-5 a b)
  (let ((N (interval-volume (array-domain a)))
        (a-body (array-body a))
        (b-body (array-body b))
        (sum (f64vector 0.)))
    (do ((i 0 (fx+ i 1)))
        ((fx= i N) (f64vector-ref sum 0))
      (f64vector-set! sum 0 (fl+ (f64vector-ref sum 0)
                                 (fl* (f64vector-ref a-body i)
                                      (f64vector-ref b-body i)))))))
dot-product-4 doesn't box flonums in memory, but it does do some
bit-shifting, etc., to transform flonums from their native format to
Gambit's "immediate flonum" format (which is itself a form of boxing).
dot-product-5 just moves flonums to and from f64vectors, so there is no
boxing of any kind.
Here are times, with the "important" part of each routine. All routines
except dot-product-3 allocate at most a few hundred bytes of memory.
dot-product:
  (array-fold-left fl+ 0. (array-map fl* a b)))
  0.045931 secs cpu time (0.045816 user, 0.000115 system)

dot-product-2:
  (let ((sum 0.))
    (array-for-each (lambda (ai bi) (set! sum (fl+ sum (fl* ai bi)))) a b)
    sum))
  exploiting specialized and packed:
    0.020274 secs cpu time (0.020209 user, 0.000065 system)
  not exploiting specialized and packed:
    0.030133 secs cpu time (0.030097 user, 0.000036 system)

dot-product-3:
  (array-fold-left (lambda (sum a b) (fl+ sum (fl* a b))) 0. a b))
  0.070206 secs cpu time (0.055341 user, 0.014865 system)
  96000560 bytes allocated

dot-product-4:
  (do ((i 0 (fx+ i 1))
       (sum 0. (fl+ sum (fl* (f64vector-ref a-body i)
                             (f64vector-ref b-body i)))))
      ((fx= i N) sum))))
  0.006233 secs cpu time (0.006192 user, 0.000041 system)

dot-product-5:
      (sum (f64vector 0.)))
  (do ((i 0 (fx+ i 1)))
      ((fx= i N) (f64vector-ref sum 0))
    (f64vector-set! sum 0 (fl+ (f64vector-ref sum 0)
                               (fl* (f64vector-ref a-body i)
                                    (f64vector-ref b-body i)))))))
  0.004989 secs cpu time (0.004988 user, 0.000001 system)
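For anyone who wants to try measurements like these, here is a minimal sketch of a timing harness. It assumes Gambit with its bundled SRFI 231 library available through (import (srfi 231)), and that fl+, fl*, and the time form stay visible after the import. The names domain, a, and b and the fill values 1. and 2. are purely illustrative and are not the data used for the times above.

```scheme
;; Sketch only: assumes Gambit with its bundled SRFI 231 library.
(import (srfi 231))

;; Two 1000 x 1000 specialized f64 arrays with constant contents
;; (illustrative values, not the ones used for the reported times).
(define domain (make-interval '#(1000 1000)))
(define a (make-specialized-array domain f64-storage-class 1.))
(define b (make-specialized-array domain f64-storage-class 2.))

;; Same body as the dot-product-2 excerpt above.
(define (dot-product-2 a b)
  (let ((sum 0.))
    (array-for-each (lambda (ai bi) (set! sum (fl+ sum (fl* ai bi)))) a b)
    sum))

;; Gambit's time form prints cpu time and bytes allocated.
(time (dot-product-2 a b))
```

The other routines can be timed the same way by swapping in their definitions. Absolute numbers will of course depend on the machine and on how the code is compiled.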
I think dot-product-3 is expressed most "naturally", but it's
unfortunate that the sample implementation unnecessarily allocates so
much memory.
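To put the allocation figure in perspective, here is a small sketch that divides the reported byte count by the number of elements in a 1000 x 1000 array. The names bytes-allocated, elements, and bytes-per-element are just illustrative.

```scheme
;; Reported allocation for dot-product-3, copied from the times above.
(define bytes-allocated 96000560)

;; Number of elements in one 1000 x 1000 array.
(define elements (* 1000 1000))

;; Average bytes allocated per element visited.
(define bytes-per-element
  (exact->inexact (/ bytes-allocated elements)))

(display bytes-per-element)
(newline)
```

That works out to about 96 bytes allocated per element visited.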
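So far, then, the abstraction levels inherent in the array routines cost
roughly a factor of 5 in runtime: the array-for-each version takes 0.020
to 0.030 seconds, against 0.005 to 0.006 seconds for the hand-written
loops.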
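The ratios can be checked with a few lines of portable Scheme. This is only a sketch: the table copies the cpu times reported above, and the labels dot-product-2-packed and dot-product-2-unpacked are made-up names for the two dot-product-2 measurements.

```scheme
;; Reported cpu times in seconds, copied from the measurements above.
(define times
  '((dot-product            . 0.045931)
    (dot-product-2-packed   . 0.020274)
    (dot-product-2-unpacked . 0.030133)
    (dot-product-3          . 0.070206)
    (dot-product-4          . 0.006233)
    (dot-product-5          . 0.004989)))

;; Slowdown of routine NAME relative to routine BASELINE.
(define (slowdown name baseline)
  (/ (cdr (assq name times))
     (cdr (assq baseline times))))

;; Print every routine's slowdown relative to dot-product-5.
(for-each
 (lambda (entry)
   (display (car entry))
   (display ": ")
   (display (slowdown (car entry) 'dot-product-5))
   (newline))
 times)
```

Relative to dot-product-5, the two dot-product-2 variants come out at roughly 4 and 6 times slower, which is where the "factor of 5" comes from; dot-product-3 is about 14 times slower.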
Brad