Re: Proposal to reduce the number of argument to array-for-each, array-every, and array-any Bradley Lucier 06 Dec 2025 23:47 UTC

On 12/6/25 11:13, Bradley Lucier wrote:
> So, so far, the abstraction levels inherent in the array routines cost a
> factor of 5 in runtime.

I've implemented your idea for array-for-each and array-fold-left for
packed specialized arrays, and with the previous definitions of
dot-product-*, given again below for reference, the timings are now:

 > (define domain (make-interval '#(1000 1000)))
 > (define a (make-specialized-array domain f64-storage-class 1.))
 > (define b (make-specialized-array domain f64-storage-class 2.))
 > (time (dot-product a b))
(time (dot-product a b))
     0.041193 secs real time
     0.041135 secs cpu time (0.041132 user, 0.000003 system)
     no collections
     752 bytes allocated
     2 minor faults
     no major faults
     147913824 cpu cycles
2000000.
 > (time (dot-product-2  a b))
(time (dot-product-2 a b))
     0.017274 secs real time
     0.017243 secs cpu time (0.017243 user, 0.000000 system)
     no collections
     624 bytes allocated
     no minor faults
     no major faults
     62010342 cpu cycles
2000000.
 > (time (dot-product-3  a b))
(time (dot-product-3 a b))
     0.017645 secs real time
     0.017579 secs cpu time (0.017579 user, 0.000000 system)
     no collections
     448 bytes allocated
     no minor faults
     no major faults
     63338346 cpu cycles
2000000.
 > (time (dot-product-4  a b))
(time (dot-product-4 a b))
     0.005075 secs real time
     0.005074 secs cpu time (0.005071 user, 0.000003 system)
     no collections
     64 bytes allocated
     no minor faults
     no major faults
     18190851 cpu cycles
2000000.
 > (time (dot-product-5  a b))
(time (dot-product-5 a b))
     0.004977 secs real time
     0.004960 secs cpu time (0.004959 user, 0.000001 system)
     no collections
     96 bytes allocated
     no minor faults
     no major faults
     17818125 cpu cycles
2000000.

So the overhead for dot-product-{2|3} is now about a factor of three
(roughly 3.4 by cpu time) over code that compiles directly to inlined C
code operating on f64vector elements.
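
For reference, the ratios can be recomputed from the cpu times reported
above. This is only a sketch: the names cpu-times and baseline are made
up for illustration, and dot-product-4 is taken as the baseline because
it is the plain f64vector loop. The numbers are copied from the
transcript, not re-measured.

```scheme
;; cpu times in seconds, copied from the (time ...) output above.
(define cpu-times
  '((dot-product   . 0.041135)
    (dot-product-2 . 0.017243)
    (dot-product-3 . 0.017579)
    (dot-product-4 . 0.005074)
    (dot-product-5 . 0.004960)))

;; Assumed baseline: the direct f64vector loop, dot-product-4.
(define baseline (cdr (assq 'dot-product-4 cpu-times)))

;; Print each routine's cpu time as a multiple of the baseline.
(for-each
 (lambda (entry)
   (display (car entry))
   (display ": ")
   (display (/ (cdr entry) baseline))
   (newline))
 cpu-times)
```

By this measure dot-product is about 8.1 times the baseline,
dot-product-2 about 3.4, dot-product-3 about 3.5, and dot-product-5
about 0.98.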

Brad

(declare (standard-bindings)
          (extended-bindings)
          (block)
          (not safe))

(define (dot-product a b)
   (array-fold-left fl+ 0. (array-map fl* a b)))

(define (dot-product-2 a b)
   (let ((sum 0.))
     (array-for-each (lambda (ai bi) (set! sum (fl+ sum (fl* ai bi)))) a b)
     sum))

(define (dot-product-3 a b)
   (array-fold-left (lambda (sum a b) (fl+ sum (fl* a b))) 0. a b))

(define (dot-product-4 a b)
   (let ((N (interval-volume (array-domain a)))
         (a-body (array-body a))
         (b-body (array-body b)))
     (do ((i 0 (fx+ i 1))
          (sum 0. (fl+ sum (fl* (f64vector-ref a-body i)
                                (f64vector-ref b-body i)))))
         ((fx= i N) sum))))

(define (dot-product-5 a b)
   (let ((N (interval-volume (array-domain a)))
         (a-body (array-body a))
         (b-body (array-body b))
         (sum (f64vector 0.)))
     (do ((i 0 (fx+ i 1)))
         ((fx= i N) (f64vector-ref sum 0))
       (f64vector-set! sum 0 (fl+ (f64vector-ref sum 0)
                                  (fl* (f64vector-ref a-body i)
                                       (f64vector-ref b-body i)))))))