Re: Proposal to reduce the number of arguments to array-for-each, array-every, and array-any Bradley Lucier 06 Dec 2025 16:13 UTC

On 12/5/25 22:11, Bradley Lucier wrote:
> On 12/5/25 21:43, Bradley Lucier wrote:
>> Here are some times:
>
> I forgot to say that the arrays are 1000 x 1000 f64 arrays:

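Something along these lines would reproduce that setup (a sketch only;
the exact construction wasn't quoted, and the random entries and the
name "domain" are just for illustration):

(define domain (make-interval '#(1000 1000)))

(define a
  (array-copy (make-array domain (lambda (i j) (random-real)))  ;; random-real is from SRFI 27
              f64-storage-class))

(define b
  (array-copy (make-array domain (lambda (i j) (random-real)))
              f64-storage-class))
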
I added two more routines to the test, which should indicate just about
how fast things can possibly go in Gambit:

(define (dot-product-4 a b)
   (let ((N (interval-volume (array-domain a)))
         (a-body (array-body a))
         (b-body (array-body b)))
     (do ((i 0 (fx+ i 1))
          (sum 0. (fl+ sum (fl* (f64vector-ref a-body i)
                                (f64vector-ref b-body i)))))
         ((fx= i N) sum))))

(define (dot-product-5 a b)
   (let ((N (interval-volume (array-domain a)))
         (a-body (array-body a))
         (b-body (array-body b))
         (sum (f64vector 0.)))
     (do ((i 0 (fx+ i 1)))
         ((fx= i N) (f64vector-ref sum 0))
       (f64vector-set! sum 0 (fl+ (f64vector-ref sum 0)
                                  (fl* (f64vector-ref a-body i)
                                       (f64vector-ref b-body i)))))))

dot-product-4 doesn't box flonums in memory, but it does do some
bit-shifting, etc., to transform flonums from their native format to
Gambit's "immediate flonum" format (which is itself a form of boxing).

dot-product-5 just moves flonums to and from f64vectors, so there is no
boxing of any kind.

Here are the times, along with the "important" part of each routine.  All
routines except dot-product-3 allocate at most a few hundred bytes of memory.

dot-product:
   (array-fold-left fl+ 0. (array-map fl* a b)))
     0.045931 secs cpu time (0.045816 user, 0.000115 system)

dot-product-2:
   (let ((sum 0.))
     (array-for-each (lambda (ai bi) (set! sum (fl+ sum (fl* ai bi)))) a b)
     sum))
exploiting that the arguments are specialized and packed:
     0.020274 secs cpu time (0.020209 user, 0.000065 system)
not exploiting that the arguments are specialized and packed:
     0.030133 secs cpu time (0.030097 user, 0.000036 system)

dot-product-3:
   (array-fold-left (lambda (sum a b) (fl+ sum (fl* a b))) 0. a b))
     0.070206 secs cpu time (0.055341 user, 0.014865 system)
     96000560 bytes allocated

dot-product-4:
     (do ((i 0 (fx+ i 1))
          (sum 0. (fl+ sum (fl* (f64vector-ref a-body i)
                                (f64vector-ref b-body i)))))
         ((fx= i N) sum))))
     0.006233 secs cpu time (0.006192 user, 0.000041 system)

dot-product-5:
         (sum (f64vector 0.)))
     (do ((i 0 (fx+ i 1)))
         ((fx= i N) (f64vector-ref sum 0))
       (f64vector-set! sum 0 (fl+ (f64vector-ref sum 0)
                                  (fl* (f64vector-ref a-body i)
                                       (f64vector-ref b-body i)))))))
     0.004989 secs cpu time (0.004988 user, 0.000001 system)

I think dot-product-3 is expressed most "naturally", but it's
unfortunate that the sample implementation unnecessarily allocates so
much memory: 96,000,560 bytes over 1,000,000 elements is about 96 bytes
allocated per element.
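
For what it's worth, here is a sketch (not the sample implementation;
the name fold-left-over-f64-bodies is made up) of the fast path a fold
could take when every argument is a specialized array whose body is
packed in lexicographic order.  It is essentially dot-product-4 with
the operation abstracted out, and the same idea is behind the
"exploiting specialized and packed" line for dot-product-2:

(define (fold-left-over-f64-bodies op id a b)
  ;; Assumes a and b are specialized f64 arrays with identical domains
  ;; and bodies packed in lexicographic order.
  (let ((N (interval-volume (array-domain a)))
        (a-body (array-body a))
        (b-body (array-body b)))
    (do ((i 0 (fx+ i 1))
         (acc id (op acc
                     (f64vector-ref a-body i)
                     (f64vector-ref b-body i))))
        ((fx= i N) acc))))

With such a fast path, dot-product-3 could keep its "natural" form:

(fold-left-over-f64-bodies (lambda (sum a b) (fl+ sum (fl* a b))) 0. a b)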

So far, then, the abstraction levels inherent in the array routines
cost roughly a factor of 5 in runtime.

Brad