Re: Lexicographical access to arrays

Show/hide message thread

Lexicographical access to arrays John Cowan (09 Apr 2026 18:59 UTC)

Re: Lexicographical access to arrays Per Bothner (09 Apr 2026 19:47 UTC)

Re: Lexicographical access to arrays Alex Shinn (09 Apr 2026 23:48 UTC)

Re: Lexicographical access to arrays Bradley Lucier (10 Apr 2026 00:22 UTC)

Re: Lexicographical access to arrays Per Bothner (10 Apr 2026 01:48 UTC)

Re: Lexicographical access to arrays John Cowan (10 Apr 2026 05:52 UTC)

Re: Lexicographical access to arrays Bradley Lucier (10 Apr 2026 15:33 UTC)

Re: Lexicographical access to arrays Bradley Lucier (10 Apr 2026 15:36 UTC)

Re: Lexicographical access to arrays Bradley Lucier (11 Apr 2026 00:32 UTC)

Re: Lexicographical access to arrays John Cowan (11 Apr 2026 04:03 UTC)

Re: Lexicographical access to arrays Bradley Lucier (11 Apr 2026 17:37 UTC)

Re: Lexicographical access to arrays Bradley Lucier 11 Apr 2026 00:32 UTC

On 4/10/26 01:52, John Cowan wrote:
> It would be fine if this only worked for specialized arrays, where the
> elements*are* contiguous.  In that case you don't have to do all the
> divisions and remainders: you look down at the storage-class body and
> work from that.

I've come up with the following example, which shows how to exploit
knowledge that the arrays are packed and you know the storage class.  It
shows a speedup of a factor of > 10 compared to the general case when
compiled.

Would adding this to the document as an extended example under
array-packed? be what you'd like?

Brad

(declare
   (standard-bindings)
   (extended-bindings)
   (block)
   (not safe))

(define (array-dot-product A B)
   ;; Assumes that A and B have the same domain
   ;; and that the elements of A and B are flonums
   (if (and (specialized-array? A)
            (specialized-array? B)
            (eq? (array-storage-class A)
                 f64-storage-class)
            (eq? (array-storage-class B)
                 f64-storage-class)
            (array-packed? A)
            (array-packed? B))
       ;;; Exploit that the elements of A and B are
       ;;; stored in consecutive elements of f64vectors
       (let* ((domain (array-domain A))
              (lowers (interval-lower-bounds->list domain))
              (A-base (apply (array-indexer A) lowers))
              (B-base (apply (array-indexer B) lowers))
              (A-body (array-body A))
              (B-body (array-body B))
              (elements-left (interval-volume domain)))
         (do ((i A-base (fx+ i 1))
              (j B-base (fx+ j 1))
              (sum 0. (fl+ sum (fl* (f64vector-ref A-body i)
                                    (f64vector-ref B-body j))))
              (elements-left elements-left (fx- elements-left 1)))
             ((eqv? elements-left 0) sum)))
       ;;; Code for the general case
       (array-fold-left (lambda (sum a b)
                          (fl+ sum (fl* a b)))
                        0.
                        A B)))

;;; A 1000 x 1000 array of random reals between 0 and 1
(define A (array-copy (make-array (make-interval '#(1000 1000))
                                   (lambda (i j) (random-real)))
                       f64-storage-class))

;;; Copy the transpose of A

(define B (array-copy (array-permute A '#(1 0))))

;;; Take the dot product of the top and bottom halves of A, which
;;; are both packed.

(pretty-print
  (time
   (array-dot-product
    (array-extract A (make-interval '#(500 1000)))
    (array-rebase (array-extract A (make-interval '#((500 1000) 1000)))))))

;;; Take the dot product of the left and right halves of B, which
;;; are not packed. This should give the same numerical result
;;; to within a multiple of rounding error

(pretty-print
  (time
   (array-dot-product
    (array-extract B (make-interval '#(1000 500)))
    (array-rebase (array-extract B (make-interval '#(1000 (500 1000))))))))

;;; Yields

(time (array-dot-product (array-extract A (make-interval '#(500 1000)))
(array-rebase (array-extract A (make-interval '#((500 1000) 1000))))))
     0.001274 secs real time
     0.001272 secs cpu time (0.001250 user, 0.000022 system)
     no collections
     8240 bytes allocated
     no minor faults
     no major faults
     4560612 cpu cycles
124739.53368039725
(time (array-dot-product (array-extract B (make-interval '#(1000 500)))
(array-rebase (array-extract B (make-interval '#(1000 (500 1000)))))))
     0.015287 secs real time
     0.015170 secs cpu time (0.014963 user, 0.000207 system)
     no collections
     8944 bytes allocated
     no minor faults
     no major faults
     54884724 cpu cycles
124739.53368039505