Re: Lexicographical access to arrays

Show/hide message thread

Lexicographical access to arrays John Cowan (09 Apr 2026 18:59 UTC)

Re: Lexicographical access to arrays Per Bothner (09 Apr 2026 19:47 UTC)

Re: Lexicographical access to arrays Alex Shinn (09 Apr 2026 23:48 UTC)

Re: Lexicographical access to arrays Bradley Lucier (10 Apr 2026 00:22 UTC)

Re: Lexicographical access to arrays Per Bothner (10 Apr 2026 01:48 UTC)

Re: Lexicographical access to arrays John Cowan (10 Apr 2026 05:52 UTC)

Re: Lexicographical access to arrays Bradley Lucier (10 Apr 2026 15:33 UTC)

Re: Lexicographical access to arrays Bradley Lucier (10 Apr 2026 15:36 UTC)

Re: Lexicographical access to arrays Bradley Lucier (11 Apr 2026 00:32 UTC)

Re: Lexicographical access to arrays John Cowan (11 Apr 2026 04:03 UTC)

Re: Lexicographical access to arrays Bradley Lucier (11 Apr 2026 17:37 UTC)

Re: Lexicographical access to arrays John Cowan 11 Apr 2026 04:03 UTC

LGTM.

On Fri, Apr 10, 2026 at 8:32 PM Bradley Lucier <xxxxxx@purdue.edu> wrote:
>
> On 4/10/26 01:52, John Cowan wrote:
> > It would be fine if this only worked for specialized arrays, where the
> > elements*are* contiguous.  In that case you don't have to do all the
> > divisions and remainders: you look down at the storage-class body and
> > work from that.
>
> I've come up with the following example, which shows how to exploit
> knowledge that the arrays are packed and you know the storage class.  It
> shows a speedup of a factor of > 10 compared to the general case when
> compiled.
>
> Would adding this to the document as an extended example under
> array-packed? be what you'd like?
>
> Brad
>
> (declare
>    (standard-bindings)
>    (extended-bindings)
>    (block)
>    (not safe))
>
> (define (array-dot-product A B)
>    ;; Assumes that A and B have the same domain
>    ;; and that the elements of A and B are flonums
>    (if (and (specialized-array? A)
>             (specialized-array? B)
>             (eq? (array-storage-class A)
>                  f64-storage-class)
>             (eq? (array-storage-class B)
>                  f64-storage-class)
>             (array-packed? A)
>             (array-packed? B))
>        ;;; Exploit that the elements of A and B are
>        ;;; stored in consecutive elements of f64vectors
>        (let* ((domain (array-domain A))
>               (lowers (interval-lower-bounds->list domain))
>               (A-base (apply (array-indexer A) lowers))
>               (B-base (apply (array-indexer B) lowers))
>               (A-body (array-body A))
>               (B-body (array-body B))
>               (elements-left (interval-volume domain)))
>          (do ((i A-base (fx+ i 1))
>               (j B-base (fx+ j 1))
>               (sum 0. (fl+ sum (fl* (f64vector-ref A-body i)
>                                     (f64vector-ref B-body j))))
>               (elements-left elements-left (fx- elements-left 1)))
>              ((eqv? elements-left 0) sum)))
>        ;;; Code for the general case
>        (array-fold-left (lambda (sum a b)
>                           (fl+ sum (fl* a b)))
>                         0.
>                         A B)))
>
>
> ;;; A 1000 x 1000 array of random reals between 0 and 1
> (define A (array-copy (make-array (make-interval '#(1000 1000))
>                                    (lambda (i j) (random-real)))
>                        f64-storage-class))
>
> ;;; Copy the transpose of A
>
> (define B (array-copy (array-permute A '#(1 0))))
>
> ;;; Take the dot product of the top and bottom halves of A, which
> ;;; are both packed.
>
> (pretty-print
>   (time
>    (array-dot-product
>     (array-extract A (make-interval '#(500 1000)))
>     (array-rebase (array-extract A (make-interval '#((500 1000) 1000)))))))
>
> ;;; Take the dot product of the left and right halves of B, which
> ;;; are not packed. This should give the same numerical result
> ;;; to within a multiple of rounding error
>
> (pretty-print
>   (time
>    (array-dot-product
>     (array-extract B (make-interval '#(1000 500)))
>     (array-rebase (array-extract B (make-interval '#(1000 (500 1000))))))))
>
> ;;; Yields
>
> (time (array-dot-product (array-extract A (make-interval '#(500 1000)))
> (array-rebase (array-extract A (make-interval '#((500 1000) 1000))))))
>      0.001274 secs real time
>      0.001272 secs cpu time (0.001250 user, 0.000022 system)
>      no collections
>      8240 bytes allocated
>      no minor faults
>      no major faults
>      4560612 cpu cycles
> 124739.53368039725
> (time (array-dot-product (array-extract B (make-interval '#(1000 500)))
> (array-rebase (array-extract B (make-interval '#(1000 (500 1000)))))))
>      0.015287 secs real time
>      0.015170 secs cpu time (0.014963 user, 0.000207 system)
>      no collections
>      8944 bytes allocated
>      no minor faults
>      no major faults
>      54884724 cpu cycles
> 124739.53368039505
>