|
Lexicographical access to arrays
John Cowan
(09 Apr 2026 18:59 UTC)
|
|
Re: Lexicographical access to arrays
Per Bothner
(09 Apr 2026 19:47 UTC)
|
|
Re: Lexicographical access to arrays
Alex Shinn
(09 Apr 2026 23:48 UTC)
|
|
Re: Lexicographical access to arrays
Bradley Lucier
(10 Apr 2026 00:22 UTC)
|
|
Re: Lexicographical access to arrays
Per Bothner
(10 Apr 2026 01:48 UTC)
|
|
Re: Lexicographical access to arrays
John Cowan
(10 Apr 2026 05:52 UTC)
|
|
Re: Lexicographical access to arrays
Bradley Lucier
(10 Apr 2026 15:33 UTC)
|
|
Re: Lexicographical access to arrays
Bradley Lucier
(10 Apr 2026 15:36 UTC)
|
|
Re: Lexicographical access to arrays
Bradley Lucier
(11 Apr 2026 00:32 UTC)
|
|
Re: Lexicographical access to arrays John Cowan (11 Apr 2026 04:03 UTC)
|
|
Re: Lexicographical access to arrays
Bradley Lucier
(11 Apr 2026 17:37 UTC)
|
LGTM. On Fri, Apr 10, 2026 at 8:32 PM Bradley Lucier <xxxxxx@purdue.edu> wrote: > > On 4/10/26 01:52, John Cowan wrote: > > It would be fine if this only worked for specialized arrays, where the > > elements*are* contiguous. In that case you don't have to do all the > > divisions and remainders: you look down at the storage-class body and > > work from that. > > I've come up with the following example, which shows how to exploit > knowledge that the arrays are packed and you know the storage class. It > shows a speedup of a factor of > 10 compared to the general case when > compiled. > > Would adding this to the document as an extended example under > array-packed? be what you'd like? > > Brad > > (declare > (standard-bindings) > (extended-bindings) > (block) > (not safe)) > > (define (array-dot-product A B) > ;; Assumes that A and B have the same domain > ;; and that the elements of A and B are flonums > (if (and (specialized-array? A) > (specialized-array? B) > (eq? (array-storage-class A) > f64-storage-class) > (eq? (array-storage-class B) > f64-storage-class) > (array-packed? A) > (array-packed? B)) > ;;; Exploit that the elements of A and B are > ;;; stored in consecutive elements of f64vectors > (let* ((domain (array-domain A)) > (lowers (interval-lower-bounds->list domain)) > (A-base (apply (array-indexer A) lowers)) > (B-base (apply (array-indexer B) lowers)) > (A-body (array-body A)) > (B-body (array-body B)) > (elements-left (interval-volume domain))) > (do ((i A-base (fx+ i 1)) > (j B-base (fx+ j 1)) > (sum 0. (fl+ sum (fl* (f64vector-ref A-body i) > (f64vector-ref B-body j)))) > (elements-left elements-left (fx- elements-left 1))) > ((eqv? elements-left 0) sum))) > ;;; Code for the general case > (array-fold-left (lambda (sum a b) > (fl+ sum (fl* a b))) > 0. > A B))) > > > ;;; A 1000 x 1000 array of random reals between 0 and 1 > (define A (array-copy (make-array (make-interval '#(1000 1000)) > (lambda (i j) (random-real))) > f64-storage-class)) > > ;;; Copy the transpose of A > > (define B (array-copy (array-permute A '#(1 0)))) > > ;;; Take the dot product of the top and bottom halves of A, which > ;;; are both packed. > > (pretty-print > (time > (array-dot-product > (array-extract A (make-interval '#(500 1000))) > (array-rebase (array-extract A (make-interval '#((500 1000) 1000))))))) > > ;;; Take the dot product of the left and right halves of B, which > ;;; are not packed. This should give the same numerical result > ;;; to within a multiple of rounding error > > (pretty-print > (time > (array-dot-product > (array-extract B (make-interval '#(1000 500))) > (array-rebase (array-extract B (make-interval '#(1000 (500 1000)))))))) > > ;;; Yields > > (time (array-dot-product (array-extract A (make-interval '#(500 1000))) > (array-rebase (array-extract A (make-interval '#((500 1000) 1000)))))) > 0.001274 secs real time > 0.001272 secs cpu time (0.001250 user, 0.000022 system) > no collections > 8240 bytes allocated > no minor faults > no major faults > 4560612 cpu cycles > 124739.53368039725 > (time (array-dot-product (array-extract B (make-interval '#(1000 500))) > (array-rebase (array-extract B (make-interval '#(1000 (500 1000))))))) > 0.015287 secs real time > 0.015170 secs cpu time (0.014963 user, 0.000207 system) > no collections > 8944 bytes allocated > no minor faults > no major faults > 54884724 cpu cycles > 124739.53368039505 >