|
Lexicographical access to arrays
John Cowan
(09 Apr 2026 18:59 UTC)
|
|
Re: Lexicographical access to arrays
Per Bothner
(09 Apr 2026 19:47 UTC)
|
|
Re: Lexicographical access to arrays
Alex Shinn
(09 Apr 2026 23:48 UTC)
|
|
Re: Lexicographical access to arrays
Bradley Lucier
(10 Apr 2026 00:22 UTC)
|
|
Re: Lexicographical access to arrays
Per Bothner
(10 Apr 2026 01:48 UTC)
|
|
Re: Lexicographical access to arrays
John Cowan
(10 Apr 2026 05:52 UTC)
|
|
Re: Lexicographical access to arrays
Bradley Lucier
(10 Apr 2026 15:33 UTC)
|
|
Re: Lexicographical access to arrays
Bradley Lucier
(10 Apr 2026 15:36 UTC)
|
|
Re: Lexicographical access to arrays Bradley Lucier (11 Apr 2026 00:32 UTC)
|
|
Re: Lexicographical access to arrays
John Cowan
(11 Apr 2026 04:03 UTC)
|
|
Re: Lexicographical access to arrays
Bradley Lucier
(11 Apr 2026 17:37 UTC)
|
On 4/10/26 01:52, John Cowan wrote:
> It would be fine if this only worked for specialized arrays, where the
> elements*are* contiguous. In that case you don't have to do all the
> divisions and remainders: you look down at the storage-class body and
> work from that.
I've come up with the following example, which shows how to exploit
knowledge that the arrays are packed and you know the storage class. It
shows a speedup of a factor of > 10 compared to the general case when
compiled.
Would adding this to the document as an extended example under
array-packed? be what you'd like?
Brad
(declare
(standard-bindings)
(extended-bindings)
(block)
(not safe))
(define (array-dot-product A B)
;; Assumes that A and B have the same domain
;; and that the elements of A and B are flonums
(if (and (specialized-array? A)
(specialized-array? B)
(eq? (array-storage-class A)
f64-storage-class)
(eq? (array-storage-class B)
f64-storage-class)
(array-packed? A)
(array-packed? B))
;;; Exploit that the elements of A and B are
;;; stored in consecutive elements of f64vectors
(let* ((domain (array-domain A))
(lowers (interval-lower-bounds->list domain))
(A-base (apply (array-indexer A) lowers))
(B-base (apply (array-indexer B) lowers))
(A-body (array-body A))
(B-body (array-body B))
(elements-left (interval-volume domain)))
(do ((i A-base (fx+ i 1))
(j B-base (fx+ j 1))
(sum 0. (fl+ sum (fl* (f64vector-ref A-body i)
(f64vector-ref B-body j))))
(elements-left elements-left (fx- elements-left 1)))
((eqv? elements-left 0) sum)))
;;; Code for the general case
(array-fold-left (lambda (sum a b)
(fl+ sum (fl* a b)))
0.
A B)))
;;; A 1000 x 1000 array of random reals between 0 and 1
(define A (array-copy (make-array (make-interval '#(1000 1000))
(lambda (i j) (random-real)))
f64-storage-class))
;;; Copy the transpose of A
(define B (array-copy (array-permute A '#(1 0))))
;;; Take the dot product of the top and bottom halves of A, which
;;; are both packed.
(pretty-print
(time
(array-dot-product
(array-extract A (make-interval '#(500 1000)))
(array-rebase (array-extract A (make-interval '#((500 1000) 1000)))))))
;;; Take the dot product of the left and right halves of B, which
;;; are not packed. This should give the same numerical result
;;; to within a multiple of rounding error
(pretty-print
(time
(array-dot-product
(array-extract B (make-interval '#(1000 500)))
(array-rebase (array-extract B (make-interval '#(1000 (500 1000))))))))
;;; Yields
(time (array-dot-product (array-extract A (make-interval '#(500 1000)))
(array-rebase (array-extract A (make-interval '#((500 1000) 1000))))))
0.001274 secs real time
0.001272 secs cpu time (0.001250 user, 0.000022 system)
no collections
8240 bytes allocated
no minor faults
no major faults
4560612 cpu cycles
124739.53368039725
(time (array-dot-product (array-extract B (make-interval '#(1000 500)))
(array-rebase (array-extract B (make-interval '#(1000 (500 1000)))))))
0.015287 secs real time
0.015170 secs cpu time (0.014963 user, 0.000207 system)
no collections
8944 bytes allocated
no minor faults
no major faults
54884724 cpu cycles
124739.53368039505