Bulk copying is a lot faster than element-by-element copying in one example
Bradley Lucier, 03 May 2020 20:18 UTC

A little over a week ago I implemented block copying for specialized
arrays whose elements are in order and adjacent in memory.

The code uses the various @vector-copy! routines from R7RS, which in
Gambit I defined in terms of the xxxxxx@vector-move! routines; at bottom,
those use memmove.
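To make the distinction concrete, here is a minimal sketch in portable R7RS Scheme. It is not the code in generic-arrays.scm: ordinary vectors stand in for the u32 storage of a specialized array, and the procedure names copy-by-elements! and copy-in-bulk! are hypothetical. The element-wise version makes one getter call and one setter call per index. The bulk version is valid only when source and destination elements are in order and adjacent, and then hands the whole block to a single vector-copy! call, the same idea the @vector-copy! routines described above apply to homogeneous vectors.

```scheme
;;; Sketch only: plain R7RS vectors stand in for the u32 storage of
;;; a specialized array; copy-by-elements! and copy-in-bulk! are
;;; hypothetical names, not procedures from generic-arrays.scm.

(import (scheme base))

;;; Element-by-element copy: one getter call and one setter call per
;;; index, the fallback when the source is a general array.
(define (copy-by-elements! getter setter! n)
  (do ((i 0 (+ i 1)))
      ((= i n))
    (setter! i (getter i))))

;;; Bulk copy: valid only when source and destination elements are
;;; in order and adjacent in memory; one vector-copy! call then
;;; moves the whole block.
(define (copy-in-bulk! dst dst-start src src-start n)
  (vector-copy! dst dst-start src src-start (+ src-start n)))

;;; Usage: copy elements 2 through 5 of src into two destinations.
(define src  (vector 1 2 3 4 5 6))
(define dst1 (make-vector 4 0))
(define dst2 (make-vector 4 0))

(copy-by-elements! (lambda (i) (vector-ref src (+ i 2)))
                   (lambda (i v) (vector-set! dst1 i v))
                   4)
(copy-in-bulk! dst2 0 src 2 4)
;;; dst1 and dst2 both now hold #(3 4 5 6).
```

Both procedures produce the same result; the difference is that the bulk version does a single library call instead of n closure calls.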

So today I compiled generic-arrays.scm and the following test code:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(define a (make-specialized-array (make-interval '#(4 10000 10000))
                                   u32-storage-class
                                   #f))

(define b (make-specialized-array (make-interval '#(10000 10000))
                                   u32-storage-class
                                   #f))

;;; View a as a one-dimensional array whose four elements are
;;; 10000 x 10000 arrays.

(define curried-a (array-curry a 2))

;;; Assign the first 10000 x 10000 subarray of a to b
;;; using bulk copy (100,000,000 u32 elements, 400,000,000 bytes).

(time (array-assign! b ((array-getter curried-a) 0)))

;;; Set d to a general array accessing the second 10000 x 10000
;;; subarray of a.

(define d (let ((d_ (array-getter ((array-getter curried-a) 1))))
             (make-array (make-interval '#(10000 10000))
                         d_)))

;;; Assign d to b. Since d is a general array, not a specialized one,
;;; this uses element-by-element copy.

(time (array-assign! b d))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

The times are:

(load "assign-time-test")
(time (array-assign! b ((array-getter curried-a) 0)))
     0.039606 secs real time
     0.039588 secs cpu time (0.039588 user, 0.000000 system)
     no collections
     896 bytes allocated
     1 minor fault
     no major faults
(time (array-assign! b d))
     1.947715 secs real time
     1.947511 secs cpu time (1.947511 user, 0.000000 system)
     no collections
     64 bytes allocated
     no minor faults
     no major faults

As expected, the difference is significant: the block-move code
transfers about 10 GB/second on my seven-year-old Linux box, while the
element-by-element code transfers about 200 MB/second, roughly 50 times
slower.
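As a check on those figures, this short Scheme sketch recomputes the throughput from the real times reported above, assuming 1 GB = 10^9 bytes and 1 MB = 10^6 bytes. It gives roughly 10.1 GB/second for the bulk copy, about 205 MB/second for the element-by-element copy, and a ratio of about 49.

```scheme
;;; Throughput implied by the reported real times (decimal GB/MB assumed).
(import (scheme base) (scheme write))

(define bytes 400000000)     ; 10000 x 10000 u32 elements x 4 bytes each
(define bulk-secs 0.039606)  ; real time, bulk copy
(define loop-secs 1.947715)  ; real time, element-by-element copy

;;; Bytes per second for a copy of `bytes` bytes taking `secs` seconds.
(define (rate secs) (/ bytes secs))

(display (/ (rate bulk-secs) 1e9)) (display " GB/s, bulk copy") (newline)
(display (/ (rate loop-secs) 1e6)) (display " MB/s, element-by-element") (newline)
(display (/ loop-secs bulk-secs))  (display " = ratio of times") (newline)
```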

Brad