Directly manipulating representations of scheme objects in arrays

Show/hide message thread

Directly manipulating representations of scheme objects in arrays Bradley Lucier (16 Mar 2023 22:59 UTC)

Re: Directly manipulating representations of scheme objects in arrays John Cowan (17 Mar 2023 14:09 UTC)

Re: Directly manipulating representations of scheme objects in arrays Bradley Lucier (17 Mar 2023 16:15 UTC)

Re: Directly manipulating representations of scheme objects in arrays John Cowan (17 Mar 2023 16:56 UTC)

Directly manipulating representations of scheme objects in arrays Bradley Lucier 16 Mar 2023 22:59 UTC

This recent discussion of f16 and f8 storage classes has exposed some
muddled thinking on my part about moving array elements directly between
arrays with the same storage class.

I can't see how these issues can be resolved within the framework of
SRFI 231, but I'd still like to discuss them here.

In the sample implementation all code for moving elements between arrays
is collected into the procedure %%move-array-elements.

If both arrays are specialized and elements of the arrays are stored
adjacently in the backing store and indexed in increasing order (i.e.,
the array is "packed"), and if the storage class has a "copier" (like
vector-copy!, etc.), then that copier is used to move elements between
arrays.  Implicitly, it's moving *representations* of elements between
arrays.

If the arrays are specialized with the same storage class but the
destination is not "packed" then the code looks like this ("getter" is
(array-getter source), "setter" is (array-setter destination)):

                       (%%interval-for-each
                        (case (%%interval-dimension domain)
                          ((0) (lambda ()
                                 (setter (getter))))
                          ((1) (lambda (i)
                                 (setter (getter i) i)))
                          ((2) (lambda (i j)
                                 (setter (getter i j) i j)))
                          ((3) (lambda (i j k)
                                 (setter (getter i j k) i j k)))
                          ((4) (lambda (i j k l)
                                 (setter (getter i j k l) i j k l)))
                          (else
                           (lambda multi-index
                             (apply setter (apply getter multi-index)
multi-index))))
                        domain)

This has a different meaning: The getter converts the representation of
a scheme object to the object itself, while the setter converts the
scheme object back to its representation, which is stored in the backing
store of the array.

Usually, we don't mind these conversions too much, and I'll talk about
the costs in Gambit on a 64-bit machine: a u16vector element is
converted to a fixnum by the getter, and converted back by the setter.

Sometimes it has more of a cost: A f32vector element is converted to a
Scheme double (16 bytes) and then stored back into the new backing
store.  So it costs some garbage generation and collection, but the
conversion is usually done in hardware.

The same thing happens with u64-storage-class arrays: The getter
converts to a exact integer (usually a bignum taking 16 bytes, but it
could take 24 bytes if the top bit is set, or it could be a fixnum if
it's < 62 bits long), the setter stores it back into the u64vector.
These conversions are not done in hardware and they take noticeable CPU
time as well as generating significant memory use.  On a 32-bit machine
the same thing can happen with u32-storage-class arrays.

When the two specialized arrays have different storage classes, this
back and forth conversion is necessary, converting a u1vector element to
a fixnum to store into a u8vector, say, as an example.

But adding f16-storage-class to this framework makes the issue more
clear cut.  The representation of the floating-point number can fit in a
fixnum, but the floating-point number itself takes 16 bytes and
significant computation in Gambit.  Moving the 16-bit representation
between two f16-storage-class is much more efficient than converting to
the double float Scheme object and then converting back again.

So (at least some) storage classes should have a way of loading and
storing the representation of the data directly, instead of using the
Scheme object as an intermediary.

Brad