Re: Discussion of array-{inner|outer}-product specification and implementation

Discussion of array-{inner|outer}-product specification and implementation Bradley Lucier (16 Jan 2026 01:58 UTC)

Re: Discussion of array-{inner|outer}-product specification and implementation Bradley J Lucier 26 Jan 2026 03:58 UTC

> On Jan 15, 2026, at 8:58 PM, Bradley Lucier <xxxxxx@purdue.edu> wrote:
>
> Studying this code leads me to reconsider the specifications of array-outer-product and array-inner-product in SRFI 231, which now seem a bit sloppy. Some things perhaps to copy from how matrix* is coded:

I've now updated the specification and implementation of array-inner-product and array-outer-product in my followup to SRFI 231: https://github.com/gambiteer/srfi-231/tree/231-bis?tab=readme-ov-file

Both of these routines almost always reuse each element of their array arguments multiple times, so generalized array arguments are copied to specialized arrays. We have no way to know how expensive it is to compute an element of a generalized array argument, so we want to compute and store each element once before proceeding. (Generally speaking, I prefer each "bulk" array operation to compute each element of argument arrays at most once.)
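
A rough illustration of the "compute each element once" point, in plain R7RS rather than SRFI 231 code. Everything here is a hypothetical stand-in: a "generalized array" is modeled as a one-argument getter, a "specialized array" as a vector, and the names expensive-getter, getter->vector, and sum-of-products are invented for this sketch.

```scheme
(import (scheme base))

;; Counts how many times the (supposedly expensive) getter body runs.
(define calls 0)

;; Hypothetical generalized-array getter: element i is i squared.
(define (expensive-getter i)
  (set! calls (+ calls 1))
  (* i i))

;; Copy elements 0..n-1 into a vector, calling the getter once per index.
(define (getter->vector getter n)
  (let ((v (make-vector n)))
    (do ((i 0 (+ i 1)))
        ((= i n) v)
      (vector-set! v i (getter i)))))

;; Sum of a_i * b_j over all pairs (i, j); every element of a and b
;; is read several times, as in an outer or inner product.
(define (sum-of-products a-getter m b-getter n)
  (let loop-i ((i 0) (acc 0))
    (if (= i m)
        acc
        (loop-i (+ i 1)
                (let loop-j ((j 0) (acc acc))
                  (if (= j n)
                      acc
                      (loop-j (+ j 1)
                              (+ acc (* (a-getter i) (b-getter j))))))))))

;; Reading through the getters directly: 18 getter calls for 3 x 3 pairs.
(set! calls 0)
(define direct (sum-of-products expensive-getter 3 expensive-getter 3))
(define direct-calls calls)

;; Copying first: 6 getter calls, then only vector-ref in the loops.
(set! calls 0)
(define va (getter->vector expensive-getter 3))
(define vb (getter->vector expensive-getter 3))
(define copied
  (sum-of-products (lambda (i) (vector-ref va i)) 3
                   (lambda (j) (vector-ref vb j)) 3))
(define copied-calls calls)
```

Both versions compute the same sum (25 here); only the number of getter calls differs, and the gap grows with the array sizes.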

The routine array-outer-product returns a generalized array because the result is often "consumed" by another bulk operation before being copied or assigned to a specialized (strict) array (see LU matrix decomposition, for example).
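
To sketch why a lazy result can pay off, here is a plain R7RS model, not the SRFI 231 implementation. The names lazy-outer-product, counted*, and diagonal-sum are hypothetical, and the "generalized array" is again just a getter closure over two vectors.

```scheme
(import (scheme base))

;; Hypothetical lazy outer product: nothing is computed until an
;; element is requested through the returned getter.
(define (lazy-outer-product op a b)
  (lambda (i j)
    (op (vector-ref a i) (vector-ref b j))))

;; Multiplication that counts how often it is applied.
(define op-calls 0)
(define (counted* x y)
  (set! op-calls (+ op-calls 1))
  (* x y))

(define P (lazy-outer-product counted* (vector 1 2 3) (vector 4 5 6)))

;; A consumer that needs only the diagonal of the n x n result.
(define (diagonal-sum getter n)
  (do ((i 0 (+ i 1))
       (acc 0 (+ acc (getter i i))))
      ((= i n) acc)))

(define trace (diagonal-sum P 3))   ; 1*4 + 2*5 + 3*6 = 32
(define trace-op-calls op-calls)    ; 3, not the 9 an eager result would need
```

The consumer decides which elements are computed and where they are stored, so no intermediate 3 x 3 array is ever allocated.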

The routine array-inner-product returns a specialized array because, generally, computing each element of the result involves a computationally intensive loop, so we do it only once for each element.

In the sample implementation, the inner loop of array-inner-product performs six nontail calls per iteration, so it is not particularly fast when the two procedural arguments are cheap (e.g., floating-point addition and multiplication), but I think the performance is good enough for a general routine. Users who need significantly higher performance can write custom loops in Scheme or use an FFI.
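
For comparison, a minimal sketch of the two styles in plain R7RS. This is not the sample implementation: general-inner-product and matrix-multiply are hypothetical names, and matrices are assumed to be flat row-major vectors with explicit dimensions (a is m x p, b is p x n, with p at least 1).

```scheme
(import (scheme base))

;; General form: c[i,j] = reduction with f over k of (g a[i,k] b[k,j]).
;; f and g are called through procedure parameters on every iteration.
(define (general-inner-product f g a m p b n)
  (let ((c (make-vector (* m n))))
    (do ((i 0 (+ i 1)))
        ((= i m) c)
      (do ((j 0 (+ j 1)))
          ((= j n))
        (let loop ((k 1)
                   (acc (g (vector-ref a (* i p)) (vector-ref b j))))
          (if (= k p)
              (vector-set! c (+ (* i n) j) acc)
              (loop (+ k 1)
                    (f acc (g (vector-ref a (+ (* i p) k))
                              (vector-ref b (+ (* k n) j)))))))))))

;; Custom loop: + and * are written inline, so a compiler can open-code them.
(define (matrix-multiply a m p b n)
  (let ((c (make-vector (* m n) 0.0)))
    (do ((i 0 (+ i 1)))
        ((= i m) c)
      (do ((j 0 (+ j 1)))
          ((= j n))
        (do ((k 0 (+ k 1))
             (acc 0.0 (+ acc (* (vector-ref a (+ (* i p) k))
                                (vector-ref b (+ (* k n) j))))))
            ((= k p) (vector-set! c (+ (* i n) j) acc)))))))

(define a (vector 1.0 2.0 3.0 4.0))   ; 2 x 2: [1 2; 3 4]
(define b (vector 5.0 6.0 7.0 8.0))   ; 2 x 2: [5 6; 7 8]
(define c-general (general-inner-product + * a 2 2 b 2))
(define c-custom (matrix-multiply a 2 2 b 2))
;; Both are #(19.0 22.0 43.0 50.0)
```

The general version accepts any f and g; the custom loop gives that up in exchange for an inner loop with no calls to unknown procedures.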

The documentation has also been (finally) updated.

I don’t have anything else on my agenda.  I’ve integrated array broadcasting into the SRFI, and made significant performance improvements in the sample implementation.  I’m going to let it sit for a while.

Comments welcome.

Brad