Re: comments Bradley Lucier (26 Apr 2020 00:09 UTC)
On 4/24/20 2:58 PM, Jeffrey Mark Siskind wrote:

> I haven't had the time to go through the SRFI in detail. However, I have several
> very-high-level comments.
>
> 1. From a historical perspective, the MIT Lisp Machine had displaced arrays.
>    All of the derivatives (Symbolics, LMI, and TI) thus also did. I think
>    that Interlisp-D did as well. I don't remember about Maclisp. Displaced
>    arrays were used in the Lisp Machines (both the MIT variants and
>    I think also the PARC variants) to implement the window system. Windows
>    were displaced arrays into the hardware screen buffer. Alan Bawden just
>    copied this idea into Scheme. He was one of the MIT Lisp Machine developers.

I don't know CL, and I'm not an expert on the history of Lisp, either.
That being said, I looked at what a displaced array is in the CLHS, and
it seems to be less general than the affine maps between multi-indices
that Bawden proposed: it is just a linear offset into the storage of
another array. On the other hand, the relationship between the
multi-indices of a displaced array and the multi-indices of the original
array may very well not be affine, so displaced arrays don't seem to
have the compositional mathematical structure that affine maps have.

> 2. I think a very important design goal of any array system for Scheme should
>    be to fully support not only the functionality of systems like (Py)Torch
>    and cuDNN but also the actual code base. (Py)Torch has an API for
>    multidimensional arrays that has become the de facto standard for deep
>    learning and GPUs. It is similar in many ways to the proposed SRFI. I
>    haven't had time to check in detail, but it would be good if it were 100%
>    compatible, so that in a Scheme implementation with a suitable FFI, one
>    could make bindings for the entire C backend to Torch (now replaced with
>    ATen) and all of cuDNN. Inter alia, this means support for GPU-resident
>    arrays as well as CPU-resident arrays. (You also need to be able to
>    support residency on different GPUs as well as migration between GPUs and
>    between GPUs and the CPU.) Also, Torch tensors [sic] aka arrays support
>    the notion of "stride" in addition to lower and upper bounds. This allows
>    downsampling/decimation through descriptors without copying. It also
>    allows reversal through negative strides. I haven't thought deeply enough
>    about whether the SRFI framework can support this.

About GPU-resident arrays: I don't know how to specify this in Scheme.

About "stride": this SRFI has array-sample; I'll see how it compares to
"stride".

About reversals: this SRFI has array-reverse; I'll see how it compares
to Torch. (A rough sketch of both comparisons is below, after point 3.)

> 3. One of the things about GPUs is that they support a variety of different
>    models of fold aka reduce, some of which are deterministic in their
>    parallelism and some of which are not. Because of the nonassociativity of
>    floating-point addition this may make results nondeterministic. Sometimes
>    people tolerate this because of faster speed. Sometimes not. Frameworks
>    have ways of specifying whether or not you require deterministic results.
>    Similarly, some frameworks, like cuDNN, have a single API for things like
>    convolution, but take an argument that specifies an algorithm to use that
>    trades off time vs. (temporary) space.

An early version of this SRFI had array-reduce-serial (which processed
items strictly in lexicographical order) and array-reduce (which offered
no such guarantee). The "-serial" versions of functions have been
removed---do you think this should be revisited?
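To make the nondeterminism concrete (this is plain floating-point
arithmetic, nothing array-specific): flonum addition is not associative,
so two reductions that merely associate the additions differently can
give different answers.

(define a  1e16)
(define b -1e16)
(define c  1.0)

(+ (+ a b) c)  ; => 1.0, summing strictly left to right, the order the
               ;    old array-reduce-serial would have used
(+ a (+ b c))  ; => 0.0, a regrouping that a parallel reduce might use

A parallel array-reduce that is free to regroup the additions can
legitimately return either value on different runs, which is exactly the
nondeterminism Jeff describes.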
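And here is the rough sketch promised under point 2 of how I expect
Torch-style strides to line up with array-sample and array-reverse. I'm
assuming that array-sample takes a vector of exact positive scale
factors and that array-reverse takes an optional vector of booleans
saying which axes to flip; if I've misremembered the argument
conventions in the draft, treat this as pseudocode.

;; A one-dimensional array whose element at index i is just i.
(define A
  (make-array (make-interval '#(0) '#(10))
              (lambda (i) i)))

;; Torch's t[::2] (stride 2, no copy) corresponds to sampling every
;; second element through the descriptor:
(define every-other (array-sample A '#(2)))

;; Torch's t[::-1] (negative stride) corresponds to reversing the index
;; order, again without copying:
(define backwards (array-reverse A '#(#t)))

(array->list every-other)  ; => (0 2 4 6 8), if I have the conventions right
(array->list backwards)    ; => (9 8 7 6 5 4 3 2 1 0)

Sampling with an offset (Torch's t[1::2]) would presumably be
array-extract or array-translate composed with array-sample; that's part
of what I still want to check.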
> 4. The (Py)Torch tensor model distinguishes between contiguous and
>    noncontiguous tensors and has mechanisms for producing a contiguous tensor
>    from a noncontiguous one by copying. Some operations require contiguous
>    tensors or are more efficient on such. Some implicitly convert to
>    contiguous. There is an efficiency hack. Suppose you do a map + over two
>    tensors with compatible dimensions. Instead of doing it through the
>    descriptors you can do it directly on the underlying 1D storage and just
>    wrap the result in the appropriate descriptor. So that underlying map is a
>    simple low-level 1D map instead of a map of higher order through
>    descriptors.

The SRFI has array->specialized-array, which produces an array with a
backing store whose elements are contiguous and in order, from a
possibly noncontiguous, or even functional, array.

For operations to work on the underlying 1D storage rather than go
through the indices, one would need some way to know that the elements
are contiguous and in order, and to know the offset of the first array
element in that 1D storage. There must be a relatively easy mathematical
test for this information; a rough sketch of one such test is appended
below. The only question is whether this information should be
calculated when each specialized array is created (its backing store
might be shared with another specialized array, e.g., by array-curry) or
computed only when it is needed.

Brad
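Here is the rough sketch mentioned above. It assumes that array-indexer
returns the affine map from a multi-index to a position in the
one-dimensional body, and that array-domain, interval-lower-bounds->list,
and interval-upper-bounds->list behave as in the current draft; it is
meant only to show that the check is cheap (it probes the indexer d+1
times for a d-dimensional array), not to pin down an API.

(define (elements-contiguous-and-in-order? A)
  ;; A is assumed to be a specialized array with an affine indexer.
  (let* ((indexer (array-indexer A))      ; multi-index -> position in the 1-D body
         (domain  (array-domain A))
         (lowers  (interval-lower-bounds->list domain))
         (uppers  (interval-upper-bounds->list domain))
         (widths  (map - uppers lowers))
         (offset  (apply indexer lowers)) ; where the first element lives
         ;; Coefficient ("stride") of axis k: bump that index by 1 and see
         ;; how far the 1-D position moves; the indexer is affine, so this
         ;; recovers the coefficient exactly.
         (stride
          (lambda (k)
            (- (apply indexer
                      (let bump ((ls lowers) (j 0))
                        (if (null? ls)
                            '()
                            (cons (if (= j k) (+ (car ls) 1) (car ls))
                                  (bump (cdr ls) (+ j 1))))))
               offset))))
    ;; The elements are contiguous and in lexicographic order exactly when
    ;; the last axis has coefficient 1 and each earlier axis's coefficient
    ;; equals the product of the widths of the axes after it. (This is
    ;; slightly conservative for axes of width 1.)
    (let loop ((k (- (length widths) 1))
               (expected 1))
      (or (< k 0)
          (and (= (stride k) expected)
               (loop (- k 1)
                     (* expected (list-ref widths k))))))))

The offset computed along the way is the other datum needed to work
directly on the 1-D storage: the position of the first element in the
body. Whether to cache these at construction time or recompute them on
demand is the open question above.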