comments Jeffrey Mark Siskind (24 Apr 2020 18:59 UTC)
I haven't had the time to go through the SRFI in detail. However, I have several very-high-level comments.

1. From a historical perspective, the MIT Lisp Machine had displaced arrays, and so did all of its derivatives (Symbolics, LMI, and TI). I think that Interlisp-D did as well; I don't remember about Maclisp. Displaced arrays were used in the Lisp Machines (both the MIT variants and, I think, the PARC variants) to implement the window system: windows were displaced arrays into the hardware screen buffer. Alan Bawden, who was one of the MIT Lisp Machine developers, simply carried this idea over into Scheme.

2. I think a very important design goal of any array system for Scheme should be to fully support not only the functionality of systems like (Py)Torch and cuDNN but also the actual code base. (Py)Torch has an API for multidimensional arrays that has become the de facto standard for deep learning and GPUs. It is similar in many ways to the proposed SRFI. I haven't had time to check in detail, but it would be good if it were 100% compatible, so that in a Scheme implementation with a suitable FFI one could make bindings for the entire C backend of Torch (now replaced by ATen) and all of cuDNN. Inter alia, this means support for GPU-resident arrays as well as CPU-resident arrays. (You also need to be able to support residency on different GPUs, as well as migration between GPUs and between GPUs and the CPU.) Also, Torch tensors [sic], aka arrays, support the notion of "stride" in addition to lower and upper bounds. This allows downsampling/decimation through descriptors without copying, and it allows reversal through negative strides (sketch 1 after my signature). I haven't thought deeply enough about whether the SRFI framework can support this.

3. One of the things about GPUs is that they support a variety of different models of fold, aka reduce, some of which are deterministic in their parallelism and some of which are not. Because of the nonassociativity of floating-point addition, this can make results nondeterministic. Sometimes people tolerate this for the faster speed; sometimes not. Frameworks have ways of specifying whether or not you require deterministic results. Similarly, some frameworks, like cuDNN, have a single API for things like convolution but take an argument specifying which algorithm to use, trading off time against (temporary) space (sketch 2).

4. The (Py)Torch tensor model distinguishes between contiguous and noncontiguous tensors and has mechanisms for producing a contiguous tensor from a noncontiguous one by copying. Some operations require contiguous tensors or are more efficient on them; some implicitly convert to contiguous. There is an efficiency hack: suppose you do a map of + over two tensors with compatible dimensions. Instead of doing it through the descriptors, you can do it directly on the underlying 1D storage and just wrap the result in the appropriate descriptor, so that the underlying map is a simple low-level 1D map instead of a higher-order map through descriptors (sketch 3).

Just a side comment: in my lab, we have a language and implementation called checkpointVLAD, which is a slight variant of a pure subset of Scheme. We made a version of checkpointVLAD called Scorch (Scheme Torch, scorching hot) that has tensor types compatible with Torch and a complete set of basis functions for manipulating them in a way that is compatible with both Scheme and Torch, along with a set of FFI bindings to Torch and cuDNN.

Jeff (http://engineering.purdue.edu/~qobi)
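P.S. Three rough Python sketches, one per pointer above; toy shapes and values throughout, nothing definitive. Sketch 1 is the stride/decimation/reversal point from item 2: the PyTorch half shows that slicing only rewrites the stride descriptor and shares storage, and the NumPy half is there only because NumPy exposes negative strides directly; the descriptor idea is the same.

    import numpy as np
    import torch

    t = torch.arange(12, dtype=torch.float32).reshape(3, 4)
    print(t.stride())                     # (4, 1): row-major strides, in elements

    d = t[:, ::2]                         # decimate: keep every other column
    print(d.stride())                     # (4, 2): only the descriptor changed
    print(d.data_ptr() == t.data_ptr())   # True: same underlying storage, no copy

    a = np.arange(12, dtype=np.float32).reshape(3, 4)
    r = a[::-1]                           # reverse the rows
    print(r.strides)                      # (-16, 4): negative byte stride
    print(r.base is a)                    # True: still no copy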
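Sketch 2 is the determinism and algorithm-selection point from item 3. The comments mention the algo parameter of cuDNN's convolution calls; the PyTorch flags shown are the ones I know of for requesting deterministic cuDNN behaviour.

    import torch

    # Floating-point addition is not associative, so the result of a parallel
    # reduction depends on how the partial sums are grouped:
    print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))   # False

    # PyTorch-level knobs for trading determinism against speed.  cuDNN's C API
    # instead takes an explicit algorithm argument (the algo parameter of
    # cudnnConvolutionForward), trading time against workspace memory.
    torch.backends.cudnn.deterministic = True    # insist on deterministic kernels
    torch.backends.cudnn.benchmark = False       # don't auto-tune the algorithm choice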
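Sketch 3 is the contiguity point and the flat 1D map hack from item 4, again in PyTorch and only for the obvious elementwise case; a real implementation would also have to check dtypes, overlap, and so on.

    import torch

    t = torch.arange(12, dtype=torch.float32).reshape(3, 4)
    v = t.t()                             # transpose: new descriptor, same storage
    print(v.is_contiguous())              # False
    c = v.contiguous()                    # materialize a contiguous copy
    print(c.is_contiguous())              # True
    print(c.data_ptr() == v.data_ptr())   # False: the copy has its own storage

    # The flat-map hack: when both operands are contiguous with the same shape,
    # an elementwise op can run over the raw 1D storage and the result just
    # gets wrapped back in the original shape/stride descriptor.
    a = torch.rand(3, 4)
    b = torch.rand(3, 4)
    flat = a.reshape(-1) + b.reshape(-1)  # simple 1D loop over storage
    out = flat.reshape(a.shape)           # rewrap in a's descriptor
    print(torch.equal(out, a + b))        # True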