comments Jeffrey Mark Siskind (24 Apr 2020 18:59 UTC)
I haven't had the time to go through the SRFI in detail. However, I have several
very-high-level comments.
1. From a historical perspective, the MIT Lisp Machine had displaced arrays.
All of the derivatives (Symbolics, LMI, and TI) thus also did. I think
that Interlisp-D did as well. I don't remember about Maclisp. Displaced
arrays were used in the Lisp Machines (both the MIT variants and
I think also the PARC variants) to implement the window system. Windows
were displaced arrays into the hardware screen buffer. Alan Bawden just
copied this idea into Scheme. He was one of the MIT Lisp Machine developers.
2. I think a very important design goal of any array system for Scheme should
be to fully support not only the functionality of systems like (Py)Torch
and cuDNN but also the actual code base. (Py)Torch has an API for
multidimensional arrays that has become the de facto standard for deep
learning and GPUs. It is similar in many ways to the proposed SRFI. I
haven't had time to check in detail, but it would be good if it were 100%
compatible, so that in a Scheme implementation with a suitable FFI one
could make bindings for the entire C backend to Torch (now replaced with
ATen) and all of cuDNN. Inter alia, this means support for GPU-resident
arrays as well as CPU-resident arrays. (You also need to be able to
support residency on different GPUs as well as migration between GPUs and
between GPUs and the CPU.) Also, Torch tensors [sic] aka arrays support
the notion of "stride" in addition to lower and upper bounds. This allows
downsampling/decimation through descriptors without copying. It also
allows reversal through negative strides. I haven't thought deeply enough
about whether the SRFI framework can support this.
3. One of the things about GPUs is that they support a variety of different
models of fold aka reduce, some of which are deterministic in their
parallelism and some of which are not. Because floating-point addition is
not associative, the latter can produce results that vary from run to run.
Sometimes people tolerate this for the sake of speed. Sometimes not. Frameworks
have ways of specifying whether or not you require deterministic results.
Similarly, some frameworks, like cuDNN, have a single API for things like
convolution, but take an argument that specifies an algorithm to use that
trades off time vs. (temporary) space.
4. The (Py)Torch tensor model distinguishes between contiguous and
noncontiguous tensors and has mechanisms for producing a contiguous tensor
from a noncontiguous one by copying. Some operations require contiguous
tensors or are more efficient on them. Some implicitly convert to
contiguous. There is an efficiency hack. Suppose you do a map of + over two
contiguous tensors with compatible dimensions. Instead of doing it through the
descriptors, you can do it directly on the underlying 1D storage and just
wrap the result in the appropriate descriptor. That way the underlying map is a
simple low-level 1D map instead of a higher-order map through
descriptors.
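The stride idea in point 2 can be illustrated with a short sketch. This is a minimal Python sketch and not Torch's API or the SRFI's. `StridedView`, `row_major`, `decimate`, `reverse`, and `to_nested` are hypothetical names. The sketch assumes that an element's position in the shared flat storage is `offset + sum(i_k * stride_k)`. Under that assumption, decimation multiplies one stride, and reversal negates one stride while moving the offset to the last element. Neither operation copies any data.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class StridedView:
    """Hypothetical descriptor over shared flat storage: element (i0, i1, ...)
    lives at storage[offset + i0*strides[0] + i1*strides[1] + ...]."""
    storage: List[float]
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]
    offset: int = 0

    def __getitem__(self, idx: Sequence[int]):
        # Bounds-check each index against the (lower bound 0) shape.
        if len(idx) != len(self.shape):
            raise IndexError(idx)
        for i, n in zip(idx, self.shape):
            if not 0 <= i < n:
                raise IndexError(idx)
        return self.storage[self.offset + sum(i * s for i, s in zip(idx, self.strides))]

    def decimate(self, axis: int, step: int) -> "StridedView":
        """Keep every step-th element along axis (step > 0); shares storage."""
        shape, strides = list(self.shape), list(self.strides)
        shape[axis] = (shape[axis] + step - 1) // step  # ceil(n / step)
        strides[axis] *= step
        return StridedView(self.storage, tuple(shape), tuple(strides), self.offset)

    def reverse(self, axis: int) -> "StridedView":
        """Flip axis: start at its last element and negate its stride; shares storage."""
        strides = list(self.strides)
        offset = self.offset + (self.shape[axis] - 1) * strides[axis]
        strides[axis] = -strides[axis]
        return StridedView(self.storage, self.shape, tuple(strides), offset)

    def to_nested(self, prefix: Tuple[int, ...] = ()):
        """Materialize as nested Python lists (for inspection only)."""
        if len(prefix) == len(self.shape):
            return self[prefix]
        return [self.to_nested(prefix + (i,)) for i in range(self.shape[len(prefix)])]


def row_major(storage: List[float], shape: Tuple[int, ...]) -> StridedView:
    """Wrap storage in a C-order (last index varies fastest) descriptor."""
    strides, acc = [], 1
    for n in reversed(shape):
        strides.append(acc)
        acc *= n
    return StridedView(storage, tuple(shape), tuple(reversed(strides)))
```

For example, take `a = row_major(list(range(12)), (3, 4))`. Then `a.decimate(1, 2).to_nested()` gives `[[0, 2], [4, 6], [8, 10]]`, and `a.reverse(0).to_nested()` gives `[[8, 9, 10, 11], [4, 5, 6, 7], [0, 1, 2, 3]]`. Both views still point at `a.storage`.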
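Point 3 can be made concrete with a small Python sketch. It assumes nothing about any particular GPU framework. It only shows that regrouping the same floating-point additions can change the result. `sequential_sum`, `pairwise_sum`, and the `deterministic` flag of `reduce_sum` are hypothetical names. The nondeterministic mode simply shuffles the order, standing in for a parallel scheduler whose combination order is not fixed.

```python
import random
from typing import Optional, Sequence


def sequential_sum(xs: Sequence[float]) -> float:
    """Left-to-right fold: ((x0 + x1) + x2) + ..."""
    total = 0.0
    for x in xs:
        total += x
    return total


def pairwise_sum(xs: Sequence[float]) -> float:
    """Tree-shaped fold, the kind of grouping a parallel reduction uses."""
    if not xs:
        return 0.0
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])


def reduce_sum(xs: Sequence[float], deterministic: bool = True,
               rng: Optional[random.Random] = None) -> float:
    """Hypothetical API: a fixed tree order when deterministic is True;
    otherwise an order that may differ from run to run (simulated here
    by shuffling the inputs before the tree reduction)."""
    order = list(xs)
    if not deterministic:
        (rng or random.Random()).shuffle(order)
    return pairwise_sum(order)
```

With `xs = [0.1, 0.2, 0.3]`, `sequential_sum(xs)` returns `0.6000000000000001` while `pairwise_sum(xs)` returns `0.6`. The inputs and the operation are the same, and the results differ only because the additions are grouped differently.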
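The contiguity hack in point 4 might look roughly like the following minimal Python sketch. `Tensor`, `is_contiguous`, `contiguous`, and `map2` are hypothetical names that loosely borrow the (Py)Torch vocabulary. They do not reflect its actual implementation. The sketch also assumes that "contiguous" means exactly row-major strides. When both operands are contiguous, the elementwise map runs as one flat loop over the storage, and the result is wrapped in a fresh row-major descriptor. Otherwise the map goes through the descriptors one index at a time.

```python
import itertools
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple


def row_major_strides(shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """C-order strides: the last index varies fastest."""
    strides, acc = [], 1
    for n in reversed(shape):
        strides.append(acc)
        acc *= n
    return tuple(reversed(strides))


@dataclass(frozen=True)
class Tensor:
    """Hypothetical descriptor (shape, strides, offset) over shared 1D storage."""
    storage: List[float]
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]
    offset: int = 0

    def __getitem__(self, idx: Tuple[int, ...]):
        return self.storage[self.offset + sum(i * s for i, s in zip(idx, self.strides))]

    def is_contiguous(self) -> bool:
        # Conservative check: exactly row-major strides. A real implementation
        # might also ignore the strides of size-1 dimensions.
        return self.strides == row_major_strides(self.shape)

    def indices(self):
        # All multi-indices in row-major order.
        return itertools.product(*(range(n) for n in self.shape))

    def contiguous(self) -> "Tensor":
        """Return self if already contiguous, else a row-major copy."""
        if self.is_contiguous():
            return self
        data = [self[idx] for idx in self.indices()]
        return Tensor(data, self.shape, row_major_strides(self.shape))


def map2(f: Callable[[float, float], float], a: Tensor, b: Tensor) -> Tensor:
    """Elementwise f over two same-shape tensors; the result is contiguous."""
    if a.shape != b.shape:
        raise ValueError("shape mismatch")
    if a.is_contiguous() and b.is_contiguous():
        # Fast path: a flat 1D map straight over the underlying storage.
        n = math.prod(a.shape)
        data = list(map(f, a.storage[a.offset:a.offset + n],
                        b.storage[b.offset:b.offset + n]))
    else:
        # Slow path: go through the descriptors one multi-index at a time.
        data = [f(a[idx], b[idx]) for idx in a.indices()]
    return Tensor(data, a.shape, row_major_strides(a.shape))
```

Consider `a = Tensor(list(range(6)), (2, 3), (3, 1))` and the transposed view `t = Tensor(a.storage, (3, 2), (1, 3))`. Mapping over `a` takes the flat path. Mapping over `t` falls back to the slow path. Calling `t.contiguous()` first copies `t` into row-major storage `[0, 3, 1, 4, 2, 5]`, so that later maps can take the flat path.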
Just a side comment. In my lab, we have a language and implementation called
checkpointVLAD which is a slight variant of a pure subset of Scheme. We made a
version of checkpointVLAD called Scorch (Scheme Torch, scorching hot) that has
tensor types compatible with Torch and has a complete set of basis functions for
manipulating them in a way that is compatible with both Scheme and Torch,
along with a set of FFI bindings to Torch and cuDNN.
Jeff (http://engineering.purdue.edu/~qobi)