Re: #\a octothorpe syntax vs SRFI 10

campbell@xxxxxx 27 Dec 2004 07:14 UTC
On Sun, 26 Dec 2004, Aubrey Jaffer wrote:

> Arrays are a fundamental data organizing paradigm from the origins of
> computing; FORTRAN has arrays; APL has arrays.  I hope arrays will
> become part of Scheme in R6RS.  For a construct which generalizes two
> of Scheme's three aggregate data types, a succinct read-syntax does
> not seem overly burdensome.

Need it be so succinct as to add eleven new octothorpe reader macros,
each dispatching further for the large number of different types of
arrays?  It would be much simpler, I think, and it would not lose much
brevity, to use SRFI 10; indeed, SRFI 10 was designed in response to
this issue as it arose in SRFI 4.

>  | In particular, I suggest that it be:
>  |
>  |   #,(ARRAY [<rank>] <type> <elements> ...)
>
> Rank cannot be deduced from <element> nesting for heterogeneous
> arrays.  I suggest that <rank> be required.

Sorry, I was not sufficiently clear there.  I meant to specify that the
rank defaults to 1, like #Axxx(...) in the current proposal.
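
As a rough illustration of that default, here is a minimal sketch in
portable Scheme of how the arguments of the ARRAY form might be split
up.  The name PARSE-ARRAY-ARGS is hypothetical, belongs to neither
SRFI 10 nor the current proposal, and assumes that a leading integer is
always the rank and that the type is always a symbol.

```scheme
;; Hypothetical helper, not part of any SRFI: split the arguments of
;;   #,(ARRAY [<rank>] <type> <elements> ...)
;; into a list (rank type elements), with the rank defaulting to 1.
(define (parse-array-args args)
  (if (and (pair? args) (integer? (car args)))
      ;; An explicit rank was given first.
      (list (car args) (cadr args) (cddr args))
      ;; No rank given: assume a one-dimensional array.
      (list 1 (car args) (cdr args))))

(parse-array-args '(2 u16 (0 1) (2 3)))   ; => (2 u16 ((0 1) (2 3)))
(parse-array-args '(char #\f #\o #\o))    ; => (1 char (#\f #\o #\o))
```

A constructor registered with SRFI 10's DEFINE-READER-CTOR could pass
its argument list to such a helper before building the array.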

>  | So, for example, the two-by-two array of unsigned 16-bit integers from
>  | the document might be written as #,(ARRAY 2 u16 (0 1) (2 3)).
>  | General object arrays' types would be OBJECT (so #(FOO 1 #T ())
>  | could also be written #,(ARRAY OBJECT FOO 1 #T ())) and character
>  | arrays' types would be CHAR (so "foo" could alternatively be
>  | written #,(ARRAY CHAR #\f #\o #\o)).
>
> This appears to introduce type symbols like U16 and CHAR which are not
> part of srfi-47.  The prototype functions in srfi-47 return arrays.
>
>  | [...]
>
> I am not opposed to also having SRFI-10 syntax for arrays.  This would
> seem to require reserving a set of symbols for type specification,
> which is an unschemely way of doing things.  Scheme goes to some
> lengths to avoid using symbols as cookies; witness NULL? and
> EOF-OBJECT?

Perhaps I'm confused, but I don't see much difference between my usage
of symbols -- which exist only at read-time, never at run-time, unlike
nil and the EOF object -- and your usage of the suffixes of the new #A
syntax.  Could you elaborate on how my proposal is any worse in that
respect than yours?

>  | (I'd also prefer that the names be longer & much more descriptive, like
>  | UNSIGNED16 or BOOLEAN, but I suppose that's a little too late, now that
>  | SRFI 47 has already been finalized & the incomprehensible abbreviations
>  | of array types have been set into stone...)
>
> SRFI-47 defines procedures to return prototype arrays.  Additional
> procedures can be added to alias the abbreviated ones.

This works for SRFI 47, but not necessarily for this SRFI: one cannot
define one's own aliases for existing array types in the reader syntax.

>                                                         But explicitly
> complete descriptions for numeric types are rather long:
>
> [...long list...]
>
> These long names present more of a burden for the memories of
> non-English-speakers than the short names, which are the same for
> everyone.

I'm not suggesting names so long that they induce tedium in typists,
but rather names just long enough not to be obscure, such as
INTEGER-U16, COMPLEX-64, BIT, et cetera.  Furthermore, the
single-character mnemonics are themselves derived from English, and the
corresponding words may well begin with different initial letters in
other languages; however, everything in Scheme is from English anyway,
so I see nothing wrong with using English words for array element type
names.

>            There is Scheme precedent for abbreviated names in
> identifiers like CADR and CDADAR and in the radix and exactness
> prefixes #B, #O, #D, #X, #E, #I.

For very fundamental primitives such as CAR & CDR that are frequently
used, and where the ability to stack them is convenient (in the case of
CAR & CDR, not, for example, HEAD & TAIL or FIRST & REST), this is
quite reasonable; however, arrays are much less fundamental to Scheme,
and, even if one wishes to debate that, literal arrays are much less
frequently written than CAR & CDR.  A better analogue would be
ARRAY-REF, but I haven't seen any objections to that as opposed to
AREF, and I much prefer ARRAY-REF to AREF.

Regarding prefixes for radices & exactness: I still dislike them, but
numbers are so concisely expressed anyway that expanding the prefixes
would give them undue weight within a literal number.  On the other
hand, literal arrays' contents will usually be much larger than the few
initial characters denoting the element type, so lengthening that
prefix slightly would not make it disproportionately prominent.

Let me also point out here that many of Scheme's naming conventions and
lexemes originated in T.  In T, there was no built-in facility for
multi-dimensional arrays, but there were still object representation
names used by Orbit's representation analyzer and for the C & Pascal
FFIs.  These were named semi-verbosely, as I suggest above; e.g., the
representation descriptor of unsigned, sixteen-bit integers was named
REP/INTEGER-16-U.  Many of the names in T were intended to be long
enough to be understandable and not obscure, but not so long as to be
excessive; this has tended to hold in Scheme as well.  I think it would
be good to preserve that in the array element type names as well.

>  | Also, one more comment on the draft: it doesn't actually say, as far
>  | as I can tell, anything about the actual syntax of arrays.  It just
>  | gives an example & a reader.  This is a rather glaring omission.
>
> Thanks for pointing this out.  I have replaced the example with:

Thanks.  That is much better.

> [...]
>
>  A two-by-three array of unsigned 16-bit integers is written:
>
>  #2au16((0 1 2) (3 5 4))
>
>  This array could have been created by (make-array (Au16) 2 3).

Insignificant point: I think it would probably be a bit better to
follow that call to MAKE-ARRAY with code to initialize the new array.
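
Something along the following lines is what I have in mind; it is only
a sketch, assuming an implementation that provides SRFI 47's
MAKE-ARRAY, Au16, ARRAY-SET! and ARRAY-REF.  The helper FILL-2D! is a
hypothetical name of my own, not part of SRFI 47.

```scheme
;; Assumes SRFI 47 is available (MAKE-ARRAY, Au16, ARRAY-SET!, ARRAY-REF).
(define a (make-array (Au16) 2 3))

;; Hypothetical helper: store the elements of ROWS, a list of lists,
;; into the two-dimensional ARRAY, row by row.
(define (fill-2d! array rows)
  (do ((i 0 (+ i 1))
       (rest rows (cdr rest)))
      ((null? rest))
    (do ((j 0 (+ j 1))
         (row (car rest) (cdr row)))
        ((null? row))
      ;; SRFI 47 puts the new value before the indices.
      (array-set! array (car row) i j))))

;; Give the array the contents shown in #2au16((0 1 2) (3 5 4)).
(fill-2d! a '((0 1 2) (3 5 4)))

;; Read back the element at row 1, column 1.
(array-ref a 1 1)
```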