Re: #\a octothorpe syntax vs SRFI 10

Aubrey Jaffer, 30 Dec 2004 22:23 UTC
 | Date: Sun, 26 Dec 2004 23:14:00 -0800 (PST)
 | From: xxxxxx@autodrip.bloodandcoffee.net
 |
 | On Sun, 26 Dec 2004, Aubrey Jaffer wrote:
 |
 | > Arrays are a fundamental data organizing paradigm from the origins of
 | > computing; FORTRAN has arrays; APL has arrays.  I hope arrays will
 | > become part of Scheme in R6RS.  For a construct which generalizes two
 | > of Scheme's three aggregate data types, a succinct read-syntax does
 | > not seem overly burdensome.
 |
 | Need it be so succinct as to add eleven new octothorpe reader macros,
 | each dispatching further for the large number of different types of
 | arrays?  It would be much simpler, I think, and it would not lose much
 | brevity, to use SRFI 10; indeed, SRFI 10 was designed in response to
 | this issue as it arose in SRFI 4.
 |
 | >  | In particular, I suggest that it be:
 | >  |
 | >  |   #,(ARRAY [<rank>] <type> <elements> ...)
 | >
 | > Rank cannot be deduced from <element> nesting for heterogeneous
 | > arrays.  I suggest that <rank> be required.
 |
 | Sorry, I was not sufficiently clear there.  I meant to specify that the
 | rank defaults to 1, like #Axxx(...) in the current proposal.

In the updated srfi-58.html I sent to the editor I have eliminated the
#Axxx syntax.  The rank digit(s) will be required.

 | >  | So, for example, the two-by-two array of unsigned 16-bit integers from
 | >  | the document might be written as #,(ARRAY 2 u16 (0 1) (2 3)).
 | >  | General object arrays' types would be OBJECT (so #(FOO 1 #T ())
 | >  | could also be written #,(ARRAY OBJECT FOO 1 #T ())) and character
 | >  | arrays' types would be CHAR (so "foo" could alternatively be
 | >  | written #,(ARRAY CHAR #\f #\o #\o)).
 | >
 | > This appears to introduce type symbols like U16 and CHAR which are not
 | > part of srfi-47.  The prototype functions in srfi-47 return arrays.
 | >
 | >  | [...]
 | >
 | > I am not opposed to also having SRFI-10 syntax for arrays.  This would
 | > seem to require reserving a set of symbols for type specification,
 | > which is an unschemely way of doing things.  Scheme goes to some
 | > lengths to avoid using symbols as cookies; witness NULL? and
 | > EOF-OBJECT?
 |
 | Perhaps I'm confused, but I don't see much difference between my usage
 | of symbols -- which exist only at read-time, never at run-time, unlike
 | nil and the EOF object -- and your usage of the suffixes of the new #A
 | syntax.  Could you elaborate on how my proposal is any worse in that
 | respect than yours?

That symbols-as-cookies have been kept out of Scheme to this point
probably means that some RRRS author(s) are severely allergic to them.

I want arrays in R6RS.  I don't want to jeopardize arrays' chances by
making a proposal which looks like symbols-as-cookies, even if that is
not strictly the case in a technical sense.

SRFI-10 mandates parentheses (e.g., #,(infinity) instead of #,infinity).
This makes SRFI-10 objects look like expressions to be evaluated.
SRFI-58 objects will be used as prototype array objects in calls to
MAKE-ARRAY:

(make-array '#1Ar64(1.0) 2 3)                   ; Current SRFI-58 syntax

(make-array '#,(Array 1 ar64 [1.0]) 2 3)        ; SRFI-10 style

(make-array '#,(Ar64 [1.0]) 2 3)                ; compact-SRFI-10 style.
                                                ; [] nesting gives rank.

(make-array    (Ar64 1.0) 2 3)                  ; Current SRFI-47 functions

    ==> #2Ar64((1.0 1.0 1.0) (1.0 1.0 1.0))

The SRFI-10 style above looks like symbols-as-cookies; the
compact-SRFI-10 style does not.  Do you like the compact-SRFI-10
style, or would it take too much of SRFI-10's namespace?

Having the read prefix use the same coding as the prototype functions
halves the (human) memory load.  If we move to nomenclature like
REAL-64, then I want prototype functions to be available with those
names:

(make-array '#,(Array 1 real-64 [0.0]) 2 3)     ; longer SRFI-10 style

(make-array '#,(real-64 [0.0]) 2 3)             ; longer compact-SRFI-10

(make-array    (real-64 0.0) 2 3)               ; analogous SRFI-47 function

 | >  | (I'd also prefer that the names be longer & much more descriptive, like
 | >  | UNSIGNED16 or BOOLEAN, but I suppose that's a little too late, now that
 | >  | SRFI 47 has already been finalized & the incomprehensible abbreviations
 | >  | of array types have been set into stone...)
 | >
 | > SRFI-47 defines procedures to return prototype arrays.  Additional
 | > procedures can be added to alias the abbreviated ones.
 |
 | This works for SRFI 47, but not necessarily this SRFI: one cannot
 | define one's own aliases for existing array types in the reader
 | syntax.

Yes.  That is why we are discussing this now, before SRFI-58 is
finalized.

 | > But explicitly complete descriptions for numeric types are rather
 | > long:
 | >
 | > [...long list...]
 | >
 | > These long names present more of a burden for the memories of
 | > non-English-speakers than the short names, which are the same for
 | > everyone.
 |
 | I'm not suggesting names so long that they induce tedium in typists,
 | but rather names long enough not to be excessively obscure, such as
 | INTEGER-U16, COMPLEX-64, BIT, et cetera.

This requires users to internalize the assumptions that integers are
exact and that reals and complexes are not.  Scheme has a strong
propensity for calling things exactly what they are, witness
CALL-WITH-CURRENT-CONTINUATION, EOF-OBJECT?, LIST?, and PAIR?.

 | Furthermore, the single-character mnemonics are derived from
 | English, and there is certainly the possibility that their names
 | would begin with different initial letters in other languages;
 | however, everything in Scheme is from English anyway, so I see
 | nothing wrong with using English words for array element type
 | names.

English doesn't help much in remembering Scheme's exponent markers:

  The letters `s', `f', `d', and `l' specify the use of SHORT, SINGLE,
  DOUBLE, and LONG precision, respectively.

I don't usually think of a DOUBLE as shorter than a LONG.  And where
did `f' for SINGLE come from?  Maybe it is a C-ism.  In any case,
each is one of five characters (counting `e') to remember, rather
than one of five longer sequences.

 | > There is Scheme precedent for abbreviated names in identifiers
 | > like CADR and CDADAR and in the radix and exactness prefixes #B,
 | > #O, #D, #X, #E, #I.
 |
 | ... A better analogue would be ARRAY-REF, but I haven't seen any
 | objections to that as opposed to AREF, and I much prefer ARRAY-REF
 | rather than AREF.

I am not opposed to longer names, but they must work together and they
must integrate well with Scheme.

 | Let me also point out here that many of Scheme's naming conventions
 | and lexemes originated from T.  In T, there was no built-in
 | facility for multi-dimensional arrays, but there were still object
 | representation names used by Orbit's representation analyzer and
 | for the C & Pascal FFIs.  These were named semi-verbosely, as I
 | suggest above; e.g., the representation descriptor of unsigned,
 | sixteen-bit integers was named REP/INTEGER-16-U.  Many of the names
 | in T were intended to be long enough to be understandable and not
 | obscure, but not so long as to be excessive; this has tended to
 | hold in Scheme as well.  I think it would be good to preserve that
 | in the array element type names as well.

I found my T2.7 manual, but it doesn't have FFIs in it.

If I come up with longer names and they aren't better than the current
system (used by SCM for many years), then I would be making a
straw man.  Please replace the first column of this table with a set
of better names, so we can discuss this change in more concrete terms.

    prototype
    procedure exactness  element-type
    ========= =========  ============
    vector               any (conventional vector)
    ac64      inexact    64-bit+64-bit complex
    ac32      inexact    32-bit+32-bit complex
    ar64      inexact    64-bit real
    ar32      inexact    32-bit real
    as64      exact      64-bit signed integer
    as32      exact      32-bit signed integer
    as16      exact      16-bit signed integer
    as8       exact      8-bit signed integer
    au64      exact      64-bit unsigned integer
    au32      exact      32-bit unsigned integer
    au16      exact      16-bit unsigned integer
    au8       exact      8-bit unsigned integer
    string               char (string)
    at1                  boolean (bit-vector)