Re: #\a octothorpe syntax vs SRFI 10

Re: #\a octothorpe syntax vs SRFI 10 campbell@xxxxxx 31 Dec 2004 05:19 UTC
Before I get to responding to the content of this message, let me first
state my position on square brackets for arrays.

I think it would be much better to leave this SRFI as it is with regard
to parentheses versus square brackets and later have a SRFI, or perhaps
have it in R6RS, to make square brackets equivalent to parentheses.
There is, first of all, a great deal of existing code written with that
extension: it is a very wide-spread one.  It would also be possible to
write arrays using square brackets, as suggested by Aubrey Jaffer, in
order to make the array dimensions clearer, if so desired, and, though
it would require explicitly writing the rank, I don't think that that
is a particularly crucial matter.

I'm opposed to any specialized square bracket syntax for arrays, and
very strongly so to such a complicated syntax as bear proposes.  Scheme
is not APL; it is intended to have a very simple & consistent syntax
with as few extensions as reasonable, and SRFI 10 is there precisely to
obviate the need to add many new such inconsistent extensions.  It's
not there just to look nice: it should be put to use.  Scheme is also
not an array-oriented language, and I sincerely doubt that the extra
few characters added to the literal array syntax make enough of a
difference to warrant condensing them into square brackets, which would
break a large quantity of existing code.

On Thu, 30 Dec 2004, Aubrey Jaffer wrote:

> In the updated srfi-58.html I sent to the editor I have eliminated the
> #Axxx syntax.  The rank digit(s) will be required.

OK.  I have no objections to that.

>  | Perhaps I'm confused, but I don't see much difference between my usage
>  | of symbols -- which exist only at read-time, never at run-time, unlike
>  | nil and the EOF object -- and your usage of the suffixes of the new #A
>  | syntax.  Could you elaborate on how my proposal is any worse in that
>  | respect than yours?
>
> To keep symbols-as-cookies out of Scheme to this point probably means
> that some RRRS-author(s) is severly allergic to it.
>
> I want arrays in R6RS.  I don't want to jeopardize array's chances by
> making a proposal which looks like symbols-as-cookies, even if it is
> not exactly true in a technical sense.

I believe that there is aversion only to using symbols at run-time in
specialized cases.  For example, it would be stupid to make the EOF
object be a symbol: READ could return it when not at the end of a port.
There is already use of 'symbols-as-cookies'; consider, for example,
the COND & CASE macros, which use ELSE, or in general the list of
literal identifiers in SYNTAX-RULES.

> SRFI-10 mandates parentheses (eg. #,(infinity) instead of #,infinity).
> This makes its SRFI-10 objects look like expressions to be evaluated.
> SRFI-58 objects will be used as prototype array objects in calls to
> MAKE-ARRAY:
>
> (make-array '#1Ar64(1.0) 2 3)                   ; Current SRFI-58 syntax
>
> (make-array '#,(Array 1 ar64 [1.0]) 2 3)        ; SRFI-10 style
>
> (make-array '#,(Ar64 [1.0]) 2 3)                ; compact-SRFI-10 style.
>                                                 ; [] nesting gives rank.
>
> (make-array    (Ar64 1.0) 2 3)                  ; Current SRFI-47 functions
>
>     ==> #2Ar64((1.0 1.0 1.0) (1.0 1.0 1.0))
>
> The SRFI-10 style above looks like symbols-as-cookies.  The
> compact-SRFI-10 style does not.

I still don't see why that is the case.  How is AR64 any less a 'symbol
as a cookie' when it comes as the car of a list or before the array's
parentheses than when it comes as the cadr of a list?

>                                  Do you like the compact-SRFI-10
> style; or would it take too much of SRFI-10s namespace?

I think it is better to clearly mark 'array': a single 'A' is not
sufficient.  I also think it is pointless to combine ARRAY with the
element type specifier: this introduces no benefits I can see, but it
complicates both human and machine parsing, because the token has to be
parsed all at once and then split up, or one must remember a number of
different tokens that have a very similar meaning.  (E.g., I could just
define a single SRFI 10 constructor for ARRAY that could just construct
an array from the given type specifier.  If ARRAY were conjoined with
the type specifier, there would have to be a number of different SRFI
10 constructors that could not be easily or elegantly unified.)  On the
other hand, a single space could split the two tokens up and thereby
make it clearer to both machine and human.

> Having the read prefix use the same coding as the prototype functions
> halves the (human) memory load.  If we move to nomenclature like
> REAL-64, then I want prototype functions to be available with those
> names:
>
> (make-array '#,(Array 1 real-64 [0.0]) 2 3)     ; longer SRFI-10 Style
>
> (make-array '#,(real-64 [0.0]) 2 3)             ; longer compact-SRFI-10
>
> (make-array    (real-64 0.0) 2 3)               ; analogous SRFI-47 function

The 'longer SRFI-10-style' one is the only one I don't object to.  It
includes both ARRAY and a more clearly written 'REAL-64.'  (REAL-64
0.0) looks like it's coercing 0.0 to be a 64-bit real number; likewise
with #,(REAL-64 [0.0]), only with a set of extraneous brackets.

>  | This works for SRFI 47, but not necessarily this SRFI: one cannot
>  | define one's own aliases for existing array types in the reader
>  | syntax.
>
> Yes.  That is why we are dicussing this now; before SRFI-58 is
> finalized.

OK, although I'm allergic to inconsistency like this between SRFI 47's
procedures & SRFI 58's array element type specifiers.  If there is to
be any inconsistency, I'd rather that the procedure names be longer
while the type specifiers for the literal array syntax be shorter, but
that's rather moot at this point.

>  | I'm not suggesting names so long that they induce tedium in typists,
>  | but rather names somewhat longer than are excessively obscure, such as
>  | INTEGER-U16, COMPLEX-64, BIT, et cetera.
>
> This is requiring users to internalize assumptions that integers are
> exact; and reals and complexes are not.

This is no different from the current naming scheme, however, where you
have to remember not only that integers are always exact, and complexes
& reals may not be, but also that 'i' stands for 'integer,' 'r' stands
for 'real,' &c.

(Complexes, as far as I know, can be either exact or inexact, by the
way.  I'm not sure about reals, but I think they theoretically can be,
too.  (This is, of course, ignoring the number tower.)  If there were
to be both exact & inexact complex arrays, and the same for reals &
integers, I'd not mind adding the token.)

> English doesn't much help remember Scheme exponent markers:
>
>   The letters `s', `f', `d', and `l' specify the use of SHORT, SINGLE,
>   DOUBLE, and LONG precision, respectively.
>
> I don't usually think of a DOUBLE as shorter than a LONG.  And where
> did `f' for SINGLE come from?  Maybe it is a C-ism.  In any case, it
> is one of five characters (with 'e') rather than one of five longer
> sequences to remember.

Although I don't have a better suggestion for the number syntax, I
can't claim to like it as it is, either, and I think there are many who
share my sentiments.  (I have certainly spoken to a number of them, and
I recall a great deal of discussion over confusion in the number syntax
on RRRS-authors in the past.  Bear seems to agree with me here.)

>  | Let me also point out here that much of Scheme's naming conventions
>  | and lexemes originated from T.  In T, there was no built-in
>  | facility for multi-dimensional arrays, but there were still object
>  | representation names used by Orbit's representation analyzer and
>  | for the C & Pascal FFIs.  These were named semi-verbosely, as I
>  | suggest above; e.g., the representation descriptor of unsigned,
>  | sixteen-bit integers was named REP/INTEGER-16-U.  Many of the names
>  | in T were intended to be long enough to be understandable and not
>  | obscure, but not so long as to be excessive; this has tended to
>  | hold in Scheme as well.  I think it would be good to preserve that
>  | in the array element type names as well.
>
> I found my T2.7 manual, but it doesn't have FFIs in it.

Sorry, I meant T 3.1.  There are several copies of the manual in a few
different formats at

  http://www.bloodandcoffee.net/campbell/t/

A document describing just the FFI is also available here:

  http://www.bloodandcoffee.net/campbell/code/t-ffi.tex

On a note unrelated to SRFI 58, do you still have any other T 2.7
resources available that you could send to me?  I'd be interested in
anything: the manual, the source, libraries, &c.

> If I come up with longer names and they aren't better than the current
> system (used by SCM for many years), then I would be making a
> straw-man.  Please replace the first column of this table with a set
> of better names, so we can discuss this change in more concrete terms.

Here are my suggestions:

    prototype
    procedure     exactness  element-type
    =========     =========  ============
    object                   any (conventional vector)
    complex-64    inexact    64-bit+64-bit complex
    complex-32    inexact    32-bit+32-bit complex
    real-64       inexact    64-bit real
    real-32       inexact    32-bit real
      Note, by the way, the messages on the SRFI 4 list regarding reals
      versus floats, so it might be FLOAT-64 & FLOAT-32 instead.  I
      have nothing substantial to say on the matter, however.
    integer-64-s  exact      64-bit signed integer
      or integer-s64 or something of that variety
      (likewise for the other integer types)
    integer-32-s  exact      32-bit signed integer
    integer-16-s  exact      16-bit signed integer
    integer-8-s   exact      8-bit signed integer
    integer-64-u  exact      64-bit unsigned integer
    integer-32-u  exact      32-bit unsigned integer
    integer-16-u  exact      16-bit unsigned integer
    integer-8-u   exact      8-bit unsigned integer
    char                     char (string)
    bit or boolean           boolean (bit-vector)

I think the names would be OK without the hyphens, too, as long as, for
the integer specifiers, the U/S is kept on the right of the number; for
example, INTEGER16U.