Re: SRFI 4 lexical syntax Marc Feeley 07 Dec 2023 14:25 UTC

> On Dec 7, 2023, at 4:07 AM, Marc Nieper-Wißkirchen <xxxxxx@gmail.com> wrote:
>
> On Thu, Dec 7, 2023 at 09:21, John Cowan <xxxxxx@ccil.org> wrote:
> > Thanks for the examples.  Note for the record that nobody questions the need for lexical syntax for bytevectors/u8vectors, only for other kinds of homogeneous records.  However, your arguments apply equally to those cases.
> Lexical syntax for bytevectors is already present in R6RS and R7RS-small, so the question of its necessity is less relevant.  (For sure, Scheme could work well without it.)
>
> That said, there is another difference between bytevectors and other homogeneous vectors: literal bytevectors allow us to embed general binary data in literals.  As long as other homogeneous vectors are realized as views into bytevectors (as in the R6RS bytevector library), they can make use of the bytevector lexical format.  This does not work in the other direction, making bytevectors special.

In SRFI-4, an s16vector, or any other homogeneous vector, is not a view onto a u8vector. Each homogeneous vector is its own type with no predefined mapping to a u8vector.

A mapping from homogeneous types to u8vectors would expose the underlying representation of those types. For example, are the elements of the vector laid out in big-endian or little-endian order? Is the IEEE 754 representation used for floating-point values, or some other representation? Is a u16vector element stored using 2 bytes or a machine word (some old architectures are not byte-addressable)?

This would cause portability and interoperability issues when a homogeneous vector created in one environment is used in another. For example, a Scheme program might create an f64vector that is embedded in a data file or in another Scheme program and then read in a different environment (a different implementation of Scheme, or the same implementation on a different operating system or machine, etc.). As a concrete example, with Gambit v4.9.5 on an Apple M2 CPU:

> (##subtype-set! (f64vector 1.0 2.0) (##subtype (u8vector)))
#u8(0 0 0 0 0 0 240 63 0 0 0 0 0 0 0 64)

and on a POWER7 CPU (which is configured as a big-endian PPC processor):

> (##subtype-set! (f64vector 1.0 2.0) (##subtype (u8vector)))
#u8(63 240 0 0 0 0 0 0 64 0 0 0 0 0 0 0)

So the underlying representation of the f64vector #f64(1.0 2.0) is different on these architectures. The point of an external representation is to abstract the underlying representation of the data.
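
To illustrate why a reader of raw bytes has to know the byte order, here is a minimal sketch in portable R7RS Scheme. The helper names bytes->integer and decode-binary64 are hypothetical (they are not part of SRFI-4 or Gambit), the code assumes an implementation with exact rationals, and the decoder only handles zero and normal numbers, which is all this illustration needs. The first eight bytes of each transcript above denote 1.0, but only once they are put in the same order:

```scheme
(import (scheme base))

;; Hypothetical helper: combine bytes (most significant first)
;; into an exact nonnegative integer.
(define (bytes->integer bytes)
  (let loop ((bs bytes) (acc 0))
    (if (null? bs)
        acc
        (loop (cdr bs) (+ (* acc 256) (car bs))))))

;; Hypothetical helper: decode 8 bytes, most significant first,
;; as an IEEE 754 binary64 value.  Only zero and normal numbers
;; are handled (no subnormals, infinities, or NaNs).
(define (decode-binary64 bytes)
  (let* ((n    (bytes->integer bytes))
         (sign (quotient n (expt 2 63)))
         (expo (modulo (quotient n (expt 2 52)) 2048))
         (frac (modulo n (expt 2 52))))
    (if (and (= expo 0) (= frac 0))
        (if (= sign 0) 0.0 -0.0)
        (inexact (* (if (= sign 0) 1 -1)
                    (+ 1 (/ frac (expt 2 52)))
                    (expt 2 (- expo 1023)))))))

(define big-endian-bytes    '(63 240 0 0 0 0 0 0))  ; first element on the POWER7
(define little-endian-bytes '(0 0 0 0 0 0 240 63))  ; first element on the M2

(decode-binary64 big-endian-bytes)               ; 1.0
(decode-binary64 (reverse little-endian-bytes))  ; 1.0, after reversing the bytes
```

The external representation #f64(1.0 2.0) carries none of this byte-order information, which is why it reads back the same on both machines.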

>  On Wed, Dec 6, 2023 at 11:20 PM Marc Feeley <xxxxxx@iro.umontreal.ca> wrote:
>
> > On Dec 6, 2023, at 8:13 AM, John Cowan <xxxxxx@ccil.org> wrote:
> >
> >
> >
> > On Tue, Dec 5, 2023 at 11:07 PM Arthur A. Gleckler <xxxxxx@speechcode.com> wrote:
> >  I don't understand.  Isn't it to make it possible to put literals representing these values into one's program?  Are you looking for a purpose beyond that?
> >
> > The alternative view is that we should not have such literals, but simply use macros of the form (s32 1 2 15 3453) that work at expand time rather than read time. See <https://codeberg.org/scheme/r7rs/issues/109> for the most recent discussion.
> >  By the way, I just checked, and Marc Feeley, the author, is still subscribed to this mailing list.
> >
> > I'd be surprised if he weren't.
>
> I find it surprising that people are questioning the need for a lexical syntax for homogeneous vectors and I’m puzzled at some of the arguments given in issue 109.
>
> For a macro like (u8 1 2 3) to be a substitute for '#u8(1 2 3) it has to appear in an evaluated position. So it can work here:
>
> (define foo '#u8(1 2 3))   ;; equivalent to the proposed (define foo (u8 1 2 3))
>
> But it can’t be used in a nested literal such as
>
> (define foo '#(#u8(1 2 3) #u8(4 5 6) #u8(7 8 9)))   ;; no equivalent with “u8” macro
>
> which is a perfectly fine representation for a literal 3x3 matrix of bytes.
>
> While the simple "u8" macro from above wouldn't work, a more general macro producing a literal datum will work.  This is actually what is proposed in #109.
>
> Such a macro would interpret a mini-DSL describing literals.  (Personally, I would find a specialized macro "matrix-literal" better, but others would likely disagree.)
>
> And what about literals that embed u8vectors like:
>
> (define smileys-utf8-alist
>   '((#\😁 #u8(240 159 152 129))
>     (#\😳 #u8(240 159 152 179))
>     (#\😱 #u8(240 159 152 177))))
>
> See above.
>  Moreover, if there is no external representation for u8vectors, it would not be possible to pretty-print the following code after macro-expansion:
>
> (lambda () (u8 1 2 3))
>
> That would be a real bummer for debugging and s-expression manipulation in general.
>
> I don't understand this; many Scheme objects have no official written representation (e.g. records), yet implementations print something useful.

And any time this happens, it creates a hurdle for the programmer: she can’t type the printed value back in at the REPL, and she can’t write these values to a file and read them back later in another instance of the program. There is a tradition in Lisp of making as many types as possible both writable and readable (i.e. write/read invariance). I subscribe to that point of view because it simplifies working with the language.
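
As a minimal sketch of write/read invariance in portable R7RS (the helper name round-trip is hypothetical), a nested literal containing bytevectors survives being written to a string and read back, because #u8(...) has an external representation:

```scheme
(import (scheme base) (scheme read) (scheme write))

;; Hypothetical helper: write OBJ to a string, then read it back.
(define (round-trip obj)
  (let ((out (open-output-string)))
    (write obj out)
    (read (open-input-string (get-output-string out)))))

;; A literal 3x3 matrix of bytes, as in the quoted example.
(define matrix '#(#u8(1 2 3) #u8(4 5 6) #u8(7 8 9)))

(equal? (round-trip matrix) matrix)   ; #t
```

A record instance, by contrast, has no standard external representation, so the same round trip is not portable for records.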

I can understand the reasons why records and procedures don’t have write/read invariance. Records could have write/read invariance but the external representation would be rather verbose and hard to read, so not appropriate for typical debugging sessions. There could however be a parameter object or variant of the “write” procedure that offers write/read invariance of records. In Gambit this is done by changing the readtable attached to the port:

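;; Return a copy of readtable rt whose sharing-allowed? setting is
;; 'serialize, which enables Gambit's serialization syntax.
(define (serialize-set rt)
  (readtable-sharing-allowed?-set rt 'serialize))

;; Write obj to a string through a port using the serializing readtable.
(define (serialize obj)
  (call-with-output-string
   (lambda (p)
     (output-port-readtable-set! p (serialize-set (output-port-readtable p)))
     (write obj p))))

;; Read an object back from a string produced by serialize.
(define (deserialize str)
  (call-with-input-string
   str
   (lambda (p)
     (input-port-readtable-set! p (serialize-set (input-port-readtable p)))
     (read p))))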

(define-type point
  id: BE5075CF-ADA6-4E03-8D4E-BB37D1FDBF4E
  x
  y)

(define a (make-point 11 22))

(define s (serialize a))
(define b (deserialize s))

(pp (equal? a b)) ;; #t

(pp s) ;; "#structure(#structure(#0=#structure(#0# ##type-5 type 8 #f #(id 1 #f name 5 #f flags 5 #f super 5 #f fields 5 #f)) ##type-2-BE5075CF-ADA6-4E03-8D4E-BB37D1FDBF4E point 24 #f #(x 0 #f y 0 #f)) 11 22)"

There is no such complexity with an external representation for homogeneous vectors.

>  Finally, let’s not forget that SRFI-4 is now 25 years old! It is supported by most Scheme systems. This seems like the ideal situation for standardizing a feature! Lots of experience with the feature and broad support among Scheme systems. How much experience does anybody have with (u8 1 2 3)?
>
> We are not really talking about (u8 ...) because we have #u8, do we?  We are talking about, say (s16  ...) or a general macro to produce literals.  The point is that the latter can be fully implemented as a Scheme library.  Adding #s16, ... etc. means, on the other hand, that we add something to the big ball of mud that Scheme's lexical syntax already is.  Whether 25 years old or not, it would be another feature piled on top, which would pervade all of Scheme (because lexical syntax is global and shared).
>
> Marc

Let’s not abuse the “another feature piled on top” argument. An extension to the lexical syntax that does not interfere with anything else is a relatively simple concept to grasp. Moreover, it is a natural generalization of the “#u8(…)” lexical syntax that is currently in the R7RS standard. It is the absence of a lexical syntax for “#f64(…)”, etc. that is hard to understand and constitutes a wart in the language.

Marc