On Fri, Mar 26, 2021 at 4:06 PM John Cowan <xxxxxx@ccil.org> wrote:


On Fri, Mar 26, 2021 at 3:37 AM Marc Nieper-Wißkirchen <xxxxxx@nieper-wisskirchen.de> wrote:

one cannot implement `include-ci` with `read`.

I think it's correct to implement it as a property of the port being read from, though R7RS does not say so.  This is what Larceny and Chibi do.

Yes, this looks like the "right" way to do it. R7RS would need some primitive to expose the (mutable) "fold-case" flag of a port. With it, `read` would suffice.
 
This leads to the follow-up question: When all of Scheme is allowed, we need to specify the environment through an import set. This should happen at the beginning of the stream so that the structure would be similar to top-level programs.
 
Such a restriction is untenable in the presence of #.(include "foo.scm"), because foo.scm may have its own #. imports.  The stream as a whole must allow #.(import (this) (that)).

I think the semantics of R7RS top-level programs (mutatis mutandis, of course) suffice. In a top-level program, you can do `include`, but you can't use `import` from included files. If this is needed, a library can be used together with `include-library-declarations`.

If we agree that the order of evaluation of the `#.<expr>` tokens is best left unspecified, import declarations for the reader only make sense at the beginning of the stream. In fact, a single one suffices. This determines the (global) environment for the expressions following "#.".
 
Alternatively, one can drop the multiple values but invent a "#xxxxxx@", which does list splicing.

I think it would be clearer and more Schemey to do this.

Okay! So in "#.<expr>", the expression must evaluate to a single value.
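
To make the single-value `#.<expr>` semantics concrete, here is a minimal sketch using Racket's readtables as a stand-in for whatever mechanism the SRFI ends up specifying. The names `reader-environment`, `read-hash-dot` and `read-with-hash-dot` are made up for illustration, and the fixed namespace plays the role of the single import set at the beginning of the stream.

```racket
#lang racket/base

;; Stands in for the environment selected by the one import set at the
;; beginning of the stream (an assumption made for this sketch).
(define reader-environment (make-base-namespace))

;; Reader macro for "#.": read the next datum, evaluate it in the fixed
;; environment, and splice the single resulting value into the output.
(define read-hash-dot
  (case-lambda
    [(ch in) (eval (read/recursive in) reader-environment)]
    [(ch in src line col pos) (eval (read/recursive in) reader-environment)]))

;; Hypothetical entry point: `read` with "#." enabled.
(define (read-with-hash-dot in)
  (parameterize ([current-readtable
                  (make-readtable (current-readtable)
                                  #\. 'dispatch-macro read-hash-dot)])
    (read in)))

(write (read-with-hash-dot (open-input-string "(1 #.(+ 1 2))")))
(newline)
```

Because the environment is fixed before any datum is read, the result does not depend on the order in which the `#.` expressions are evaluated, as long as they have no side effects on each other.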
 
(define (read-delimited-list delim port)
  (let loop ((result '()))
    (cond
      ((char-whitespace? (peek-char port))
       (read-char port)    ; consume the whitespace so the loop advances
       (loop result))
      ((eqv? delim (peek-char port))
       (read-char port)    ; read past delim
       (reverse result))
      (else
       (loop (cons (read port) result))))))

Such a `read-delimited-list` that doesn't recursively call `read` is unaware of other lexical syntax extensions. For example, it would break on an input like "[ #| ] |# ]".

It does call `read` recursively: see the last line.

That's true, but it doesn't help. It would still break on my example "[ #| ] |# ]" because there is no datum to read (just a comment) before the matching closing bracket: `read` skips the comment and then runs into the "]" itself.
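
A sketch of what it takes to get this case right, written in Racket only because `peek-string` gives the two-character lookahead that R7RS's `peek-char` does not. The helpers `skip-block-comment` and `skip-atmosphere` are hypothetical names, and the sketch still ignores `;` and `#;` comments, so it is an illustration of the problem rather than a complete fix.

```racket
#lang racket/base

;; Skip a block comment whose opening "#|" has already been consumed.
;; Block comments nest, so keep track of the depth.
(define (skip-block-comment in)
  (let loop ([depth 1])
    (when (> depth 0)
      (let ([two (peek-string 2 0 in)])
        (cond
          [(eof-object? two) (error 'skip-block-comment "unterminated comment")]
          [(equal? two "#|") (read-string 2 in) (loop (add1 depth))]
          [(equal? two "|#") (read-string 2 in) (loop (sub1 depth))]
          [else (read-char in) (loop depth)])))))

;; Skip whitespace and block comments in front of the next token.
(define (skip-atmosphere in)
  (let ([c (peek-char in)])
    (cond
      [(eof-object? c) (void)]
      [(char-whitespace? c) (read-char in) (skip-atmosphere in)]
      [(equal? (peek-string 2 0 in) "#|")
       (read-string 2 in)
       (skip-block-comment in)
       (skip-atmosphere in)]
      [else (void)])))

;; Like the procedure quoted above, but the delimiter is only looked
;; for after whitespace and block comments have been skipped.
(define (read-delimited-list delim in)
  (let loop ([result '()])
    (skip-atmosphere in)
    (let ([c (peek-char in)])
      (cond
        [(eof-object? c) (error 'read-delimited-list "missing ~a" delim)]
        [(eqv? c delim) (read-char in) (reverse result)]
        [else (loop (cons (read in) result))]))))

(write (read-delimited-list #\] (open-input-string " #| ] |# ]")))
(newline)
```

The point is that the helper has to duplicate the reader's knowledge of comments; every further lexical extension would have to be taught to it separately.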
 
PS We should also take a look at Racket's reader extensions, maybe to steal an idea or two. Their guide is very readable: https://docs.racket-lang.org/guide/languages.html.

It reads very much like Racket documentation: for insiders by insiders, and assuming a good deal of detailed knowledge I don't have.

Is it that bad?

From what I have understood, Racket has a two-step approach, one on the reader and one on the module level.

When a source file is loaded ("required") by Racket, the source forms are implicitly wrapped into a `#%module-begin` form:

(#%module-begin <form> ...)

By writing a library (again called "module" by Racket) that "provides" a non-standard `#%module-begin`, say `module-lang.rkt`, I can then write source files like:

#lang s-exp "module-lang.rkt"
<form> ...

Racket wraps the forms into the `#%module-begin` provided by `module-lang.rkt` and expands everything (which should finally yield an expansion of Racket's built-in `#%module-begin`). This way, arbitrary program transformations are possible (and arbitrary s-expression based languages).
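
A minimal sketch of such a language, written as one file with submodules so that it can be run as is. The submodule `module-lang` stands in for the file `module-lang.rkt` and `client` for a source file beginning with `#lang s-exp "module-lang.rkt"`; the names `counting-module-begin` and `form-count` are invented for the example.

```racket
#lang racket/base

;; Stand-in for module-lang.rkt: all of racket/base, except that
;; #%module-begin is replaced by our own version.
(module module-lang racket/base
  (require (for-syntax racket/base))
  (provide (except-out (all-from-out racket/base) #%module-begin)
           (rename-out [counting-module-begin #%module-begin]))
  (define-syntax (counting-module-begin stx)
    (syntax-case stx ()
      [(_ form ...)
       (with-syntax ([n (length (syntax->list #'(form ...)))])
         ;; Expand into Racket's built-in #%module-begin, adding an
         ;; exported count of the body forms as a toy transformation.
         #'(#%module-begin
            (provide form-count)
            (define form-count n)
            form ...))])))

;; Stand-in for a source file written in that language.
(module client (submod ".." module-lang)
  (define greeting "hello")
  (void greeting))

(require (submod "." client))
(printf "client had ~a body forms\n" form-count)
```

The transformation here only counts forms, but since the macro sees the whole module body it could rewrite it arbitrarily before handing it to the built-in `#%module-begin`.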

If we also want to change the lexical syntax, we use

#lang reader "reader-lang.rkt"
<character> ...

In this case, "reader-lang.rkt" has to export procedures named `read` and `read-syntax`, which have to parse the character stream and return a `module` form, which is, roughly, the datum of a library exporting a `#%module-begin` together with the parsed body forms.
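
As an illustration, here is a sketch of what a `reader-lang.rkt` could look like. The "language" is a toy of my own invention: every line of the source becomes a string, and the resulting module prints the lines. The names `lines-read`, `lines-read-syntax` and `lines-program` are hypothetical, and `racket/base` is assumed as the module language that supplies the `#%module-begin`.

```racket
#lang racket/base
;; reader-lang.rkt (hypothetical contents)
(require racket/port)
(provide (rename-out [lines-read read]
                     [lines-read-syntax read-syntax]))

;; Parse the raw character stream (here: split it into lines) and
;; return a module form as a syntax object.
(define (lines-read-syntax src in)
  (define lines (port->lines in))
  (datum->syntax #f
                 `(module lines-program racket/base
                    (provide lines)
                    (define lines ',lines)
                    (for-each displayln lines))))

;; The datum-level variant just strips the syntax information.
(define (lines-read in)
  (syntax->datum (lines-read-syntax #f in)))
```

Calling `(lines-read (open-input-string "a\nb"))` shows the module form that the expander would subsequently see for a source file with the two lines "a" and "b" after the `#lang reader "reader-lang.rkt"` line.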

So the basic ideas are the two steps (roughly corresponding to syntax and semantics) and the customization through libraries that export well-known identifiers, namely `#%module-begin` and `read-syntax`.

What I haven't described above are the reader extensions, which are here: https://docs.racket-lang.org/guide/hash-reader.html.