On Tue, Mar 23, 2021 at 5:24 AM Marc Nieper-Wißkirchen <xxxxxx@nieper-wisskirchen.de> wrote:

I don't see why the use of a small interpreter is important. Every Scheme system already contains an interpreter for the full Scheme language (to implement `eval`, for example), so that's no reason for coming up with a new (smaller) language.

Most Schemes don't provide a way to switch out the global environment and install a new one (see below). We do need to share run-time data structures with `read`, which is what we want.

Which reminds me that CL has a dynamic variable *read-eval* that controls whether #. works or not; if T it works, if NIL a reader error is signaled. The default in CL is T, but I think for us it should be #f so that using `read` to read a datafile of S-expressions cannot break your program unless you explicitly enable it. (We've learned a bit about malware since the 1980s.)
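To make the proposed default concrete, here is a minimal sketch in Python (not Scheme, and not any existing API). The names `read_all`, `read_token`, `ReaderError`, and the `evaluate` callback are all made up for illustration; the only point is that `#.` is rejected unless the caller explicitly turns `read_eval` on.

```python
class ReaderError(Exception):
    """Signaled when the input uses a reader feature that is switched off."""


def read_token(token, read_eval=False, evaluate=None):
    """Turn one whitespace-delimited token into a value.

    A token starting with "#." is handed to `evaluate` only when
    `read_eval` is true; otherwise a ReaderError is raised.
    Every other token is treated as an integer in this toy reader.
    """
    if token.startswith("#."):
        if not read_eval:
            raise ReaderError("#. is disabled because read-eval is false")
        return evaluate(token[2:])
    return int(token)


def read_all(text, read_eval=False, evaluate=None):
    """Read every token in `text`; read_eval defaults to False (the safe choice)."""
    return [read_token(t, read_eval, evaluate) for t in text.split()]
```

With this shape, `read_all("1 2 3")` works as a plain data reader, and a data file containing `#.` fails loudly unless the program opted in by passing `read_eval=True` together with an evaluator.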

As I wrote earlier, the situation is comparable to procedural macros. A lot of their power comes from the fact that they allow the full Scheme language (executed at a different phase than the actual program). The code used in procedural macros usually doesn't affect (and shouldn't affect) any global state.

With great power comes great responsibility. It may be that the self-restraint solution is sufficient even for public code, given the read-eval parameter.

So "#.(values 1 2 3)" could be read as three tokens, "1", "2", and "3"?

No. In the pre-ANSI standard, the hash-dot form had to return 0 values or 1 value only, since there is no way to specify "multiple values" on input except as multiple arguments to `values`.

That "#.(values)" evaluates to "()" doesn't make sense logically and shouldn't be adopted.

CL has a coercion rule: where the continuation expects exactly one value (as in a function argument), multiple values are coerced to the first value only, and 0 values are coerced to (). The definition of #. says "#.foo is read as THE object resulting from the evaluation of the object represented by foo" (emphasis added). The implicature is that #. expects exactly one value, and if it gets more or fewer, the above coercion is applied. "The life of [CL] has not been logic: it has been experience." --not quite Oliver Wendell Holmes Jr.
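As a rough illustration of that coercion (a Python sketch, with the hypothetical name `coerce_to_single_value`, and a tuple standing in for "the values returned by evaluating the form"):

```python
def coerce_to_single_value(values):
    """Collapse the values produced by a #. form into the one datum that is read.

    `values` is a tuple of results. Zero values become the empty tuple,
    which stands in here for CL's (); otherwise only the first value is kept.
    """
    if len(values) == 0:
        return ()
    return values[0]
```

Under this rule "#.(values 1 2 3)" reads as the single datum 1, not as three tokens, and "#.(values)" reads as ().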

Simple character-based parsers aren't composable, though. Assume there is not only the "[...]" type but also a new kind of string "{...}" defined somewhere else.

Now, how to handle "[1 2 3 {]}]"?

If you look at my description above, there is no issue, because the #\[ handler invokes `read` recursively, so it is responsible for processing the "{]}" element of the list using the #\{ handler (which is indeed a character parser). The #\[ handler only uses peek-char when looking for the terminating character. It is quite common to abstract this into a procedure that CL calls `read-delimited-list`; it takes the terminating character as an argument.

More useful CL reader functions and parameters can be found at <http://clhs.lisp.se/Body/c_reader.htm>

How to handle "[1 2 #;(very complicated datum) 3]"?

Also by the recursive use of `read`.
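Here is a small Python sketch of how the last two answers fit together. It is only an illustration under my own assumptions: `Port`, `HANDLERS`, `skip_atmosphere`, and `read_brace_string` are made-up names, and this `read_delimited_list` merely imitates the idea of the CL function rather than its actual signature. The "[" handler never scans for "]" itself; it peeks for the terminator and otherwise calls `read` recursively, so the "{" handler and the "#;" comment are dealt with by whoever owns them.

```python
class Port:
    """A string with a cursor, standing in for a textual input port."""

    def __init__(self, text):
        self.text, self.pos = text, 0

    def peek(self):
        return self.text[self.pos:self.pos + 1]  # "" at end of input

    def next(self):
        ch = self.peek()
        self.pos += len(ch)
        return ch

    def looking_at(self, prefix):
        return self.text.startswith(prefix, self.pos)


def skip_atmosphere(port):
    """Skip whitespace and "#;" datum comments before the next datum."""
    while True:
        while port.peek().isspace():
            port.next()
        if not port.looking_at("#;"):
            return
        port.pos += 2
        read(port)  # recursive read; the commented-out datum is discarded


def read_delimited_list(port, close):
    """Read data until `close` is the next character, then consume it."""
    items = []
    while True:
        skip_atmosphere(port)
        ch = port.peek()
        if ch == "":
            raise EOFError("missing " + close)
        if ch == close:
            port.next()
            return items
        items.append(read(port))  # elements are read recursively


def read_brace_string(port):
    """A purely character-based handler: everything up to "}" is text."""
    chars = []
    while True:
        ch = port.next()
        if ch == "":
            raise EOFError("missing }")
        if ch == "}":
            return "".join(chars)
        chars.append(ch)


# Dispatch table: the character that starts a datum selects its handler.
HANDLERS = {
    "[": lambda port: read_delimited_list(port, "]"),
    "(": lambda port: read_delimited_list(port, ")"),
    "{": read_brace_string,
}
DELIMITERS = "[]{}()"


def read(port):
    """Read one datum: dispatch on the first character, else read an atom."""
    skip_atmosphere(port)
    ch = port.peek()
    if ch in HANDLERS:
        port.next()
        return HANDLERS[ch](port)
    token = ""
    while port.peek() and not port.peek().isspace() and port.peek() not in DELIMITERS:
        token += port.next()
    if not token:
        raise ValueError("unexpected %r" % ch)
    return int(token) if token.isdigit() else token
```

In this sketch `read(Port("[1 2 3 {]}]"))` gives a list whose last element is the string "]", because the "]" inside the braces is consumed by the "{" handler before the "[" handler ever peeks at it.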

How to handle "#0=[1 2 #0#]"?

The external reader will handle the "#0=" part and the internal ("recursive", in CL terms) reader will handle "#0#". They need to share state. CL `read` normally creates a new state object when a new top-level sexp is being parsed, and accepts an optional argument to tell a recursive invocation to share state with its caller.
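A sketch of that state sharing, again in Python and again with made-up names (`tokenize`, `read`, `fill_list`, and the `labels` argument are my own; only "[...]" lists of integers are supported). A call without `labels` plays the role of a top-level read and creates a fresh label table; recursive calls pass the table along, which is roughly what the optional argument to CL's `read` arranges.

```python
from collections import deque
import re

# Tokens: a label definition "#n=", a label reference "#n#", brackets, integers.
TOKEN = re.compile(r"#\d+=|#\d+#|\[|\]|-?\d+")


def tokenize(text):
    return deque(TOKEN.findall(text))


def read(tokens, labels=None):
    """Read one datum from the token queue.

    labels=None means a top-level call, which creates new shared state;
    recursive calls pass the caller's `labels` so "#n#" can be resolved.
    """
    if labels is None:
        labels = {}
    token = tokens.popleft()
    define = re.fullmatch(r"#(\d+)=", token)
    if define:
        if not tokens or tokens[0] != "[":
            raise ValueError("this sketch can only label lists")
        tokens.popleft()
        datum = []
        # Register the list before reading its elements, so that a
        # reference inside the list can point back at it.
        labels[int(define.group(1))] = datum
        fill_list(tokens, labels, datum)
        return datum
    ref = re.fullmatch(r"#(\d+)#", token)
    if ref:
        return labels[int(ref.group(1))]
    if token == "[":
        datum = []
        fill_list(tokens, labels, datum)
        return datum
    return int(token)


def fill_list(tokens, labels, datum):
    """Append data to `datum` until the closing bracket, sharing `labels`."""
    while tokens[0] != "]":  # IndexError here means an unterminated list
        datum.append(read(tokens, labels))
    tokens.popleft()
```

Reading "#0=[1 2 #0#]" this way yields a list whose third element is the list itself, and a later top-level `read` starts with an empty label table, so a stray "#0#" there is an error.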

If the reader extensions are based on character-by-character parsers (which I don't think will work satisfactorily as I tried to illustrate with my example above),

They have access to both char-by-char and sexp-by-sexp parsing.

I don't think it is a good idea to assume any special order,

I don't understand this at all. Take the case of parsing ordinary lists (which in CL is typically accomplished with the use of `read-delimited-list`). If you want to start several parsers running in parallel for the sexps of the list, how do you know where to start them, even if the whole file is in memory as a string?

This is no problem if HashDot Scheme is normal Scheme. The "xvector" type is just defined like most other types in a Scheme library, which is then imported by the reader code.

True. But I'm concerned about the compiler's own environment being upset. You'll note that an import is not an expression, nor is it acceptable to `eval`, leaving dynamic importing to a possible future SRFI. This means that procedural macros cannot change the global import state. If we allow read macros to do so, and the compiler has (unknown to the programmer) imported certain libraries, what happens when loading the source code imports conflicting libraries? I think this will require support for multiple global environments.