On Fri, Dec 10, 2021 at 10:39 PM John Cowan <xxxxxx@ccil.org> wrote:


On Fri, Dec 10, 2021 at 2:26 AM Marc Nieper-Wißkirchen <xxxxxx@gmail.com> wrote:

If a "string" means a piece of text, then excluding the NUL character is reasonable. On the other hand, if "string" means a sequence of (Unicode) characters,

It doesn't, in fact; it only means a sequence of Unicode characters not longer than a certain value.  The R6RS tries to defend against this difficulty in section 5.6:

As defined by this document, the Scheme programming
language is safe in the following sense: The execution of
a safe top-level program cannot go so badly wrong as to
crash or to continue to execute while behaving in ways
that are inconsistent with the semantics described in this
document, unless an exception is raised.
 
Violations of an implementation restriction must raise
an exception with condition type &implementation-restriction, as must all violations and errors that would
otherwise threaten system integrity in ways that might result in execution that is inconsistent with the semantics
described in this document. 

But this demands more of a conforming implementation than it can possibly supply.  Even apart from crude counterexamples like "if the computer catches on fire, it can't raise an exception even though the program crashed", in modern OSes a program normally is killed externally when global memory is scarce with no chance of recovery, and it may not even be the process most at fault that is killed.  So this passage is the most preposterous of all the preposterous MUSTard that appears in R6RS.  A pure mathematical model can ignore such Real World issues, but the specification of a machine (which is what a programming language is) cannot.

In short, at most we have an *approximation* to the abstract idea of a sequence of Unicode characters.  (Note that an R7RS implementation, unlike an R6RS one, may also support non-Unicode characters in strings, though I know of none that do so at present; Chicken does support 983,041 non-Unicode characters with codepoints from #x110000 to #x1FFFFF.)

I don't think the safety aspect is relevant to the topic of this thread; it just distracts. We can talk about it, but we should do so in a new thread.
 
R6RS can use its strings for data transmission in its interface for custom textual ports, R7RS cannot in general unless its textual files do not contain the NUL character by definition.

The final "by definition" is incorrect or at best highly misleading.  R7RS implementations can transmit NUL through a custom textual port.  It is true that you cannot count on it, but there is no difference in practice between an R7RS implementation that does not implement a feature because R7RS does not require it, and an R6RS implementation that does not implement a required feature because it derogates from the standard in this respect.  Indeed, the out-of-memory example above shows that a fully conforming R6RS implementation is impossible in practice, whereas (I believe) a fully conforming R7RS one is quite possible.

See my comment above.
 
On the other hand, when we do model a stream of arbitrary objects, an in-band sentinel value that is exposed to the user mustn't be used.

When do we in fact do that?  We build machines to serve our Real World purposes, not merely to try to model our abstractions and (as shown above) fail to do so.

To conclude, SRFI 121/158 are useful in situations where the objects to be processed are naturally disjoint from the eof-objects, but it is the wrong tool when arbitrary streams are to be processed.

I ask again: when is that?
Take a look at a random API for the various data types.  One has lists, vectors, hash tables, mappings, ... over the set of all Scheme values, not merely over the set of all Scheme values except the eof-object.
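
To make the point concrete, here is a minimal sketch in R7RS small Scheme. The procedures my-list->generator and my-generator->list are hypothetical stand-ins written for this illustration; they only mimic the SRFI 158 convention that a generator is a thunk returning an eof-object once it is exhausted, and they are not the SRFI's reference implementation.

```scheme
(import (scheme base) (scheme write))

;; Hypothetical stand-ins for SRFI 158's list->generator and
;; generator->list, written out so the sketch needs only (scheme base).
(define (my-list->generator lst)
  (lambda ()
    (if (null? lst)
        (eof-object)               ; in-band sentinel: "no more values"
        (let ((x (car lst)))
          (set! lst (cdr lst))
          x))))

(define (my-generator->list gen)
  (let loop ((acc '()))
    (let ((x (gen)))
      (if (eof-object? x)          ; end of stream, or an element? Cannot tell.
          (reverse acc)
          (loop (cons x acc))))))

;; Ordinary values round-trip through the generator.
(write (my-generator->list (my-list->generator (list 1 2 3))))
(newline)                          ; prints (1 2 3)

;; A list may hold any Scheme value, including an eof-object ...
(define data (list 1 (eof-object) 3))
(write (length data))
(newline)                          ; prints 3

;; ... but the generator built from it appears to end after the first element.
(write (my-generator->list (my-list->generator data)))
(newline)                          ; prints (1)
```

The list holds three elements, yet the consumer sees only one, because the second element is indistinguishable from the end-of-stream marker.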

I don't know what more I could say; to be honest, I am a bit baffled by your reply, which doesn't look very constructive to me.

Marc

PS I know very well that existing computers model Turing machines only approximately.  But that point is really irrelevant here, for otherwise, the specification of a programming language would just talk about the bits in RAM.