Re: peek-char problem Göran Weinholt 18 Jun 2020 12:18 UTC

Shiro Kawai <xxxxxx@gmail.com> writes:

> A few issues are mixed in.
>
> - CRLF problem: If we can assume R6RS semantics, we can leave its
> handling to transcoders and we may say textual input port only returns
> #\newline. However, R7RS explicitly says read-line must recognize CR,
> LF, and CRLF.

Ah, I did not realize that R7RS read-line is different from R6RS
get-line in this respect. The read-line procedure is new in R7RS (it was
not in R5RS) and R7RS-small does not talk about transcoders.

If read-line does its own newline parsing, then R7RS ports effectively
have (eol-style none) as their R6RS-equivalent eol-style. The read-line
procedure needs to use peek-char to look at the following character so
it can treat CR LF as a single line ending. The read procedure generally
does not need to treat newlines as anything other than whitespace, IIRC,
except in multiline strings. Does R7RS say anything about multiline
strings and the effect of DOS or Unix newlines inside them? If not, then
no special newline parsing should be needed in read.
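
To make the peek-char point concrete, here is a rough model of such a
read-line. It is written in Python only so that it is easy to run; the
CharPort class and its method names are hypothetical stand-ins for a
textual port, not any implementation's API.

```python
class CharPort:
    """Hypothetical textual input port over a string (no eol translation)."""

    def __init__(self, text):
        self._text = text
        self._pos = 0

    def peek_char(self):
        # Return the next character without consuming it; "" means EOF.
        return self._text[self._pos:self._pos + 1]

    def read_char(self):
        ch = self.peek_char()
        self._pos += len(ch)
        return ch

    def position(self):
        return self._pos


def read_line(port):
    """Read up to CR, LF, or CR LF; return None at end of file."""
    if port.peek_char() == "":
        return None
    chars = []
    while True:
        ch = port.read_char()
        if ch == "" or ch == "\n":
            break
        if ch == "\r":
            # Look ahead: a following LF belongs to the same line ending.
            if port.peek_char() == "\n":
                port.read_char()
            break
        chars.append(ch)
    return "".join(chars)
```

With the input "a\r\nb", the first read_line returns "a" and leaves the
port positioned after the LF, which is the position Shiro says
port-position must report.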

> - "read-line returns immediately after reading CR and remembers it so
> that next read-char ignores LF" doesn't work. port-position may be
> called right after read-line, and that position must point after the
> following LF. The current assumption is that port-position just
> returns whatever get-position returns, so we can't attach the
> information that that position is right after CR. (If we allow the
> custom port to "wrap" the position info returned by get-position, then
> we can attach such info, so that set-port-position! can restore the
> state.)

Port positions on transcoded ports are not so simple, but I'm not sure
the custom port actually needs to wrap the position. I will again speak
from the R6RS perspective. The case you describe applies when a
transcoder is attached to an underlying binary input(/output) port. The
port-position procedure is not applied directly to the underlying port;
it is applied to the transcoded port.

The transcoded port will have its own port-position procedure that can
ask the underlying port for its position and then wrap it with its own
state. It will need to keep track of transcoding state (e.g. the
previous character), but it also needs to account for any data that has
been consumed from its own internal translation buffer. The underlying
port is positioned immediately after the last read into the translation
buffer, so set-port-position! needs to be able to return the underlying
port to the position it had before the last buffer fill.
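
As a rough illustration of that wrapping, here is a Python model (not
Scheme, and not how any particular implementation does it). The
TranscodedPort class, its method names, and the shape of the position
tuple are all made up for this sketch. It assumes a seekable underlying
binary port that returns the same bytes when re-read from the same
position.

```python
import io


class TranscodedPort:
    """Hypothetical textual port that turns CR, LF and CR LF into "\n"."""

    def __init__(self, binary, bufsize=4):
        self._bin = binary
        self._bufsize = bufsize
        self._buf = ""            # translated characters from the last fill
        self._used = 0            # how many of them have been consumed
        self._prev_cr = False     # transcoding state: was the last byte CR?
        self._fill_start = binary.tell()
        self._cr_at_fill = False  # value of _prev_cr before the last fill

    def _fill(self):
        # Remember where the underlying port was, and our state, pre-fill.
        self._fill_start = self._bin.tell()
        self._cr_at_fill = self._prev_cr
        raw = self._bin.read(self._bufsize).decode("latin-1")
        out = []
        for ch in raw:
            if ch == "\n" and self._prev_cr:
                pass              # second half of CR LF, already reported
            elif ch == "\r":
                out.append("\n")
            else:
                out.append(ch)
            self._prev_cr = (ch == "\r")
        self._buf, self._used = "".join(out), 0
        return raw != ""

    def read_char(self):
        while self._used == len(self._buf):
            if not self._fill():
                return ""         # end of file
        ch = self._buf[self._used]
        self._used += 1
        return ch

    def port_position(self):
        # Underlying position wrapped with the transcoded port's own state.
        return (self._fill_start, self._cr_at_fill, self._used)

    def set_port_position(self, pos):
        fill_start, cr_at_fill, used = pos
        self._bin.seek(fill_start)
        self._prev_cr = cr_at_fill
        self._fill()              # redo the same buffer fill
        self._used = used         # then skip what had been consumed
```

The user only ever sees the opaque tuple. A position taken just after a
CR that ended a buffer restores correctly because the saved state says
whether the LF at the start of the next fill should be dropped.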

Again, this is with R6RS. Similar concepts might need to exist in an
R7RS-small implementation, but not necessarily be exposed to the user.

> - read also needs to look ahead. Suppose you read from "abc()". It
> must return a symbol abc, stopping when #\( is read. That open paren
> must be read by the subsequent read or read-char call.

I agree with this and I don't see a problem. The port internally buffers
the #\( character when peek-char is called by read and a subsequent
read-char will consume it from the internal buffer. I think that one
character of lookahead is pretty much required for read to work.
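
A small model of that lookahead, again in Python rather than Scheme and
with made-up names (CharPort, read_symbol, and a simplified delimiter
set), assuming the port keeps a one-character buffer:

```python
class CharPort:
    """Hypothetical port with a one-character lookahead buffer."""

    def __init__(self, text):
        self._chars = iter(text)
        self._peeked = None       # holds the character seen by peek_char

    def peek_char(self):
        if self._peeked is None:
            self._peeked = next(self._chars, "")  # "" means end of file
        return self._peeked

    def read_char(self):
        ch = self.peek_char()
        self._peeked = None       # consume from the lookahead buffer
        return ch


# Simplified delimiter set, for illustration only.
DELIMITERS = set(' \t\n\r()";')


def read_symbol(port):
    """Collect characters until a delimiter is peeked; leave it unread."""
    chars = []
    while True:
        ch = port.peek_char()
        if ch == "" or ch in DELIMITERS:
            break
        chars.append(port.read_char())
    return "".join(chars)
```

Reading from "abc()" this way yields "abc", and the next read_char
returns the "(" that was only peeked.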

Regards,

--
Göran Weinholt   | https://weinholt.se/
Debian Developer | 73 de SA6CJK