peek-char problem
Shiro Kawai
(18 Jun 2020 03:41 UTC)
|
Re: peek-char problem
Marc Nieper-Wißkirchen
(18 Jun 2020 05:52 UTC)
|
Re: peek-char problem Göran Weinholt (18 Jun 2020 09:03 UTC)
|
Re: peek-char problem
Shiro Kawai
(18 Jun 2020 10:13 UTC)
|
Re: peek-char problem
Göran Weinholt
(18 Jun 2020 12:25 UTC)
|
Re: peek-char problem
Shiro Kawai
(18 Jun 2020 19:09 UTC)
|
Re: peek-char problem
Per Bothner
(18 Jun 2020 16:29 UTC)
|
Re: peek-char problem
Shiro Kawai
(18 Jun 2020 18:53 UTC)
|
Shiro Kawai <xxxxxx@gmail.com> writes:

> I thought peek-char may not be so important in practice, since parsers
> could carry around a prefetched character. When I was updating the
> reference implementation, however, I noticed it might not be so
> simple.
>
> To implement read-line or read, you do need to lookahead one character
> (In case of read-line, you need to peek after CR is read, for the next
> character may or may not be LF. In case of read, if you're reading an
> identifier, you need to leave the subsequent delimiting character
> other than whitespaces.)
>
> If a custom port can't be passed to read-line or read, its use is
> severely limited.
>
> Am I missing some obvious workaround?

(I haven't gone through this SRFI yet, so what I'm saying is just based
on my experience with implementing R6RS.)

The read and read-line procedures work on a level where they do not see
untranslated newlines. They read from textual input(/output) ports and
any translation from CR LF to #\newline has already happened.

Here's a breakdown of how newlines are handled for each port type:

 * Custom binary input port - binary data has no #\newline.

 * Custom textual input ports - the source directly produces #\newline
   with no transcoding necessary.

 * Custom binary output port - binary data has no #\newline.

 * Custom textual output port - #\newline is sent directly to the sink.

 * Custom binary input/output port - binary data has no #\newline.

 * Transcoded binary input port - the transcoder parses newlines
   according to the eol style and converts them to #\newline (none
   means no translation, any other style means all styles are
   recognized and translated to #\newline).

 * Transcoded binary output port - the transcoder translates #\newline
   according to the eol style.

 * Transcoded binary input/output port - combination of the above.

The peek-char procedure either uses the data already in the port's
buffer or it calls the source to get data into the port's buffer.
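
The buffer-or-source behaviour of peek-char can be sketched with a small model. This is Python rather than R6RS code, and it is only an illustration under stated assumptions: the names TextualInputPort, read_source, peek_char, read_char and make_string_source are hypothetical and do not come from R6RS or from the SRFI.

```python
class TextualInputPort:
    """Toy model of a custom textual input port backed by a source callback."""

    def __init__(self, read_source, buffer_size=1):
        # read_source(n) returns a string of up to n characters, "" at EOF.
        self._read_source = read_source
        # Assumption of this sketch: the buffer always has room for one char.
        self._buffer_size = max(1, buffer_size)
        self._buffer = ""

    def _fill(self):
        # Only call the source when the buffer is empty.
        if not self._buffer:
            self._buffer = self._read_source(self._buffer_size)

    def peek_char(self):
        """Return the next character without consuming it, or None at EOF."""
        self._fill()
        return self._buffer[0] if self._buffer else None

    def read_char(self):
        """Return and consume the next character, or None at EOF."""
        ch = self.peek_char()
        if ch is not None:
            self._buffer = self._buffer[1:]
        return ch


def make_string_source(text):
    """Hypothetical source that hands out `text` a few characters at a time."""
    pos = 0

    def read_source(n):
        nonlocal pos
        chunk = text[pos:pos + n]
        pos += len(chunk)
        return chunk

    return read_source
```

In this model, calling peek_char twice in a row returns the same character and invokes the source at most once, because the second call is served from the buffer.
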
If the port is unbuffered then I believe it is still necessary to fill
in the port's buffer with at least one character. An underlying
unbuffered binary port would have one byte at a time consumed until the
transcoded port has a full character.

It might seem like transcoders need to look ahead to recognize that CR
LF should be a single #\newline and that this would break unbuffered
ports. But actually the transcoder can get away with just knowing the
previous character. Suppose the input is #vu8(13 10). The first time
peek-char invokes the transcoder it will see CR (13) and return
#\newline. The second time it sees NL (10) but the transcoder remembers
that the previous char was CR, so it does not return any output. This
works for all the supported eol styles.

The read-line procedure does not need peek-char because it only needs to
recognize and stop at #\newline. The read procedure does indeed want
peek-char for stopping at delimiters and quite possibly for lexing in
general.

Hope that helps. Is there anything else that's unclear about the R6RS
I/O ports system?

(FWIW, anyone implementing an R6RS-alike I/O port system might want to
look at the libc stdio system. It is very similar and also uses sources
and sinks. One difference, which I see this SRFI touches on, is that
stdio sinks flush if they are given a zero argument. I guess using a
special flush procedure is better. Without an explicit flush, a custom
output port can't really do its own buffering, so this is an improvement
over R6RS.)

Regards,

-- 
Göran Weinholt   | https://weinholt.se/
Debian Developer | 73 de SA6CJK
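
The "remember the previous character" trick for CR LF described in the message can be sketched as follows. This is a Python model, not R6RS code, and it simplifies in two ways: it assumes a Latin-1 codec so that each byte is exactly one character, and it handles only CR, LF, NEL and their CR-prefixed pairs. The names EolDecoder, TranscodedInputPort, feed and byte_source are hypothetical.

```python
CR, LF, NEL = 0x0D, 0x0A, 0x85


class EolDecoder:
    """Translate CR, LF, NEL, CR LF and CR NEL to a single newline,
    using one bit of state instead of lookahead."""

    def __init__(self):
        self._prev_was_cr = False

    def feed(self, byte):
        """Consume one byte; return a character, or None if nothing is emitted."""
        prev_was_cr, self._prev_was_cr = self._prev_was_cr, byte == CR
        if byte == CR:
            return "\n"  # emitted immediately, no need to see the next byte
        if byte in (LF, NEL):
            # Second half of CR LF / CR NEL: the newline was already emitted.
            return None if prev_was_cr else "\n"
        return chr(byte)  # Latin-1 assumption: byte value equals code point


class TranscodedInputPort:
    """Unbuffered model: pulls one byte at a time until a character is ready."""

    def __init__(self, read_byte):
        self._read_byte = read_byte  # returns an int, or None at EOF
        self._decoder = EolDecoder()
        self._peeked = None

    def peek_char(self):
        while self._peeked is None:
            byte = self._read_byte()
            if byte is None:
                return None  # EOF
            self._peeked = self._decoder.feed(byte)
        return self._peeked

    def read_char(self):
        ch = self.peek_char()
        self._peeked = None
        return ch


def byte_source(data):
    """Hypothetical source yielding the bytes of `data` one at a time."""
    it = iter(data)
    return lambda: next(it, None)
```

With the input bytes 13 10 65, the first peek_char consumes only the CR and returns a newline. The LF is consumed and dropped later, when the next character is requested, so the first newline never requires reading ahead.
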