Re: General comments or SRFIs 79-82 Michael Sperber 28 Nov 2005 18:40 UTC

Many thanks for your long comments!

Marcin 'Qrczak' Kowalczyk writes:

> I don't like the separation into readers/writers, streams, and ports.
> Too many similar concepts are treated as completely disjoint types.

Then I haven't explained things very well ...

> As I understand it:
> - readers/writers deal with physical I/O of blocks of bytes
> - streams provide encoding and newline conversion, buffering,
>   and scanning the same input multiple times
> - ports provide raw I/O of bytes, can convert UTF-8 to characters,
>   and can convert between characters and lines, or characters and
>   Scheme external representations
> This feels like a single package. There should be some overall
> description of the whole design somewhere, so one doesn't have to
> dig into four separate SRFIs.

You do know it used to be a single package (SRFI 68), and the folks
over on its discussion list were pretty unanimously for splitting it
up?  I'm afraid you're outvoted ...

But let me provide a different characterization:

- Readers/writers are meant for people who *provide* new data
  sources---it's easy to provide readers and writers, but they're not
  meant to be used by user programs.

- Ports and streams are convenient to use, but would be hard to
  provide directly.  (Most Scheme systems that provide facilities for
  creating new "port types" either sacrifice performance or simplicity
  or offer something very similar to readers/writers.)

  Pick the one that suits your application.  Ports are more what
  Schemers are used to.  (Input) streams offer the advantage that it's
  much easier to write transcoder-like facilities.

In this light, they appear very different from one another.
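
To make the layering concrete, here is a rough sketch of the split between a reader and a port.  It is written in Python rather than Scheme, and it is not the API of any of the SRFIs: the names BytesReader, InputPort, read_chunk, read_byte, and read_line are all hypothetical.  The point is only that the reader is trivial to provide, while the port carries the buffering that makes it convenient to use.

```python
import io


class BytesReader:
    """Hypothetical low-level reader: hands out raw chunks of bytes."""

    def __init__(self, data, chunk_size=4):
        self._source = io.BytesIO(data)
        self._chunk_size = chunk_size

    def read_chunk(self):
        # Returns b"" once the underlying source is exhausted.
        return self._source.read(self._chunk_size)


class InputPort:
    """Hypothetical port: buffers reader chunks, offers byte and line access."""

    def __init__(self, reader):
        self._reader = reader
        self._buffer = b""

    def _fill(self):
        # Pull one more chunk from the reader; False means end of input.
        chunk = self._reader.read_chunk()
        self._buffer += chunk
        return bool(chunk)

    def read_byte(self):
        if not self._buffer and not self._fill():
            return None  # end of input
        byte, self._buffer = self._buffer[0], self._buffer[1:]
        return byte

    def read_line(self):
        # Keep pulling chunks until a newline is buffered or input ends.
        while b"\n" not in self._buffer and self._fill():
            pass
        if not self._buffer:
            return None  # end of input
        line, _, rest = self._buffer.partition(b"\n")
        self._buffer = rest
        # Splitting at the LF byte is safe for UTF-8 encoded text.
        return line.decode("utf-8")
```

Someone adding a new data source would only write something like BytesReader; user programs would only see something like InputPort.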

> I don't quite understand the rationale for using UTF-8 as the
> intermediate format.

Could you be more specific as to what you don't understand?  There's
an explanation at the end.

> For mixing textual and binary I/O (if the encoding is not known to
> be UTF-8) one has to put and remove a converter dynamically on every
> switch, and it's incompatible with block-conversion of input (it
> must be converted one character at a time, unless we can find the
> boundary between text and binary data when looking at the raw stream
> before the conversion).

... or do the work in a single transcoder.  Allowing mixing of textual
and binary I/O is always a matter of trade-offs between efficiency and
functionality.  I can see how somebody else might come down in a
different place, but I haven't seen a solution that addresses your
problem.  (There was some discussion over on the SRFI 68 list.)  Maybe
you can provide details on how you'd solve this problem---I'd love to
improve upon this, but don't know how.
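
To illustrate the trade-off, here is a minimal sketch, in Python and not in terms of the SRFI API, of reading a text line followed by binary data when the text encoding is not UTF-8.  The function name read_text_line_then_binary is made up for this example.  It feeds the decoder one byte at a time so that nothing past the line terminator gets consumed, which is exactly the efficiency cost under discussion.

```python
import codecs
import io


def read_text_line_then_binary(stream, encoding):
    """Decode one LF-terminated text line, then return the rest as raw bytes."""
    decoder = codecs.getincrementaldecoder(encoding)()
    chars = []
    while True:
        byte = stream.read(1)
        if not byte:
            break  # input ended before a line terminator
        # The decoder returns "" until it has seen a complete character.
        text = decoder.decode(byte)
        if text == "\n":
            break
        chars.append(text)
    # Everything after the terminator is left untouched as binary data.
    return "".join(chars), stream.read()
```

Decoding in blocks would be faster, but the decoder might then swallow bytes that belong to the binary part.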

> EOL style doesn't include the possibility of accepting any of the
> three common conventions, which is used by Java and probably .NET.

The problem is that this is only the case for input, not output, so
you'd get three more EOL styles instead of one.  This is probably
better handled by a tailored READ-LINE / INPUT-LINE procedure.
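
As a sketch of what such a tailored line reader might look like (in Python, with the hypothetical name read_line_any_eol, and assuming a buffered binary stream that supports peek), the input side can accept CR, LF, or CRLF without the output side needing any new EOL style:

```python
import io


def read_line_any_eol(stream):
    """Read one line, treating CR, LF, or CRLF as the terminator.

    Assumes `stream` is an io.BufferedReader (it needs peek()).
    Returns the line without its terminator, or None at end of input.
    """
    line = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:
            # End of input: return a final unterminated line, if any.
            return bytes(line) if line else None
        if byte == b"\n":
            return bytes(line)
        if byte == b"\r":
            # Swallow a following LF so CRLF counts as one terminator.
            if stream.peek(1)[:1] == b"\n":
                stream.read(1)
            return bytes(line)
        line += byte
```

For example, the input b"a\r\nb\rc\n" yields the lines b"a", b"b", and b"c" in turn.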

> Since on classic Macintosh Perl (and perhaps C too, haven't checked)
> exchanges the meanings of \n and \r (by actually changing their
> interpretation in the source instead of recoding), I guess it would be
> more useful to exchange them when recoding in the CR style, instead
> of by treating either as a newline on input and writing a newline for
> either on output.

Interesting, I didn't know this.  Could you provide details or a web

> StdIn and StdOut can be seekable, and it's sometimes useful (e.g. Unix
> "wc" makes use of this). The reference implementation doesn't allow that.

Sure---the underlying substrate doesn't allow it.  But you might in
your implementation, and that shouldn't be hard.
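
For illustration only, this is roughly how a program could take advantage of a seekable standard input when the substrate allows it.  The sketch is in Python, not the reference implementation, and the helper name remaining_bytes is hypothetical.

```python
import io
import os


def remaining_bytes(stream):
    """Return how many bytes are left in `stream`, or None if it cannot seek."""
    if not stream.seekable():
        return None
    here = stream.tell()
    end = stream.seek(0, io.SEEK_END)  # seek() returns the new position
    stream.seek(here)                  # restore the original position
    return end - here
```

Calling remaining_bytes(sys.stdin.buffer) would give a byte count without reading anything when standard input is redirected from a file, and None when it is a pipe.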

> I don't understand input-string. How much does it read?

However much the implementation feels like.  To be honest, I'm not
positive that it's that useful (unlike INPUT-BLOB)---it's mainly there
for symmetry with INPUT-BLOB.

> When reading from ports, it's not specified what happens when data are
> not valid UTF-8. Similarly for decoding from e.g. UTF-16 (unpaired
> surrogates), UTF-32 (too large values), or encoding to latin-1
> (characters above U+00FF).

> Here are various concrete stream types: [...]

Above, you criticize the fact that I have different types for
different levels of the system.  But how is your hierarchy of "stream
types" different from that?  (I'm not trying to criticize you---I'm

Cheers =8-} Mike
Friede, Völkerverständigung und überhaupt blabla