Re: Issues with Unicode

Issues with Unicode Jonathan S. Shapiro (23 Apr 2006 08:55 UTC)

Re: Issues with Unicode Marc Feeley (23 Apr 2006 13:26 UTC)

Re: Issues with Unicode Shiro Kawai (26 Apr 2006 06:27 UTC)

Re: Issues with Unicode Taylor R. Campbell (26 Apr 2006 07:50 UTC)

Re: Issues with Unicode Shiro Kawai (26 Apr 2006 22:21 UTC)

Re: Issues with Unicode Jorgen Schaefer (26 Apr 2006 23:40 UTC)

Re: Issues with Unicode Marc Feeley 23 Apr 2006 13:25 UTC

On 23-Apr-06, at 4:54 AM, Jonathan S. Shapiro wrote:

> ...
>
> 3. There is an issue with newline processing in input and output
> (which probably is the subject of a different SRFI). Platforms do not
> agree about newline conventions in text files. A regrettable
> consequence is that character streams require specification at open
> time as to whether they are being opened for binary or text
> processing.
>
> One regrettable consequence of this is that the R5RS specification for
> open-output-file and open-input-file is inadequate. A second argument
> needs to be added to specify newline processing conventions. Note that
> this also became an issue for UNIX STDIO, and that acceptance of "t"
> and "b" in the file mode argument to fopen() is now mandated by the C
> standard.
>
> This is also an issue for string ports.
>
> In general, any operation that opens a port must specify the desired
> processing for newlines.
>
> ...
>
> 9. Once you have a variable-length character representation, it
> becomes necessary to incorporate separate means for reading bytes from
> input streams. For example this is needed if the programmer wishes to
> construct code to process files in (e.g.) UTF-32. This raises a
> question about newline canonicalization. My suggestion is that the
> port's handling of newlines should be independent of the caller. That
> is, read-byte on a text-mode port that would normally convert the
> input \r\n to \n should return the byte corresponding to \n. If you
> want unmangled bytes, use binary mode input.
>
> The same argument does *not* apply for read-char, because it is the
> nature of read-char to process the bytes in order to determine
> character length.

For a solution to these problems see SRFI 91.  I would appreciate
feedback on the SRFI 91 mailing list if you think it does not satisfy
your needs.

Marc
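
The read-byte behaviour proposed in point 9 of the quoted message can be illustrated with a minimal sketch. It is written in Python rather than Scheme, and it is not the SRFI 91 interface: the names BytePort, read_byte, and read_all are hypothetical, and the only convention assumed is that a text-mode port collapses \r\n to \n while a binary-mode port returns bytes untouched.

```python
import io


class BytePort:
    """Hypothetical byte-level input port with an optional text mode."""

    def __init__(self, raw, text_mode):
        self._raw = raw              # any binary file-like object
        self._text_mode = text_mode  # True: canonicalize \r\n to \n
        self._pending = None         # one byte of lookahead, if any

    def read_byte(self):
        """Return the next byte as an int, or None at end of input."""
        if self._pending is not None:
            b, self._pending = self._pending, None
        else:
            chunk = self._raw.read(1)
            if not chunk:
                return None
            b = chunk[0]
        if self._text_mode and b == 0x0D:  # carriage return
            nxt = self._raw.read(1)
            if nxt == b"\n":
                return 0x0A                # \r\n is reported as \n
            if nxt:
                self._pending = nxt[0]     # lone \r: keep the next byte
        return b


def read_all(port):
    """Drain a BytePort into a bytes object."""
    out = bytearray()
    while (b := port.read_byte()) is not None:
        out.append(b)
    return bytes(out)


data = b"a\r\nb\rc\n"
text_bytes = read_all(BytePort(io.BytesIO(data), text_mode=True))
binary_bytes = read_all(BytePort(io.BytesIO(data), text_mode=False))
```

In this sketch the newline handling belongs to the port, not to the caller: the same read_byte call yields canonicalized bytes on a text-mode port and unmangled bytes on a binary-mode port, and a lone \r that is not followed by \n is passed through unchanged in both modes.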