Re: Encodings. Ken Dickey 13 Feb 2004 12:56 UTC

On Friday 13 February 2004 07:03 pm, Bradd W. Szonye wrote:
> On Fri, Feb 13, 2004 at 07:51:49AM +0100, Ken Dickey wrote:
> > Let's say that there is a Scheme SRFI (or even, *GASP*, a standard)
> > which picks a single canonical Unicode form (say the most compact
> > one) and requires, where Unicode is used, that Scheme programs be
> > prepared in that format ....
>
> Such a program would not conform to the Unicode standard:

Who cares?  Scheme does not conform to ASCII or EBCDIC.  Why should Scheme
conform to the Unicode Standard(s)?  Defining what is an acceptable Scheme
program should be sufficient.

It is desirable that a Scheme with support for extended identifiers should not
be large or expensive to implement.  I have suggested a solution in which
this is the case, i.e. to allow implementations to specify and restrict what
source text is allowable for Scheme programs.  Scheme source could be in
ASCII, ISO-Latin-1, or (pre-canonicalized) Unicode (perhaps UCS-2).
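
To make "restrict what source text is allowable" concrete, here is a rough
sketch of an acceptance check.  Python is used purely for illustration.  The
name check_source and the choice of NFC as the single required form are
assumptions for this example, not part of any SRFI or proposal text.  A check
like this could just as well run as a separate preparation step outside the
Scheme reader.

```python
import unicodedata


def check_source(text, form="NFC"):
    """Return text unchanged if it is already in the required
    normalization form; otherwise raise ValueError.

    Hypothetical helper: the name and the default form are assumptions.
    """
    if unicodedata.normalize(form, text) != text:
        raise ValueError(f"source text is not in {form}")
    return text


# Two canonically equivalent spellings of the same identifier.
composed = "(define caf\u00e9 1)"      # e-acute as one code point, U+00E9
decomposed = "(define cafe\u0301 1)"   # 'e' followed by U+0301 combining acute

check_source(composed)                 # already in NFC, accepted as-is

try:
    check_source(decomposed)           # not in NFC, so it is rejected
except ValueError as err:
    print("rejected:", err)
```

Under this scheme the two spellings are never compared with each other: only
one of them is an acceptable Scheme program in the first place, so the reader
itself can compare identifiers code point by code point.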

>     C9. A process shall not assume that the interpretations of two
>         canonical-equivalent character sequences are distinct.
>
> This section goes on to concede that
>
>     Ideally, an implementation would always interpret two
>     canonical-equivalent character sequences identically. There are
>     practical circumstances under which implementations may reasonably
>     distinguish them.

Scheme does not IMPLEMENT Unicode.  Support for processing Unicode data is a
good idea.  Does every Scheme implementation come with its own conforming
Unicode source editor?  Isn't this asking a bit much!?!  8^)

> In other words, recognizing canonically-equivalent characters *is* the
> responsibility of the reader, if it claims to implement the Unicode
> character set.

I still fail to see why one would wish to make such a claim.  I have not yet
seen a convincing case made for making Scheme "a conforming Unicode
implementation".  Convince me!

$0.02,
-KenD