Re: the "Unicode Background" section
Thomas Lord 23 Jul 2005 20:18 UTC
me:
>> In my view, DISPLAY (in R6RS, not forever) should be undefined in
>> that case (and in all cases where a string contains a
>> non-8-bit-character)
John:
> There are no such things as "8-bit characters" per se. There are a
> variety of 8-bit encodings that allow up to 256 characters, but they
> are not the same characters in all cases.
[I think your question is related to Shiro's which I intend to
answer separately but, for now: ]
"Character" is overloaded in this discussion (various Unicode concepts
and the Scheme type).
I'm suggesting a base-level I/O system that consumes bits from
a port at some framing, is undefined if the integer value of
those bits is greater than 255, and otherwise returns the
Scheme CHAR having that codepoint value.
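
A rough model of that procedure, for illustration only. It is in
Python rather than Scheme so it can be run as-is. The name
read_framed_char, the MSB-first bit order, and raising an error to
stand in for "undefined" are all assumptions of the sketch, not part
of the proposal.

```python
from itertools import islice


def read_framed_char(bits, frame_width):
    """Consume one frame of bits from an iterator and return a character.

    bits        -- iterator yielding 0 or 1 (a stand-in for a port)
    frame_width -- number of bits per frame (the "framing")

    Returns None at end of input. Raises ValueError where the proposal
    says the result is undefined (value > 255); a real implementation
    would be free to do anything there.
    """
    frame = list(islice(bits, frame_width))
    if not frame:
        return None  # nothing left on the port
    if len(frame) < frame_width:
        raise ValueError("input ended in the middle of a frame")

    # Assemble the frame into an integer, most significant bit first
    # (assumed; the proposal does not fix a bit order).
    value = 0
    for bit in frame:
        value = (value << 1) | bit

    if value > 255:
        raise ValueError("undefined: frame value %d exceeds 255" % value)
    return chr(value)  # the character whose codepoint is `value`


port = iter([0, 1, 0, 0, 0, 0, 0, 1])  # 01000001 = 65
print(read_framed_char(port, 8))       # prints A
print(read_framed_char(port, 8))       # prints None (port exhausted)
```

With 8-bit framing the undefined case can never arise; it only shows
up at wider framings, e.g. a 16-bit frame holding the value 256.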
This is a fairly radical proposal. It means, for example,
that READ-CHAR will never know squat about UTF-8: READ-CHAR
is doomed, under my suggestions, to remain forever a low-level
procedure.
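
To make the consequence concrete, here is a small sketch (again in
Python, with hypothetical helper names) of what an 8-bit-framed
reader sees when fed UTF-8, and of a separate layer on top that
reassembles the text. Neither function is part of any proposal; they
only illustrate the split between the low-level read and whatever
decoding gets built above it.

```python
def read_chars_8bit(data):
    """Model of the low-level reader at 8-bit framing: one char per byte."""
    return "".join(chr(byte) for byte in data)


def decode_utf8_layer(chars):
    """Hypothetical higher layer: reinterpret those chars as UTF-8 bytes."""
    return bytes(ord(c) for c in chars).decode("utf-8")


utf8_input = "\u00e9".encode("utf-8")   # e-acute is the two bytes C3 A9
low_level = read_chars_8bit(utf8_input)

print(len(low_level))                          # prints 2
print([hex(ord(c)) for c in low_level])        # prints ['0xc3', '0xa9']
print(decode_utf8_layer(low_level) == "\u00e9")  # prints True
```

The low-level reader hands back two chars (codepoints 195 and 169),
not one; recovering the single intended character is left entirely
to the layer above.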
On the other hand, it's upward compatible and sets the stage
for experimentation with I/O paradigms. (Upward compatible with
the standard, not with implementations; the divergence is
over how procedures are named, not what they do.)
(I'm fuzzy on how best to reconcile the byte-I/O routines
with framing.)
-t