Re: the "Unicode Background" section
John.Cowan 22 Jul 2005 21:56 UTC
Thomas Lord scripsit:
> I think it might be realistic to label ports not with
> the encoding scheme they want, but with the set of
> code-values they can transmit -- in other words
> with their framing constraints. In other words --
> a "UTF-8 port" (no such thing, really) and an "ASCII port"
> (no such thing, again) are *really* just "8-bit ports".
> A "UTF-16 port" is *really* just a "16-bit port".
The difficulty here is that an ISO-8859-1 port produces (or accepts) a
different set of characters from an ISO-8859-2 port. Unless a port is
labeled with an encoding, you can't know which characters it will
accept or produce, and you are stuck with some system default.
Even a 16-bit port behaves differently depending on whether it is
a UTF-16 port, a UTF-16LE port, or a UTF-16BE port: the byte order
differs, and only a plain UTF-16 port uses a byte order mark to decide it.
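To make the point concrete, here is a small Python sketch. Python is used only for illustration, and `decode_octets` is a hypothetical helper, not any Scheme API. The same octet is a different character under ISO-8859-1 and ISO-8859-2, the same pair of octets differs between UTF-16BE and UTF-16LE, and plain UTF-16 treats a leading byte order mark differently from UTF-16LE.

```python
def decode_octets(data: bytes, encoding: str) -> str:
    """Decode raw input the way a port labeled with `encoding` would."""
    return data.decode(encoding, errors="strict")

# One octet, two 8-bit ports, two different characters.
octet = b"\xa1"
latin1 = decode_octets(octet, "iso-8859-1")  # U+00A1 INVERTED EXCLAMATION MARK
latin2 = decode_octets(octet, "iso-8859-2")  # U+0104 LATIN CAPITAL LETTER A WITH OGONEK

# Two octets, two 16-bit ports with opposite byte orders.
pair = b"\x00\x41"
be = decode_octets(pair, "utf-16-be")  # U+0041 "A"
le = decode_octets(pair, "utf-16-le")  # U+4100, a CJK ideograph

# A leading byte order mark: plain UTF-16 consumes it, UTF-16LE keeps it.
with_bom = b"\xff\xfe\x41\x00"
plain = decode_octets(with_bom, "utf-16")     # "A"
fixed = decode_octets(with_bom, "utf-16-le")  # U+FEFF followed by "A"

for name, s in [("latin1", latin1), ("latin2", latin2), ("be", be),
                ("le", le), ("plain", plain), ("fixed", fixed)]:
    print(name, [f"U+{ord(c):04X}" for c in s])
```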
I'm not saying that any Scheme system has to accept every possible
encoding (though I do think at least ASCII, UTF-8, and UTF-16 should
be mandatory; they are all trivial), but it needs to be possible
to specify the encoding of a port when it is created. (I don't think
it's necessary to be able to change it on the fly, though.)
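As a rough illustration of why ASCII, UTF-8 and UTF-16 can be called trivial, here are hand-written Python encoders for a single code point. The function names are hypothetical; the loop at the end only checks that they agree with Python's built-in codecs. UTF-16LE differs from the UTF-16BE version shown only in the order of each pair of octets.

```python
def _check_scalar(cp: int) -> None:
    # Unicode scalar values: 0..0x10FFFF, excluding the surrogate range.
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")

def encode_ascii(cp: int) -> bytes:
    _check_scalar(cp)
    if cp > 0x7F:
        raise ValueError(f"U+{cp:04X} is outside the ASCII repertoire")
    return bytes([cp])

def encode_utf8(cp: int) -> bytes:
    _check_scalar(cp)
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])

def encode_utf16be(cp: int) -> bytes:
    _check_scalar(cp)
    if cp < 0x10000:
        return cp.to_bytes(2, "big")
    v = cp - 0x10000              # 20 bits, split across a surrogate pair
    high = 0xD800 | (v >> 10)
    low = 0xDC00 | (v & 0x3FF)
    return high.to_bytes(2, "big") + low.to_bytes(2, "big")

# Sanity check against Python's own codecs.
for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    assert encode_utf8(cp) == chr(cp).encode("utf-8")
    assert encode_utf16be(cp) == chr(cp).encode("utf-16-be")
```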
> At the same time, several of us agree that WRITE-CHAR
> should accept a CHAR argument which is, in essence, a
> codepoint.
In that case, it is the output port's encoding that determines which
octets to write.
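A minimal Python sketch of that split, assuming a hypothetical `OutputPort` class (not any existing Scheme or Python API). The encoding is given once, when the port is created, and cannot be changed afterwards. `write_char` takes one character (a code point) and lets the port's encoding choose the octets.

```python
import codecs
import io

class OutputPort:
    """Hypothetical output port whose encoding is fixed when it is created."""

    def __init__(self, sink: io.BytesIO, encoding: str) -> None:
        self._sink = sink
        self._encoding = encoding
        # An incremental encoder keeps state across calls, so e.g. "utf-16"
        # emits its byte order mark once per port, not once per character.
        self._encoder = codecs.getincrementalencoder(encoding)(errors="strict")

    @property
    def encoding(self) -> str:
        # Read-only: there is deliberately no way to change it on the fly.
        return self._encoding

    def write_char(self, ch: str) -> None:
        """Write one character; the port's encoding decides the octets."""
        if len(ch) != 1:
            raise TypeError("write_char expects exactly one character")
        self._sink.write(self._encoder.encode(ch))


def octets_for(ch: str, encoding: str) -> bytes:
    """Octets a fresh port with `encoding` writes for the single character `ch`."""
    sink = io.BytesIO()
    OutputPort(sink, encoding).write_char(ch)
    return sink.getvalue()


# The same character (U+00E9) becomes different octets on different ports.
for enc in ("utf-8", "iso-8859-1", "utf-16-be", "utf-16-le"):
    print(enc, octets_for("\u00e9", enc))
```

Here U+00E9 comes out as two octets (C3 A9) on the UTF-8 port, one (E9) on the ISO-8859-1 port, and two in opposite orders on the two UTF-16 ports.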
> I think an implementation should be permitted to have a
> version of WRITE-CHAR which is not total for all PORT,
> CHAR pairs: try to write a wide character on an 8-bit
> port and that's an error, etc.
Absolutely. Or, more precisely: an attempt to write a character that is
not in the repertoire associated with the port's encoding is an error.
Allowing this to be lax is just asking for trouble.
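A sketch of the strict behavior in Python, with a hypothetical `write_char` standing in for a port's character output. A character outside the encoding's repertoire raises an error and nothing is written; the last line shows the lax alternative, which silently turns the character into `?`.

```python
def write_char(sink: bytearray, ch: str, encoding: str) -> None:
    """Append the octets for `ch` in `encoding` to `sink`.

    Strict by design: if `ch` is outside the encoding's repertoire, the codec
    raises UnicodeEncodeError before anything is appended.
    """
    sink.extend(ch.encode(encoding, errors="strict"))


out = bytearray()
write_char(out, "\u0104", "iso-8859-2")      # fine: U+0104 is octet A1 in Latin-2
try:
    write_char(out, "\u0104", "iso-8859-1")  # U+0104 is not in Latin-1
except UnicodeEncodeError as err:
    print("refused:", err)

# The lax alternative loses data without complaint.
lax = "\u0104".encode("iso-8859-1", errors="replace")
print(lax)
```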
Given that, it's easy to build a higher-level abstraction that writes
and reads otherwise unrepresentable characters using some escaping scheme.
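One possible shape for such an abstraction, sketched in Python on top of strict encoding. The escape syntax (XML-style `&#xHEX;` references, with `&` itself always escaped so that reading is unambiguous) is an assumption chosen for illustration, and the helpers are hypothetical. The sketch also assumes the port's encoding can represent the ASCII characters used in the escapes. Python's own `errors="xmlcharrefreplace"` covers the writing half, though it emits decimal references and leaves a literal `&` unescaped.

```python
import re

# Hypothetical escape syntax: &#xHEX; where HEX is the code point in hex.
_ESCAPE = re.compile(r"&#x([0-9A-Fa-f]+);")

def _representable(ch: str, encoding: str) -> bool:
    """True if `ch` is in the repertoire of `encoding` (strict check)."""
    try:
        ch.encode(encoding, errors="strict")
        return True
    except UnicodeEncodeError:
        return False

def write_escaped(text: str, encoding: str) -> bytes:
    """Encode `text`, escaping characters the encoding cannot represent."""
    parts = []
    for ch in text:
        # '&' is always escaped, so every '&' in the output starts an escape.
        if ch == "&" or not _representable(ch, encoding):
            parts.append(f"&#x{ord(ch):X};")
        else:
            parts.append(ch)
    # Everything left is representable, so the strict encode succeeds.
    return "".join(parts).encode(encoding, errors="strict")

def read_escaped(data: bytes, encoding: str) -> str:
    """Decode `data` and turn every &#xHEX; escape back into its character."""
    text = data.decode(encoding, errors="strict")
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


raw = write_escaped("A&\u20ac", "iso-8859-1")   # b"A&#x26;&#x20AC;"
print(raw, read_escaped(raw, "iso-8859-1") == "A&\u20ac")
```

Because the lower layer stays strict, the escaping is an explicit choice made above it rather than a silent default of the port.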
--
Some people open all the Windows; John Cowan
wise wives welcome the spring xxxxxx@reutershealth.com
by moving the Unix. http://www.reutershealth.com
--ad for Unix Book Units (U.K.) http://www.ccil.org/~cowan
(see http://cm.bell-labs.com/cm/cs/who/dmr/unix3image.gif)