Re: the "Unicode Background" section John.Cowan 22 Jul 2005 21:56 UTC
Thomas Lord scripsit:

> I think it might be realistic to label ports not with
> the encoding scheme they want, but with the set of
> code-values they can transmit -- in other words
> with their framing constraints.  In other words --
> a "UTF-8 port" (no such thing, really) and an "ASCII port"
> (no such thing, again) are *really* just "8-bit ports".
> A "UTF-16 port" is *really* just a "16-bit port".

The difficulty here is that an ISO-8859-1 port {produces,accepts} a
different set of characters from an ISO-8859-2 port.  Unless a port is
labeled with an encoding, you can't know what characters it will and
won't {accept,produce}, and you are stuck with some system default.
Even a 16-bit port behaves differently depending on whether it is a
UTF-16 port, a UTF-16LE port, or a UTF-16BE port.

I'm not saying that any Scheme system has to accept every possible
encoding (though I do think at least ASCII, UTF-8, and UTF-16 should be
mandatory; they are all trivial), but it needs to be possible to specify
the encoding of a port when it is created.  (I don't think it's necessary
to be able to change it on the fly, though.)

> At the same time, several of us agree that WRITE-CHAR
> should accept a CHAR argument which is, in essence, a
> codepoint.

In which case it is the output port's encoding that says what octets
to write.

> I think an implementation should be permitted to have a
> version of WRITE-CHAR which is not total for all PORT,
> CHAR pairs: try to write a wide character on an 8-bit
> port and that's an error, etc.

Absolutely.  Or more specifically: attempt to write a character that's
not in the repertoire associated with the encoding is an error.
Allowing this to be lax is just asking for trouble.

Given that, it's easy to create a higher-level abstraction that will
{write,read} impossible characters with some encoding scheme.

--
Some people open all the Windows;       John Cowan
wise wives welcome the spring           xxxxxx@reutershealth.com
by moving the Unix.                     http://www.reutershealth.com
  --ad for Unix Book Units (U.K.)       http://www.ccil.org/~cowan
        (see http://cm.bell-labs.com/cm/cs/who/dmr/unix3image.gif)
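
[Illustration, not part of the original message.]  The points above can
be sketched in runnable form.  What follows is only an analogy in Python,
not a Scheme API and not anything proposed in this thread: the names
Port, write_char, and write_char_escaped are hypothetical, and the
"\uXXXX" escape is an arbitrary choice for the example.  The port's
encoding is fixed at creation, a character outside that encoding's
repertoire is an error, and the lenient behaviour is layered on top.

```python
import codecs


class Port:
    """Hypothetical output port whose encoding is fixed at creation."""

    def __init__(self, encoding):
        self.encoding = encoding
        # Strict incremental encoder: unrepresentable characters raise.
        self._encoder = codecs.getincrementalencoder(encoding)(errors="strict")
        self.octets = bytearray()

    def write_char(self, ch):
        # The port's encoding, not the character, decides the octets written.
        self.octets += self._encoder.encode(ch)


def write_char_escaped(port, ch):
    """Higher-level layer: escape characters the port cannot represent."""
    try:
        port.write_char(ch)
    except UnicodeEncodeError:
        # Assumed escape syntax, chosen only for this sketch.
        for c in "\\u%04X" % ord(ch):
            port.write_char(c)


# Same octet, different character, depending on the 8-bit encoding.
latin1_char = bytes([0xF0]).decode("iso-8859-1")  # U+00F0
latin2_char = bytes([0xF0]).decode("iso-8859-2")  # U+0111

# Same character, different octets, depending on the 16-bit encoding.
le = Port("utf-16-le")
le.write_char("A")
be = Port("utf-16-be")
be.write_char("A")
bom = Port("utf-16")  # writes a byte order mark before the first character
bom.write_char("A")

# Strict by default: U+0142 is in ISO-8859-2 but not in ASCII.
ascii_port = Port("ascii")
latin2_port = Port("iso-8859-2")
latin2_port.write_char("\u0142")
try:
    ascii_port.write_char("\u0142")
    strict_error = False
except UnicodeEncodeError:
    strict_error = True

# The lenient layer succeeds on the same ASCII port by escaping.
write_char_escaped(ascii_port, "\u0142")
```

Keeping write_char strict and putting the escaping in a separate
function mirrors the layering described in the message: the lax
behaviour is something a caller opts into, not the port's default.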