Re: the "Unicode Background" section
John.Cowan 23 Jul 2005 04:55 UTC
Thomas Lord scripsit:
> There is no reason I can see why the implementation must provide
> the kind of labeling you like. I have no trouble imagining
> many applications that don't need it --- they either won't
> need the system to enforce a restriction because they'll know
> what their ports are talking to or they are robust in the case
> that they get wrong what the ports expect. The goal
> here isn't to try to prevent programmers from making mistakes.
Granted. But if an {input,output} port doesn't know its encoding, it
doesn't know how to translate {characters,bytes} to {bytes,characters}
at all. It's not a matter of overcoming restrictions -- it's fundamental.
An output port needs to know, e.g., whether to output #\u0131 (dotless i)
as an 0xB9 byte (ISO 8859-3) or an 0xFD (ISO 8859-9) or an 0xB8 (ANSEL)
or not at all. (The same mutatis mutandis for an input port.)
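
To make that concrete, here is a minimal sketch in Python rather than
Scheme, purely as an illustration. The codec names are Python's own, the
helper encode_or_none is hypothetical, and ANSEL is left out because the
sketch sticks to codecs in the standard library. Latin-1 stands in for
the "not at all" case.

```python
# Illustration only: Python codec names stand in for a port's encoding label.
DOTLESS_I = "\u0131"  # LATIN SMALL LETTER DOTLESS I


def encode_or_none(ch, encoding):
    """Return the bytes for ch in the given encoding, or None if it has none."""
    try:
        return ch.encode(encoding)
    except UnicodeEncodeError:
        # The encoding has no representation for this character.
        return None


# The same character yields different bytes, or none, per encoding.
results = {
    name: encode_or_none(DOTLESS_I, name)
    for name in ("iso8859_3", "iso8859_9", "latin_1")
}

for name, data in results.items():
    print(name, data)
```

Without the encoding name there is nothing for the function to dispatch
on, which is the sense in which the label is fundamental rather than a
restriction.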
> For example, a common suggestion is that you specify when
> first creating a port what encoding it is. Well, what if
> I hope to send traffic over that port that will use different
> encodings at different points during the run? Clearly labeling
> is not trivial -- I'd say, not well understood.
That can happen on occasion, but it's a highly specialized case that can
be layered directly over an octet port. Specifying a fixed encoding
for a character port makes straightforward things easy. This is far
from being rocket science.
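
A rough sketch of that layering, again in Python as an illustration
only: io.BytesIO plays the octet port and io.TextIOWrapper plays a
character port with a fixed encoding. The helper write_segment is
hypothetical; it shows one way the mixed-encoding case could sit
directly on the octet port.

```python
import io

# Straightforward case: a character port with one fixed encoding.
fixed_port = io.TextIOWrapper(io.BytesIO(), encoding="iso8859_9", newline="")
fixed_port.write("\u0131")
fixed_port.flush()
fixed_bytes = fixed_port.buffer.getvalue()


def write_segment(octets, text, encoding):
    """Write text to the octet port using the encoding chosen for this segment."""
    port = io.TextIOWrapper(octets, encoding=encoding, newline="")
    port.write(text)
    port.flush()
    port.detach()  # release the octet port without closing it


# Specialized case: the encoding changes during the run, so each segment
# gets its own short-lived character port over the same octet port.
octets = io.BytesIO()
write_segment(octets, "\u0131", "iso8859_3")
write_segment(octets, "\u0131", "iso8859_9")
mixed_bytes = octets.getvalue()

print(fixed_bytes, mixed_bytes)
```

The application that switches encodings carries the extra bookkeeping
itself, while the ordinary case needs only the encoding named at port
creation.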
--
John Cowan xxxxxx@reutershealth.com www.ccil.org/~cowan www.reutershealth.com
If he has seen farther than others,
it is because he is standing on a stack of dwarves.
--Mike Champion, describing Tim Berners-Lee (adapted)