Re: Why are byte ports "ports" as such? John Cowan 23 May 2006 18:15 UTC

Jonathan S. Shapiro scripsit:

> > Unless char is defined to be a code point. Which is IMHO the most
> > reasonable choice: code points are the natural atomic units of Unicode
> > text, and most Unicode algorithms are expressed in terms of code points.
>
> In many respects I agree that this would be sensible from the
> programmer's perspective.
>
> Unfortunately it is quite wrong, which is something that the UNICODE
> people go to great lengths to make clear (and, IMO, a serious failing of
> UNICODE).

It's not *wrong*.  It's not a matter of the Right Thing and the Wrong Thing.
For some purposes, code units (8 or 16 or 32 bits) are the Right Thing;
for some purposes, code points are; for some purposes, higher-level units
are.  It's about appropriate choices.
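
As a rough illustration of those three levels, here is a minimal Python
sketch that measures one short string in each unit.  The function name
`unit_counts` and the sample string are made up for this example, and the
last count is only a crude stand-in for "higher-level units": it counts
code points that are not combining marks, which is far simpler than the
grapheme cluster rules in UAX #29.

```python
import unicodedata


def unit_counts(text):
    """Return the length of `text` measured in several different units."""
    return {
        # Code units: the fixed-width pieces of each encoding form.
        "utf8_code_units": len(text.encode("utf-8")),
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        "utf32_code_units": len(text.encode("utf-32-le")) // 4,
        # Code points: Python 3 strings are sequences of code points.
        "code_points": len(text),
        # Assumption: approximate a higher-level unit by counting only
        # code points with combining class 0 (a base plus its marks is
        # counted once). This is NOT full grapheme cluster segmentation.
        "base_plus_marks": sum(
            1 for ch in text if unicodedata.combining(ch) == 0
        ),
    }


# "e" + COMBINING ACUTE ACCENT (U+0301) + MUSICAL SYMBOL G CLEF (U+1D11E)
sample = "e\u0301\U0001D11E"

for unit, count in unit_counts(sample).items():
    print(f"{unit}: {count}")
```

For this sample the five counts are 7, 4, 3, 3 and 2 respectively: the
same text has a different "length" depending on which unit the task at
hand calls for.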

And if Unicode is complicated (and it is), it's because it's embedded in
a complicated world.

--
John Cowan   xxxxxx@ccil.org  http://www.ccil.org/~cowan
Most languages are dramatically underdescribed, and at least one is
dramatically overdescribed.  Still other languages are simultaneously
overdescribed and underdescribed.  Welsh pertains to the third category.
        --Alan King