Re: character strings versus byte strings
bear 23 Dec 2003 01:29 UTC
On Mon, 22 Dec 2003, Tom Lord wrote:
> > Many many many computer systems could get away with
> > ignoring the locale-dependency of case-mapping, but now they can
> > no longer plead ignorance. (Though the problems are hardly
> > obscure; even German causes problems.)
>
> (I think that, being a culturally unbiased person, you mean that
> German causes one _unique_ problem regarding case mapping.)
This is absolutely the case. From the perspective of grapheme-
characters, and ignoring ligatures as a pure typesetting issue,
eszett is the ONLY character in all of Unicode that upcases into
a different number of characters. I'm using an ugly kluge: either
put off changing the length of any string until a canonicalization
operation, or return the upcase as a single non-standard character
(yet another character which doesn't exist in Unicode). But I'm
sorely tempted to simply declare all use of eszett, given its
unique status in the history of human writing, to be an error.
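
To make the kluge concrete, here is a minimal sketch in Python. It is
not my actual implementation. The placeholder code point U+E000 (from
the private use area) and the names upcase_fixed_length and
canonicalize are assumptions made up for illustration. The first
function upcases without ever changing the string length; the second
is the only place where the length is allowed to change.

```python
# Hypothetical placeholder: a private-use code point standing in for a
# single-character "uppercase eszett". This choice is an assumption made
# for illustration only.
UPPER_ESZETT = "\ue000"
ESZETT = "\u00df"


def upcase_fixed_length(s):
    """Upcase one character at a time without changing the length."""
    out = []
    for ch in s:
        up = ch.upper()
        if ch == ESZETT:
            # Defer the one-to-two expansion by emitting the placeholder.
            out.append(UPPER_ESZETT)
        elif len(up) == 1:
            out.append(up)
        else:
            # Anything else that would expand (e.g. a ligature) is left
            # untouched in this sketch.
            out.append(ch)
    return "".join(out)


def canonicalize(s):
    """Expand the placeholder to "SS"; only here may the length change."""
    return s.replace(UPPER_ESZETT, "SS")


word = "stra" + ESZETT + "e"
fixed = upcase_fixed_length(word)
print(len(word), len(fixed), len(canonicalize(fixed)))
```

The built-in word.upper() does the expansion immediately, which is
exactly the length change the kluge postpones.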
Bear