character strings versus byte strings Matthew Flatt (22 Dec 2003 14:16 UTC)
Re: character strings versus byte strings Per Bothner (22 Dec 2003 17:09 UTC)
Re: character strings versus byte strings Matthew Flatt (22 Dec 2003 17:23 UTC)
Re: character strings versus byte strings tb@xxxxxx (22 Dec 2003 20:23 UTC)
Re: character strings versus byte strings Tom Lord (22 Dec 2003 22:36 UTC)
Re: character strings versus byte strings tb@xxxxxx (22 Dec 2003 22:41 UTC)
Re: character strings versus byte strings Shiro Kawai (22 Dec 2003 23:00 UTC)
Re: character strings versus byte strings Michael Sperber (23 Dec 2003 09:36 UTC)

Re: character strings versus byte strings tb@xxxxxx 22 Dec 2003 22:41 UTC

Tom Lord <xxxxxx@emf.net> writes:

>     > Many many many computer systems could get away with
>     > ignoring the locale-dependency of case-mapping, but now they can
>     > no longer plead ignorance.  (Though the problems are hardly
>     > obscure; even German causes problems.)
>
> (I think that, being a culturally unbiased person, you mean that
> German causes one _unique_ problem regarding case mapping.)

The problem in German that I'm thinking of is the eszett (ß) problem, where
there is a lowercase letter whose uppercase is a two-letter combo, SS.
(And downcasing SS requires morphological understanding of the word
as well, because not all SS pairs should be downcased as an eszett,
IIUC.)
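
A minimal sketch of the eszett asymmetry, assuming Python 3 purely for
illustration (it is not part of this thread). Its built-in str.upper and
str.lower apply the locale-independent Unicode default mappings. The
variable names and sample words are my own.

```python
# Python's str methods use the default, locale-independent Unicode case mappings.
word = "Straße"
upper = word.upper()          # the eszett expands to the two letters "SS"
round_trip = upper.lower()    # plain lowercasing cannot bring the eszett back

# Two distinct German words collapse to the same uppercase form, so no
# character-level rule can pick the right lowercase spelling afterwards.
masse_upper = "Masse".upper()   # "Masse" (mass)
masze_upper = "Maße".upper()    # "Maße" (measurements)

print(upper, len(word), len(upper))
print(round_trip)
print(masse_upper == masze_upper)
```

The string grows by one character on upcasing, and the round trip yields
"strasse" rather than the original spelling, which is why downcasing needs
knowledge of the word rather than of the characters.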

That's a way in which German causes problems for easy case mapping.

The situation with the two Turkish I's is different, and more
symmetrical, and it would be wrong to characterize that as "Turkish
causing a problem".  But I think my characterization of the situation
with German stands.  That is, dealing with Turkish is no harder than
dealing with English--it's just hard to deal with both at once.
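
To make the symmetry concrete, here is a hedged Python 3 sketch. The helpers
upper_tr and lower_tr are hypothetical names of my own, not any library's
API. They handle only the two I pairs and leave everything else to the
default mapping.

```python
# Turkish pairs dotted with dotted and dotless with dotless:
#   i (U+0069) <-> İ (U+0130)      ı (U+0131) <-> I (U+0049)
# Each direction is still a one-to-one character mapping, just as in English.

def upper_tr(s: str) -> str:
    """Uppercase under Turkish rules (hypothetical helper, I pairs only)."""
    # Map dotted i first; the default mapping already sends ı to I.
    return s.replace("i", "\u0130").upper()

def lower_tr(s: str) -> str:
    """Lowercase under Turkish rules (hypothetical helper, I pairs only)."""
    # Map both capital I forms first, so the default mapping never sees them.
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()

city = "istanbul"
print(upper_tr(city))            # Turkish rule: dotted capital I
print(city.upper())              # default rule: plain capital I
print(lower_tr(upper_tr(city)) == city)
```

Either rule is simple on its own and round-trips cleanly. The difficulty
appears only when one string operation has to serve both conventions without
knowing which language the text is in.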

Dealing with German properly is hard all by itself.

Thomas