Re: character strings versus byte strings tb@xxxxxx 22 Dec 2003 22:41 UTC
Tom Lord <xxxxxx@emf.net> writes:

> > Many many many computer systems could get away with
> > ignoring the locale-dependency of case-mapping, but now they can
> > no longer plead ignorance. (Though the problems are hardly
> > obscure; even German causes problems.)
>
> (I think that, being a culturally unbiased person, you mean that
> German causes one _unique_ problem regarding case mapping.)

The problem in German that I'm thinking of is the eszett problem,
where there is a lowercase letter whose uppercase is a two-letter
combo. (And downcasing SS requires morphological understanding of
the word as well, because not all SS pairs should be downcased as an
eszett, IIUC.) That's a way in which German causes problems for easy
case mapping.

The situation with the two Turkish I's is different, and more
symmetrical, and it would be wrong to characterize that as "Turkish
causing a problem". But I think my characterization of the situation
with German stands.

That is, dealing with Turkish is no harder than dealing with
English--it's just hard to deal with both at once. Dealing with
German properly is hard all by itself.

Thomas
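
To make the two cases concrete, here is a minimal sketch in Python (not the language under discussion; it is used only because its built-in `str.upper` and `str.lower` apply the locale-independent Unicode default mappings). The helpers `upper_turkish` and `lower_turkish` are hypothetical names invented for this illustration, and they only handle the dotted/dotless I pair, nothing else about Turkish.

```python
# Sketch only: str.upper()/str.lower() use Unicode's default,
# locale-independent case mappings.

DOTTED_CAPITAL_I = "\u0130"   # İ  LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"    # ı  LATIN SMALL LETTER DOTLESS I


def upper_turkish(s: str) -> str:
    """Hypothetical helper: uppercase using the Turkish i -> İ rule."""
    # Map dotted i to dotted İ first; the default mapping then takes
    # dotless ı to plain I and leaves İ unchanged.
    return s.replace("i", DOTTED_CAPITAL_I).upper()


def lower_turkish(s: str) -> str:
    """Hypothetical helper: lowercase using the Turkish I -> ı rule."""
    # Handle both capital I's explicitly before the default mapping runs.
    s = s.replace("I", DOTLESS_SMALL_I).replace(DOTTED_CAPITAL_I, "i")
    return s.lower()


# German: uppercasing eszett yields two letters, so the round trip is lossy.
german_upper = "Straße".upper()        # "STRASSE"
german_round_trip = german_upper.lower()  # "strasse", not "straße"

# Two different words collapse to the same uppercase form, so choosing
# the right lowercase form needs knowledge of the word itself.
collide = "Maße".upper() == "Masse".upper()   # True

# Turkish: with the Turkish rule the mapping is one-to-one and reversible.
turkish_upper = upper_turkish("ırmak istanbul")   # "IRMAK İSTANBUL"
turkish_round_trip = lower_turkish(turkish_upper)  # "ırmak istanbul"
```

The Turkish helpers round-trip cleanly, but applying them to English text would turn "I" into "ı", which is the "hard to deal with both at once" part. The German uppercase step loses information no matter which locale is chosen.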