text processes vs. string procedures shivers@xxxxxx 24 Jan 2000 21:52 UTC
I agree with almost all of Sergei's msg.

- Basic string procs should *not* require textual well-formedness in a Unicode world. A string full of accents and umlauts and cedillas with no preceding base or start character is still a legal string.

- Full Unicode support will certainly require other procedures not in the SRFI-13 spec. Sergei's examples of canonical & compatibility decomposition and composition are good ones. These should go in a Unicode-specific library, which is not the goal of SRFI-13.

- We also certainly need to do a new char library. Or perhaps a pair of them: one generic one, and one for Unicode-specific things.

- However, I think case-mapping and string-comparison are basic things, and they can be given a generic, portable definition independent of the underlying character encoding. Case-mapping does *not* require strings to be well-formed text. ASCII, Latin-1 and Unicode all provide clear, language-independent definitions of this operation.

I don't want the string library to be minimal. I want it to be useful. People -- many of whom currently program with Latin-1 or ASCII Schemes -- case-map and compare strings frequently. These operations can be provided with an API which is portable across ASCII, Latin-1 and Unicode. So there's no barrier here.
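To make the case-mapping point concrete, here is a small sketch. It is written in Python rather than Scheme purely so it can be run as-is; it is not SRFI-13 code, and the helper names `upcase` and `string_ci_equal` are hypothetical stand-ins for whatever a string library would export. It assumes Python's built-in, locale-independent `str.upper` and `str.casefold` as the underlying per-character mapping.

```python
import unicodedata

def upcase(s: str) -> str:
    """Hypothetical portable upcase: maps each character independently
    and makes no check that s is well-formed text."""
    return s.upper()

def string_ci_equal(a: str, b: str) -> bool:
    """Hypothetical case-insensitive comparison built on the same mapping."""
    return a.casefold() == b.casefold()

# Not well-formed text: combining acute, diaeresis and cedilla, no base char.
orphans = "\u0301\u0308\u0327"
assert all(unicodedata.combining(c) != 0 for c in orphans)

# Still a legal string, and case-mapping it is well defined (a no-op here).
assert upcase(orphans) == orphans

# The same call covers ASCII, Latin-1 and wider Unicode input.
assert upcase("hello") == "HELLO"                      # ASCII
assert upcase("\u00e9t\u00e9") == "\u00c9T\u00c9"      # Latin-1: e-acute
assert upcase("\u0301abc") == "\u0301ABC"              # leading orphan accent

assert string_ci_equal("Hello", "hELLO")
```

Nothing in the calling code depends on which repertoire the string came from, which is the sense in which such an API is portable.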