String comparison under Latin-1 and Unicode, Ben Goetter (10 Mar 2000 18:27 UTC)
Re: String comparison under Latin-1 and Unicode, Sergei Egorov (10 Mar 2000 19:40 UTC)
Re: String comparison under Latin-1 and Unicode, Dave Mason (10 Mar 2000 20:00 UTC)
Re: String comparison under Latin-1 and Unicode, Dave Mason (10 Mar 2000 20:06 UTC)
Re: String comparison under Latin-1 and Unicode, Sergei Egorov (10 Mar 2000 20:32 UTC)
String comparison under Latin-1 and Unicode
Ben Goetter, 10 Mar 2000 18:26 UTC
> ... collation and string
> comparison in the wide Unicode world today. If I can't come up with something
> reasonable that works in ASCII, Latin-1 *and* a Unicode setting

The STRING>? problem under Unicode differs from the problem under Latin-1 only in degree. (Finns and Swedes use a different collation sequence from Danes and Norwegians. "Æ" is a ligature of A and E in English, but a letter in its own right in Danish. Then there is Spanish vs. French vs. Traditional Spanish. And much, much more.) Hence, even under Latin-1, STRING>? must take the language of the text into account. Unicode merely makes more scripts, and hence more languages, convenient.

Proposal: The string comparators take an optional final argument that is not a string but an instance of a new type, the language specifier (abbreviated langid), which identifies the language of a block of text.

The procedure CURRENT-LANGUAGE returns the langid of whatever language Scheme uses for string comparisons that omit this optional final argument. Scheme starts out with a default langid inherited from its host environment; the procedure DEFAULT-LANGUAGE returns this default langid. The procedures (CALL-WITH-LANGUAGE langid proc) and (WITH-LANGUAGE langid thunk) change the value returned by CURRENT-LANGUAGE while proc or thunk runs.

Finally, the procedure LANGUAGE takes an ISO 639 language code, given as a string (for example "da" for Danish), and returns the corresponding langid. LANGUAGE may be extended to accept other values (perhaps a numeric language code from the host OS).

This would allow correct collation of text using the current Scheme notion of "string." Building a higher-level "text" abstraction on top of this is purely mechanical.

Ben
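To make the shape of the proposal concrete, here is a minimal sketch in Python rather than Scheme. Every name in it (LangId, language, default_language, current_language, with_language, call_with_language, string_lt, string_gt) is a hypothetical stand-in for the procedures described above, not an existing API. Further assumptions: the dynamic binding of the current language is modeled with Python's contextvars.ContextVar; the "host environment" default is hard-wired to English; proc in call_with_language receives the langid (by analogy with CALL-WITH-OUTPUT-FILE); and the collation tables are toy ones that only know the letters at the end of the Swedish/Finnish and Danish/Norwegian alphabets, compare case-insensitively, and sort any other character after the alphabet by code point.

```python
# Sketch of language-aware string comparators with an optional langid argument.
# All names are hypothetical Python stand-ins for the proposed Scheme procedures.
from contextvars import ContextVar
from dataclasses import dataclass


@dataclass(frozen=True)
class LangId:
    """Language specifier: a distinct type, not a string."""
    code: str      # ISO 639 language code, e.g. "sv"
    alphabet: str  # toy collation order (lowercase letters only)


_BASE = "abcdefghijklmnopqrstuvwxyz"
# Toy collation tables (assumption): only the tail of the alphabet differs.
_ALPHABETS = {
    "en": _BASE,
    "sv": _BASE + "åäö",  # Swedish: ... z å ä ö
    "fi": _BASE + "åäö",  # Finnish: same tail as Swedish
    "da": _BASE + "æøå",  # Danish: ... z æ ø å
    "no": _BASE + "æøå",  # Norwegian: same tail as Danish
}


def language(iso_code):
    """Map an ISO 639 code, given as a string, to its langid."""
    return LangId(iso_code, _ALPHABETS[iso_code])


# Assumption: a real system would inherit this from the host environment.
_DEFAULT = language("en")
_current = ContextVar("current_language", default=_DEFAULT)


def default_language():
    return _DEFAULT


def current_language():
    return _current.get()


def with_language(langid, thunk):
    """Run thunk with current_language() bound to langid, then restore it."""
    token = _current.set(langid)
    try:
        return thunk()
    finally:
        _current.reset(token)


def call_with_language(langid, proc):
    # Assumption: like CALL-WITH-OUTPUT-FILE, proc is passed the langid.
    return with_language(langid, lambda: proc(langid))


def _key(s, langid):
    # Simplification: case-insensitive; characters outside the alphabet
    # sort after it, ordered by code point.
    n = len(langid.alphabet)
    return [langid.alphabet.index(c) if c in langid.alphabet else n + ord(c)
            for c in s.lower()]


def string_lt(a, b, langid=None):
    """STRING<? with an optional final langid argument."""
    lang = langid if langid is not None else current_language()
    return _key(a, lang) < _key(b, lang)


def string_gt(a, b, langid=None):
    """STRING>? with an optional final langid argument."""
    lang = langid if langid is not None else current_language()
    return _key(a, lang) > _key(b, lang)
```

With these tables, string_lt("ål", "øl", language("sv")) is true, while string_gt("ål", "øl", language("da")) is also true, because å precedes ø-like letters in Swedish but comes last in Danish. Wrapping a call in with_language(language("da"), ...) changes the result of a comparison that omits the langid, and the previous current language is restored afterwards. A real implementation would use full per-language collation tables rather than these toy alphabets.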