Re: the discussion so far Jorgen Schaefer (16 Jul 2005 13:58 UTC)
Matthew Flatt <xxxxxx@cs.utah.edu> writes:

> So, the `char-ci' operations should use the "simple case folding" table
> from CaseFolding.txt, and the `string-ci' operations should use the
> "full case folding" table from CaseFolding.txt. After folding, the
> comparison result is determined character-by-character.

Codepoint-by-codepoint, yes. (That is what you meant, I just wanted
to clarify. The terminology is a bit confusing, as "character" is
defined differently in Unicode than it is in this SRFI)

> Meanwhile, `string-upcase' and `string-downcase' reflect the same
> improved handling at the string level (compared to the character level)
> by using SpecialCasing.txt in addition to UnicodeData.txt.
>
> Have I got that right?

Yes :-)

There's one last problem with this approach: It leaves out
normalization. In Unicode, there are multiple sequences of code
points that represent the same character. For example, the code
point sequences (#\x00C4) and (#\x0041 #\x0308) are equivalent.

  00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
  0041 LATIN CAPITAL LETTER A
  0308 COMBINING DIAERESIS

Normalization maps those sequences to a common form (either to the
composed or the decomposed form) so that comparison can be done on a
codepoint-by-codepoint basis.

Luckily, case folding is specified in such a way that a normalized
sequence of code points remains normalized if case-folded.

So, to make STRING-CI=? or, indeed, STRING=? work, one option would
be for the SRFI to provide STRING-NORMALIZE-* procedures, and
require normalized strings to be passed to the comparison procedures
for them to work correctly.

Greetings,
        -- Jorgen

-- 
((email . "xxxxxx@forcix.cx") (www . "http://www.forcix.cx/")
 (gpg . "1024D/028AF63C") (irc . "nick forcer on IRCnet"))
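
As an illustration of the normalization point above, here is a minimal
sketch in Python rather than Scheme, chosen only because its standard
library exposes both operations. The names string_normalize_nfc and
string_ci_equal are hypothetical stand-ins for the proposed
STRING-NORMALIZE-* and STRING-CI=? procedures and are not part of any
SRFI. The sketch assumes that Python's str.casefold() is an acceptable
stand-in for the full case folding table, and that the composed form
(NFC) is the chosen normal form.

```python
import unicodedata


def string_normalize_nfc(s: str) -> str:
    # Hypothetical stand-in for a STRING-NORMALIZE-* procedure:
    # map equivalent code point sequences to the composed form (NFC).
    return unicodedata.normalize("NFC", s)


def string_ci_equal(a: str, b: str) -> bool:
    # Hypothetical stand-in for STRING-CI=?: apply full case folding,
    # then compare code point by code point. The caller is expected
    # to pass strings that are already normalized.
    return a.casefold() == b.casefold()


composed = "\u00C4"          # LATIN CAPITAL LETTER A WITH DIAERESIS
decomposed = "\u0041\u0308"  # LATIN CAPITAL LETTER A + COMBINING DIAERESIS

# Without normalization the two spellings differ code point by code point.
raw_equal = composed == decomposed

# After normalization they compare equal.
normalized_equal = (
    string_normalize_nfc(composed) == string_normalize_nfc(decomposed)
)

# Case-insensitive comparison of un-normalized input fails ...
ci_raw = string_ci_equal(composed, "a\u0308")

# ... and succeeds once both arguments are normalized first.
ci_normalized = string_ci_equal(
    string_normalize_nfc(composed), string_normalize_nfc("a\u0308")
)

# Full case folding may change the length of a string.
sharp_s = string_ci_equal("Stra\u00DFe", "STRASSE")

print(raw_equal, normalized_equal, ci_raw, ci_normalized, sharp_s)
```

The comparison procedures themselves stay simple here; the burden of
normalizing is placed on the caller, which is the option described in
the message.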