Re: the discussion so far Jorgen Schaefer (16 Jul 2005 13:58 UTC)
Matthew Flatt <xxxxxx@cs.utah.edu> writes:
> So, the `char-ci' operations should use the "simple case folding" table
> from CaseFolding.txt, and the `string-ci' operations should use the
> "full case folding" table from CaseFolding.txt. After folding, the
> comparison result is determined character-by-character.
Code point by code point, yes. (That is what you meant; I just
wanted to clarify. The terminology is a bit confusing, as
"character" is defined differently in Unicode than it is in this
SRFI.)
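
To make the string-level rule concrete, here is a minimal sketch in Python rather than Scheme. It assumes Python's built-in str.casefold(), which applies Unicode full case folding; the name string_ci_equal is made up for illustration and is not part of the SRFI.

```python
def string_ci_equal(a: str, b: str) -> bool:
    """Compare two strings after full case folding, code point by code point."""
    fa, fb = a.casefold(), b.casefold()
    # Python strings are sequences of code points, so zip() walks them pairwise.
    return len(fa) == len(fb) and all(x == y for x, y in zip(fa, fb))


# Full case folding maps U+00DF (sharp s) to the two code points "ss",
# so the folded strings match even though the inputs differ in length.
print(string_ci_equal("STRASSE", "straße"))  # True
print(string_ci_equal("abc", "abd"))         # False
```

A one-to-one, character-level fold cannot produce this match, because the two inputs do not have the same number of code points.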
> Meanwhile, `string-upcase' and `string-downcase' reflect the same
> improved handling at the string level (compared to the character level)
> by using SpecialCasing.txt in addition to UnicodeData.txt.
>
> Have I got that right?
Yes :-)
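
As a rough illustration of why the string-level operations need the extra table, the sketch below uses Python's str.upper(), which to my knowledge applies the unconditional full mappings from SpecialCasing.txt. The function name string_upcase is only borrowed here for the example; this is not the SRFI's implementation.

```python
def string_upcase(s: str) -> str:
    """Upcase a whole string; the result may be longer than the input."""
    return s.upper()


word = "straße"
upper = string_upcase(word)
# The sharp s expands to two code points, so the length changes.
print(upper)                  # STRASSE
print(len(word), len(upper))  # 6 7
```

A procedure that maps one character to exactly one character has no way to express this kind of expansion.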
There's one last problem with this approach: it leaves out
normalization.
In Unicode, there are multiple sequences of code points that
represent the same character. For example, the code point
sequences (#\x00C4) and (#\x0041 #\x0308) are canonically
equivalent:
  00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
  0041 LATIN CAPITAL LETTER A
  0308 COMBINING DIAERESIS
Normalization maps those sequences to a common form (either to the
composed or the decomposed form) so that comparison can be done on
a codepoint-by-codepoint basis.
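
The example above can be checked with a short Python sketch. It assumes the standard unicodedata module, where "NFC" is the composed form and "NFD" the decomposed form; the variable names are only for illustration.

```python
import unicodedata

composed = "\u00C4"          # LATIN CAPITAL LETTER A WITH DIAERESIS
decomposed = "\u0041\u0308"  # LATIN CAPITAL LETTER A + COMBINING DIAERESIS

# Without normalization, a code-point comparison treats them as different.
print(composed == decomposed)  # False

# After mapping both to a common form, they compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```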
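Luckily, case folding is specified in such a way that a normalized
sequence of code points mostly remains normalized if case-folded.
There are a few exceptions (for example, U+01F0 folds to the
decomposed pair 006A 030C), so normalizing again after folding is
the safe choice.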
So, to make STRING-CI=? or, indeed, STRING=? work, one option
would be for the SRFI to provide STRING-NORMALIZE-* procedures,
and require normalized strings to be passed to the comparison
procedures for them to work correctly.
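
A minimal sketch of that option, again in Python. The names string_normalize_nfd and string_ci_equal_normalized are hypothetical stand-ins for the proposed STRING-NORMALIZE-* and STRING-CI=? procedures, and normalizing once more after folding is a conservative assumption on my part, not something the SRFI specifies.

```python
import unicodedata


def string_normalize_nfd(s: str) -> str:
    """Hypothetical STRING-NORMALIZE-* procedure: map to the decomposed form."""
    return unicodedata.normalize("NFD", s)


def string_ci_equal_normalized(a: str, b: str) -> bool:
    """Hypothetical STRING-CI=? on normalized, case-folded strings."""
    def key(s: str) -> str:
        # Normalize, fold with the full case folding table, then
        # normalize again in case folding changed the form.
        return string_normalize_nfd(string_normalize_nfd(s).casefold())
    return key(a) == key(b)


# U+00C4 versus lowercase a followed by COMBINING DIAERESIS.
print(string_ci_equal_normalized("\u00C4", "a\u0308"))  # True
print(string_ci_equal_normalized("\u00C4", "a"))        # False
```

Whether the comparison procedures normalize internally, as here, or require the caller to pass normalized strings is exactly the design choice in question.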
Greetings,
-- Jorgen
--
((email . "xxxxxx@forcix.cx") (www . "http://www.forcix.cx/")
(gpg . "1024D/028AF63C") (irc . "nick forcer on IRCnet"))