bear scripsit:
> The proposed semantics for collation of strings
> (using string>? & friends) by pointwise comparison
> is in direct conflict with the unicode standard
> for locale-independent collation of strings, as
> expressed in
>
> http://www.unicode.org/reports/tr10/
Note that the Unicode Collation Algorithm is not, strictly speaking,
part of the Unicode Standard; the corresponding ISO standard even has
its own number (14651 rather than 10646). Conformance to the Unicode
Standard neither requires nor forbids conformance to the UCA.
> The unicode collation algorithm abstracts over
> representation issues such as how characters are
> rendered as sequences of individual codepoints,
> making the test for canonical (glyph) equivalence
> rather than codepoint equivalence.
(You're misusing the term "glyph"; see the Unicode Glossary.
I assume you mean something close to "grapheme".)
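To make the distinction concrete, here is a small Java sketch of two
strings that differ codepoint by codepoint yet are canonically
equivalent. The class and helper names are mine, purely for
illustration. It uses java.text.Normalizer from the standard library,
and comparing NFC forms is just one simple way to test canonical
equivalence; it is not the UCA itself.

```java
import java.text.Normalizer;

public class CanonicalEquivalenceDemo {
    // Hypothetical helper: treats two strings as canonically equivalent
    // when their NFC-normalized forms are identical.
    static boolean canonicallyEquivalent(String a, String b) {
        String na = Normalizer.normalize(a, Normalizer.Form.NFC);
        String nb = Normalizer.normalize(b, Normalizer.Form.NFC);
        return na.equals(nb);
    }

    public static void main(String[] args) {
        String precomposed = "\u00e9";  // single codepoint: e with acute
        String decomposed = "e\u0301";  // 'e' followed by combining acute

        // Codepoint-by-codepoint comparison sees two different strings.
        System.out.println(precomposed.equals(decomposed));
        // Comparison after normalization sees the same text.
        System.out.println(canonicallyEquivalent(precomposed, decomposed));
    }
}
```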
> Since I figure most language implementors will ignore
> it (and *are* ignoring it, in Java and C#) this part
> of the Unicode standard will probably eventually be
> abandoned.
That turns out not to be the case. :-)
For Java, you can use either fast (binary) or smart (UCA) comparison
routines: the former are provided in the java.lang.String class, the
latter by java.text.Collator and related classes. (The latter include the
UCA's provisions for tailoring collation order for specific locales: for
example, to make ä sort after z, as Swedes expect, rather than with a,
its normal place.) UCA collation is also readily available for C and C++
programs via IBM's open-source ICU library.
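A rough sketch of the two Java routes, for the curious. The class and
method names are invented for the example; String.compareTo and
java.text.Collator are the real library entry points. I am assuming
that Locale.ROOT stands in for the untailored order and that the
runtime ships Swedish collation data, which a stripped-down JRE may
not.

```java
import java.text.Collator;
import java.util.Locale;

public class CollationDemo {
    // Fast route: binary comparison of UTF-16 code units.
    static int binaryCompare(String a, String b) {
        return a.compareTo(b);
    }

    // Smart route: locale-sensitive comparison through a Collator.
    static int collatedCompare(String a, String b, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        // Ask for canonical decomposition so that precomposed and
        // decomposed accented letters are handled alike.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        String aUmlaut = "\u00e4";  // a with diaeresis
        Locale swedish = Locale.forLanguageTag("sv-SE");

        // Each line prints the sign of the comparison of the accented
        // letter against "z".
        System.out.println("binary:  "
                + Integer.signum(binaryCompare(aUmlaut, "z")));
        System.out.println("root:    "
                + Integer.signum(collatedCompare(aUmlaut, "z", Locale.ROOT)));
        System.out.println("swedish: "
                + Integer.signum(collatedCompare(aUmlaut, "z", swedish)));
    }
}
```

The binary comparison puts the accented letter after z only because
U+00E4 is numerically greater than U+007A. The root collator puts it
with a, and a Swedish-tailored collator should put it after z again,
this time on purpose.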
> At the same time, I want to leave it legal for
> scheme implementors who are actually doing unicode
> support to conform to it if they want to.
That can be done by leaving the *-ci? procedures alone and allowing
implementers to provide their own UCA-compliant procedures.
--
John Cowan http://www.ccil.org/~cowan xxxxxx@reutershealth.com
Be yourself. Especially do not feign a working knowledge of RDF where
no such knowledge exists. Neither be cynical about RELAX NG; for in
the face of all aridity and disenchantment in the world of markup,
James Clark is as perennial as the grass. --DeXiderata, Sean McGrath