Normalization vs. grapheme clusters John.Cowan 29 Jul 2005 20:20 UTC

The Unicode old farts^W^Wrespected elders have confirmed that if you segment
a Unicode string into grapheme clusters as officially defined by Unicode,
then normalization forms C and D will not change the segmentation; that is,
any change will be within grapheme clusters and not across their boundaries.

This does not hold for normalization forms KC and KD, which replace characters
that have compatibility decompositions with those decompositions.
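
To make the difference concrete, here is a small Python sketch. It is only an
illustration under stated assumptions: Python's standard library has no
grapheme cluster segmenter, so the hypothetical helper rough_clusters() below
approximates a cluster as one base character plus any combining marks that
follow it. That is much cruder than the official Unicode rules (it ignores
Hangul jamo, emoji sequences, CRLF and so on), but it is enough for the two
sample strings, which are also my own choices.

```python
import unicodedata

def rough_clusters(s):
    """Split s into a base character plus any following combining marks.

    Assumption: this is a rough stand-in for Unicode's grapheme cluster
    rules, not an implementation of them.
    """
    clusters = []
    for ch in s:
        # General categories Mn, Mc and Me are the combining marks.
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

def same_segmentation(s, form):
    """True if normalizing s to `form` leaves the rough cluster count unchanged."""
    before = rough_clusters(s)
    after = rough_clusters(unicodedata.normalize(form, s))
    return len(before) == len(after)

# "résumé é": the first two accents are decomposed, the last is precomposed.
text = "re\u0301sume\u0301 \u00e9"
# U+FB01 LATIN SMALL LIGATURE FI followed by "n".
ligature = "\ufb01n"

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, same_segmentation(text, form), same_segmentation(ligature, form))
```

For the accented string all four forms keep the cluster count, since the only
changes are composing or decomposing within a cluster. For the ligature, NFC
and NFD leave the string alone, while NFKC and NFKD turn the single ligature
into "f" followed by "i", so one cluster becomes two.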

--
John Cowan  www.ccil.org/~cowan  www.reutershealth.com  xxxxxx@reutershealth.com
[T]here is a Darwinian explanation for the refusal to accept Darwin.
Given the very pessimistic conclusions about moral purpose to which his
theory drives us, and given the importance of a sense of moral purpose
in helping us cope with life, a refusal to believe Darwin's theory may
have important survival value. --Ian Johnston