Re: revised w/nocase text, considering titlecase and cased
John Cowan 12 May 2014 05:54 UTC
Alex Shinn scripsit:
> As a special case, the pre-defined named character sets
> upper and lower (and their aliases upper-case and lower-case)
> are defined to match all characters with the cased property (L&).
> Note also that all other pre-defined named character sets are
> equivalent to themselves under w/nocase.
>
> Rationale: The differences between the case-insensitive
> lower and upper and the cased property are few and unlikely
> to match user intention. Moreover, unlike the algorithmically
> mapped upper and lower char-sets, the cased property is
> readily available in most Unicode implementations.
Looks good to me.
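To make the special case concrete, here is a minimal Python sketch. It assumes that "cased (L&)" means the general categories Lu, Ll, and Lt, as the parenthetical suggests. It also simplifies the case-sensitive upper and lower sets to plain Lu and Ll. The function names are hypothetical and do not come from any SRFI.

```python
import unicodedata

# General categories grouped under the Unicode "L&" (cased letter) alias.
# Note: the Unicode derived property Cased is somewhat broader than this.
CASED_LETTER_CATEGORIES = {"Lu", "Ll", "Lt"}

UPPER_NAMES = ("upper", "upper-case")
LOWER_NAMES = ("lower", "lower-case")


def is_cased_letter(ch: str) -> bool:
    """True if ch is in general category Lu, Ll, or Lt (i.e. L&)."""
    return unicodedata.category(ch) in CASED_LETTER_CATEGORIES


def named_set_matches(name: str, ch: str, nocase: bool) -> bool:
    """Hypothetical matcher for the named sets upper/lower and their aliases.

    Under w/nocase both sets collapse to the L& test.  Otherwise this
    simplified model tests Lu (upper) or Ll (lower) only.
    """
    if name not in UPPER_NAMES + LOWER_NAMES:
        raise ValueError(f"unknown set name: {name}")
    if nocase:
        return is_cased_letter(ch)
    wanted = "Lu" if name in UPPER_NAMES else "Ll"
    return unicodedata.category(ch) == wanted
```

Titlecase letters show why the special case matters. U+01C5 (category Lt) is in neither the case-sensitive upper set nor the case-sensitive lower set in this model. Under w/nocase, however, both sets match it.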
I think this language should also be added:
Note that placing a sequence consisting of a base character
and combining characters into a character string representing
a character set will not do what the user probably expects;
it will create a character set pattern containing the base
character and the combining character(s) as alternatives.
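For the same reason, it is inadvisable to apply Unicode
normalization to such strings. Normalization may split a
precomposed character into a base character plus combining
characters, or merge such a sequence into a single character.
Either way, it silently changes which characters the set contains.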
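Here is a small Python sketch of the pitfall. It models "a string used as a character set" as the set of its code points; charset_from_string is a hypothetical helper. It uses an ordinary re character class for comparison, and shows that NFC normalization changes the set's membership.

```python
import re
import unicodedata


def charset_from_string(s: str) -> frozenset:
    """Hypothetical model of a string-as-char-set: one member per code point."""
    return frozenset(s)


# "é" spelled as base letter 'e' plus U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"

# Two independent alternatives, not one accented letter.
members = charset_from_string(decomposed)

# A regex character class built from the same string behaves the same way:
# it matches 'e' alone or U+0301 alone, but not the two-code-point sequence.
cls = re.compile("[" + decomposed + "]")

# NFC composes the pair into U+00E9, so the set's membership changes.
normalized_members = charset_from_string(unicodedata.normalize("NFC", decomposed))
```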
> And the only realistic alternative I can see is making this
> special case optional, so that either behavior is correct.
Too much flexibility, I think.
--
John Cowan http://www.ccil.org/~cowan xxxxxx@ccil.org
A witness cannot give evidence of his age unless he can remember being born.
--Judge Blagden