Re: revised w/nocase text, considering titlecase and cased

Re: revised w/nocase text, considering titlecase and cased John Cowan 12 May 2014 05:54 UTC

Alex Shinn scripsit:

>   As a special case, the pre-defined named character sets
>   upper and lower (and their aliases upper-case and lower-case)
>   are defined to match all characters with the cased property (L&).
>   Note also all other pre-defined named character sets are
>   equivalent to themselves under w/nocase.
>
>   Rationale: The differences between the case insensitive
>   lower and upper and the cased property are few and unlikely
>   to match user intention.  Moreover, unlike the algorithmically
>   mapped upper and lower char-sets, the cased property is
>   readily available in most Unicode implementations.

Looks good to me.
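
As a rough illustration of the quoted special case, here is a minimal sketch of a "cased" test, reading L& as the union of the general categories Lu, Ll and Lt. It uses Python's standard unicodedata module rather than any Scheme regexp implementation, and the helper name is_cased_letter is hypothetical.

```python
import unicodedata

# Assumption: "cased (L&)" is read as general category Lu, Ll or Lt.
CASED_CATEGORIES = {"Lu", "Ll", "Lt"}

def is_cased_letter(ch: str) -> bool:
    """Return True if the single code point ch is an upper-, lower- or titlecase letter."""
    return unicodedata.category(ch) in CASED_CATEGORIES

# U+01C5 is a titlecase letter: it is in neither Lu nor Ll,
# but it is covered once upper and lower are widened to L&.
for ch in ["A", "a", "\u01c5", "1", "\u05d0"]:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), is_cased_letter(ch))
```

The point of the sketch is only that the widened set is a plain category lookup, which is why it tends to be readily available.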

I think this language should also be added:

    Note that placing a sequence consisting of a base character
    and combining characters into a character string representing
    a character set will not do what the user probably expects;
    it will create a character set pattern containing the base
    character and the combining character(s) as alternatives.
    For the same reason, it is inadvisable to apply Unicode
    normalization to such strings.
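
The pitfall described in the proposed note can be reproduced in most regex engines that work on code points. The following is a small sketch using Python's re module as a stand-in (not the SRFI's own syntax); the variable names are illustrative only.

```python
import re
import unicodedata

composed = "\u00e9"                                  # e-acute as one code point
decomposed = unicodedata.normalize("NFD", composed)  # "e" followed by U+0301

# Build a character set from each string, the way a string denotes
# a character set: every code point becomes one alternative.
set_composed = re.compile("[" + composed + "]")
set_decomposed = re.compile("[" + decomposed + "]")

# The composed set has exactly one member and does not match a bare "e".
print(set_composed.fullmatch("e"))

# The decomposed set has two members: the base character and the
# combining accent, each matching on its own.
print(set_decomposed.fullmatch("e"))
print(set_decomposed.fullmatch("\u0301"))

# It does not match the two-code-point sequence as a unit.
print(set_decomposed.fullmatch(decomposed))
```

Normalizing the string to NFD silently turned a one-member set into a two-member set, which is the reason for advising against normalization here.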

> And the only realistic alternative I can see is making this
> special case optional, so that either behavior is correct.

Too much flexibility, I think.

--
John Cowan          http://www.ccil.org/~cowan        xxxxxx@ccil.org
A witness cannot give evidence of his age unless he can remember being born.
                --Judge Blagden