Re: Surrogates and character representation
Tom Emerson 24 Jul 2005 13:25 UTC
John.Cowan writes:
> > Surrogates are a side-effect of UTF-16. Period. Application-level code
> > just doesn't see them. This entire discussion about whether or not a
> > CHAR should include surrogate code points is, IMHO, a waste of
> > everyone's talents here. It's much ado about nothing.
>
> I agree that applications developers rarely have to think about surrogates,
> but language/library designers (whose job it is to make corner cases
> unsurprising) do have to think about them.
I disagree that surrogates are a corner case. If you do nothing special
with them, encountering an unpaired surrogate in a string is no different
from encountering #xFFFE. Heck, even encountering paired surrogates in a
string is semantically meaningless but valid.
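
To make the distinction concrete, here is a minimal Python sketch. The
helper names is_surrogate and combine_surrogates are my own, invented
for illustration; they are not taken from any SRFI or library. It assumes
only the standard surrogate ranges (#xD800-#xDBFF high, #xDC00-#xDFFF
low) and the UTF-16 pairing arithmetic.

```python
# Hypothetical helpers, for illustration only.
HIGH_START, HIGH_END = 0xD800, 0xDBFF   # high (leading) surrogates
LOW_START, LOW_END = 0xDC00, 0xDFFF     # low (trailing) surrogates


def is_surrogate(cp: int) -> bool:
    """True if cp is a surrogate code point (high or low)."""
    return HIGH_START <= cp <= LOW_END


def combine_surrogates(high: int, low: int) -> int:
    """Map a UTF-16 high/low surrogate pair to the code point it encodes."""
    if not (HIGH_START <= high <= HIGH_END):
        raise ValueError("not a high surrogate: %#x" % high)
    if not (LOW_START <= low <= LOW_END):
        raise ValueError("not a low surrogate: %#x" % low)
    # Each surrogate carries 10 bits; the pair addresses #x10000..#x10FFFF.
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


if __name__ == "__main__":
    print(is_surrogate(0xD800))   # a lone surrogate is still a code point
    print(is_surrogate(0xFFFE))   # a noncharacter, but not a surrogate
    print(hex(combine_surrogates(0xD83D, 0xDE00)))
```

The pairing step only matters to code that reads or writes UTF-16. Code
that works on code points sees a surrogate as one more value in the range
0 to #x10FFFF.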
> FWIW, I now think (after some talk on a private Unicode list) that it's
> correct to allow surrogates as Scheme characters; that is, the range of
> char->integer should be 0 to #x10FFFF.
The arguments that Ken and Mark made there to change your mind may be
worth summarizing here.
--
Tom Emerson Basis Technology Corp.
Software Architect http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and you suck forever"