Re: Surrogates and character representation
John.Cowan 24 Jul 2005 05:37 UTC
Tom Emerson wrote:
> Surrogates are a side-effect of UTF-16. Period. Application-level code
> just doesn't see them. This entire discussion about whether or not a
> CHAR should include surrogate code points is, IMHO, a waste of
> everyone's talents here. It's much ado about nothing.
I agree that applications developers rarely have to think about surrogates,
but language/library designers (whose job it is to make corner cases
unsurprising) do have to think about them.
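To make the corner case concrete, here is a minimal Python sketch (Python is used only for illustration, and the function names utf16_units and from_surrogates are hypothetical) of how UTF-16 turns a code point above U+FFFF into a high/low surrogate pair, and back:

```python
# Illustrative sketch of the UTF-16 surrogate mechanism.
# Function names are hypothetical, not part of any Scheme or Unicode API.

def utf16_units(cp: int) -> list[int]:
    """Return the UTF-16 code units for a single Unicode code point."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    if cp < 0x10000:
        # BMP (including a lone surrogate value): one 16-bit unit.
        return [cp]
    v = cp - 0x10000                 # 20 bits remain
    high = 0xD800 + (v >> 10)        # top 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)       # bottom 10 bits -> low surrogate
    return [high, low]


def from_surrogates(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    cp = 0x1F600
    units = utf16_units(cp)
    print([hex(u) for u in units])
    # Cross-check against Python's own UTF-16 (big-endian, no BOM) codec.
    encoded = chr(cp).encode("utf-16-be")
    assert units == [int.from_bytes(encoded[i:i + 2], "big") for i in (0, 2)]
    assert from_surrogates(*units) == cp
```

Note that for an input in #xD800 to #xDFFF the function just returns it as a single unit. That lone surrogate is exactly the case an application never produces but a language or library designer still has to decide how to treat.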
FWIW, I now think (after some talk on a private Unicode list) that it's
correct to allow surrogates as Scheme characters; that is, the range of
char->integer should be 0 to #x10FFFF, with no gap for the surrogate
range #xD800 to #xDFFF.
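The difference between the two candidate ranges can be sketched as a pair of predicates. This is a hypothetical illustration in Python, not a Scheme API: valid_char_code_all is the permissive policy proposed here, and valid_char_code_scalar is the stricter alternative that admits only Unicode scalar values. Python's own str happens to follow the permissive policy, which the last lines show.

```python
# Hypothetical predicates for the two candidate char->integer ranges.

SURROGATE_MIN, SURROGATE_MAX = 0xD800, 0xDFFF
MAX_CODE_POINT = 0x10FFFF


def valid_char_code_all(n: int) -> bool:
    """Permissive policy: any code point 0..#x10FFFF, surrogates included."""
    return 0 <= n <= MAX_CODE_POINT


def valid_char_code_scalar(n: int) -> bool:
    """Strict policy: Unicode scalar values only (surrogates excluded)."""
    return valid_char_code_all(n) and not (SURROGATE_MIN <= n <= SURROGATE_MAX)


if __name__ == "__main__":
    for n in (0x41, 0xD800, 0xDFFF, 0x1F600, 0x110000):
        print(hex(n), valid_char_code_all(n), valid_char_code_scalar(n))

    # Number of characters under each policy.
    print(MAX_CODE_POINT + 1)
    print(MAX_CODE_POINT + 1 - (SURROGATE_MAX - SURROGATE_MIN + 1))

    # For comparison, a Python str may hold a lone surrogate...
    s = chr(0xD800)
    # ...but strict UTF-8 encoding refuses it.
    try:
        s.encode("utf-8")
    except UnicodeEncodeError:
        print("lone surrogate cannot be encoded as UTF-8")
```

The permissive range costs 2048 extra character values, which is what lets char->integer and integer->char be total over every code point. The trade-off is that some strings of such characters cannot be written out in a strict UTF-8 or UTF-16 encoding.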
--
John Cowan xxxxxx@reutershealth.com www.reutershealth.com www.ccil.org/~cowan
It's the old, old story. Droid meets droid. Droid becomes chameleon.
Droid loses chameleon, chameleon becomes blob, droid gets blob back
again. It's a classic tale. --Kryten, Red Dwarf