Re: Surrogates and character representation
Tom Emerson, 24 Jul 2005 13:25 UTC
John.Cowan writes:
> > Surrogates are a side-effect of UTF-16. Period. Application-level code
> > just doesn't see them. This entire discussion about whether or not a
> > CHAR should include surrogate code points is, IMHO, a waste of
> > everyone's talents here. It's much ado about nothing.
>
> I agree that applications developers rarely have to think about surrogates,
> but language/library designers (whose job it is to make corner cases
> unsurprising) do have to think about them.

I disagree that surrogates are a corner case. Do nothing with them, and
encountering an unpaired surrogate in a string is no different from
encountering #xFFFE. Heck, even paired surrogates in a string are
semantically meaningless, but valid.

> FWIW, I now think (after some talk on a private Unicode list) that it's
> correct to allow surrogates as Scheme characters; that is, the range of
> char->integer should be 0 to #x10FFFF.

It may be worth summarizing here the arguments Ken and Mark made there
that changed your mind.

--
Tom Emerson                                          Basis Technology Corp.
Software Architect                                 http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"
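To make the positions in this exchange concrete, here is a minimal sketch in Python. No code appears in the original messages, so the language and every name below (`to_utf16_units`, `from_utf16_units`, `char_ok_including_surrogates`, `char_ok_excluding_surrogates`) are hypothetical, and the sketch is not either poster's proposal. It shows three things: how UTF-16 splits one supplementary code point into a surrogate pair; how a lenient decoder that "does nothing" with surrogates lets an unpaired one through as an ordinary code point; and the two candidate ranges for a character type, either all of 0 to #x10FFFF or that range minus the surrogate block #xD800 to #xDFFF.

```python
# Hypothetical sketch: surrogates as a UTF-16 artifact, and two candidate
# definitions of the valid range for a character type.

SURROGATE_MIN, SURROGATE_MAX = 0xD800, 0xDFFF
MAX_CODE_POINT = 0x10FFFF


def is_surrogate(cp: int) -> bool:
    """True for code points in the surrogate block #xD800..#xDFFF."""
    return SURROGATE_MIN <= cp <= SURROGATE_MAX


def char_ok_including_surrogates(cp: int) -> bool:
    """Range 0..#x10FFFF with surrogates allowed (the char->integer range Cowan proposes)."""
    return 0 <= cp <= MAX_CODE_POINT


def char_ok_excluding_surrogates(cp: int) -> bool:
    """Unicode scalar values only: 0..#x10FFFF minus the surrogate block."""
    return char_ok_including_surrogates(cp) and not is_surrogate(cp)


def to_utf16_units(cp: int) -> list[int]:
    """Encode one scalar value as one or two UTF-16 code units."""
    if not char_ok_excluding_surrogates(cp):
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x10000:
        return [cp]  # BMP: a single code unit
    v = cp - 0x10000  # 20 bits, split 10/10 across the pair
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


def from_utf16_units(units: list[int]) -> list[int]:
    """Decode UTF-16 code units leniently.

    A high surrogate followed by a low surrogate is combined into one
    code point; any other unit, including an unpaired surrogate, is
    passed through unchanged as an ordinary code point.
    """
    out: list[int] = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            out.append(u)  # BMP unit or lone surrogate: no special treatment
            i += 1
    return out


if __name__ == "__main__":
    print([hex(u) for u in to_utf16_units(0x1F600)])
    print([hex(c) for c in from_utf16_units([0xD800, 0x41])])
```

For example, U+1F600 encodes as the pair #xD83D #xDE00, while decoding the units [#xD800, #x41] yields the lone surrogate #xD800 followed by #x41. Under the first predicate that lone surrogate is a legal character; under the second it is not. Which predicate a language adopts is exactly the design question in this thread, and the sketch does not settle it.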