Re: Surrogates and character representation John.Cowan 22 Jul 2005 04:09 UTC
Tom Emerson scripsit:

> If you treat the surrogates as undefined within the character range,
> then you must (for consistency) treat all of the other undefined
> abstract characters as holes. This just complicates processing.

All other undefined codepoints are potentially definable: they correspond
to Unicode scalar values. Surrogate codepoints are not definable and
don't correspond to any Unicode scalar value. The difference is
architectural.

> One question I've had: how are 8-bit (i.e., byte) strings handled
> here? Is there no distinction between operations on raw bytes and
> operations on characters?

Those things are not strings: they are vectors of unsigned 8-bit integers.

--
John Cowan  xxxxxx@reutershealth.com  http://www.ccil.org/~cowan
Is it not written, "That which is written, is written"?
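
To make the two distinctions in this message concrete, here is a minimal sketch in Python (not the language under discussion, chosen only for illustration). The function names is_surrogate, is_scalar_value and utf16_surrogate_pair are hypothetical helpers invented for this example. The ranges are the ones the Unicode Standard defines: code points run from 0 to 10FFFF, and D800 through DFFF are the surrogates.

```python
# Hypothetical helpers; the names are illustrative, not from any library.
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF
MAX_CODE_POINT = 0x10FFFF


def is_surrogate(cp: int) -> bool:
    """True if cp lies in the surrogate range D800..DFFF."""
    return SURROGATE_FIRST <= cp <= SURROGATE_LAST


def is_scalar_value(cp: int) -> bool:
    """True if cp is a Unicode scalar value: any code point that is
    not a surrogate, whether or not a character is assigned to it."""
    return 0 <= cp <= MAX_CODE_POINT and not is_surrogate(cp)


def utf16_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane scalar value into the two UTF-16
    code units (high surrogate, low surrogate) that encode it."""
    if not (0x10000 <= cp <= MAX_CODE_POINT):
        raise ValueError("only code points above FFFF need a surrogate pair")
    offset = cp - 0x10000
    return SURROGATE_FIRST + (offset >> 10), 0xDC00 + (offset & 0x3FF)


# U+0378 is assumed here to be unassigned (it is in the Unicode versions
# I am aware of); assigned or not, it is still a scalar value.
print(is_scalar_value(0x0378))   # True
# A surrogate can never become a character, so it is not a scalar value.
print(is_scalar_value(0xD800))   # False

# Surrogates exist only as UTF-16 code units for scalar values above FFFF.
high, low = utf16_surrogate_pair(0x1D11E)
print(hex(high), hex(low))       # 0xd834 0xdd1e

# A byte vector is a sequence of unsigned 8-bit integers, not a string;
# it becomes a string only after decoding under a stated encoding.
raw = bytes([0xE2, 0x82, 0xAC])
text = raw.decode("utf-8")
print(list(raw), len(raw), len(text))  # [226, 130, 172] 3 1
```

The last lines show why the two kinds of object want separate operations: the byte vector has length 3, while the string decoded from it has length 1.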