Re: Surrogates and character representation John.Cowan 22 Jul 2005 04:09 UTC

Tom Emerson wrote:

> If you treat the surrogates as undefined within the character range,
> then you must (for consistency) treat all of the other undefined
> abstract characters as holes. This just complicates processing.

All other undefined codepoints are potentially definable: they correspond
to Unicode scalar values.  Surrogate codepoints are not definable and
don't correspond to any Unicode scalar value.  The difference is
architectural.
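
The distinction can be sketched in a few lines of Python. This is only an
illustration, not part of any proposal in this thread: the function names
(is_surrogate, is_scalar_value, can_encode_utf8) are hypothetical, and the
range constants are the ones fixed by the Unicode Standard.

```python
# Code point ranges fixed by the Unicode Standard.
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF
MAX_CODE_POINT = 0x10FFFF


def is_surrogate(cp: int) -> bool:
    """True if cp lies in the surrogate range U+D800..U+DFFF."""
    return SURROGATE_FIRST <= cp <= SURROGATE_LAST


def is_scalar_value(cp: int) -> bool:
    """True if cp is a Unicode scalar value: any code point that is
    not a surrogate, whether or not a character is assigned to it."""
    return 0 <= cp <= MAX_CODE_POINT and not is_surrogate(cp)


def can_encode_utf8(cp: int) -> bool:
    """True if Python's UTF-8 codec accepts the code point.
    Assumes 0 <= cp <= MAX_CODE_POINT, since chr() rejects anything else."""
    try:
        chr(cp).encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False
```

A code point with no character assigned to it still passes is_scalar_value
and still encodes, because a later version of Unicode could assign it. A
surrogate fails both checks, and no later version can change that.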

> One question I've had: how are 8-bit (i.e., byte) strings handled
> here? Is there no distinction between operations on raw bytes and
> operations on characters?

Those things are not strings: they are vectors of unsigned 8-bit integers.
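
As a rough analogy only, Python keeps the same separation between its bytes
and str types. The sketch below assumes nothing about how a Scheme
implementation would represent either; the helper names (describe_bytes,
describe_text) are hypothetical.

```python
def describe_bytes(raw: bytes) -> list[int]:
    """A bytes object is a sequence of unsigned 8-bit integers:
    iterating over it yields ints in the range 0..255, not characters."""
    return list(raw)


def describe_text(raw: bytes, encoding: str = "utf-8") -> list[int]:
    """Bytes become characters only after an explicit decode step.
    Returns the code point of each decoded character."""
    return [ord(ch) for ch in raw.decode(encoding)]


# UTF-8 encoding of U+20AC EURO SIGN: three bytes, one character.
euro = bytes([0xE2, 0x82, 0xAC])
byte_values = describe_bytes(euro)   # three small integers
code_points = describe_text(euro)    # a single code point
```

Operations on raw see three integers; operations on the decoded text see one
character. Nothing converts between the two implicitly.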

--
John Cowan      xxxxxx@reutershealth.com        http://www.ccil.org/~cowan
        Is it not written, "That which is written, is written"?