Re: the discussion so far John.Cowan 20 Jul 2005 03:55 UTC

bear scripsit:

> The particular example I'm thinking of is splitting strings
> between base codepoint and combining codepoint. You get two
> substrings, and the second one is syntactically invalid.

Please point to a place in the Unicode Standard where any sequence
of Unicode scalar values is said to be "syntactically invalid".

> If you print the first substring and then the second, the
> combining codepoint is usually printed as though it modified
> a space character that isn't actually there.

That's one possibility; it can also be rendered on top of
a dotted circle, which is what is done in the Unicode charts.
In any case, glyph rendering is not part of the Standard.

> If something
> normalizes the substrings first, the space may actually be
> added, although it wasn't present in the original string.

That turns out not to be the case.  The normalized form of
a string consisting of one combining character is itself.
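
This is easy to check by experiment.  What follows is only a sketch in
Python, using the standard unicodedata module; the helper name
split_and_normalize and the choice of U+0301 COMBINING ACUTE ACCENT
are assumptions made for illustration, not anything from the Standard.
It splits "e" + U+0301 between the base and the combining codepoint,
normalizes each half, and looks at what comes back.

```python
import unicodedata

# Hypothetical example character: U+0301 COMBINING ACUTE ACCENT.
COMBINING_ACUTE = "\u0301"


def split_and_normalize(s, index, form="NFC"):
    """Split s at a codepoint index and normalize each half separately."""
    head, tail = s[:index], s[index:]
    return unicodedata.normalize(form, head), unicodedata.normalize(form, tail)


# "e" followed by a combining acute accent, split between the two codepoints.
head, tail = split_and_normalize("e" + COMBINING_ACUTE, 1)

# Normalize the lone combining mark under each of the four forms.
lone_mark_forms = {
    form: unicodedata.normalize(form, COMBINING_ACUTE)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

# True if no form changed the lone mark (and so none prepended a space).
lone_mark_unchanged = all(v == COMBINING_ACUTE for v in lone_mark_forms.values())
```

For this character, tail should still be the single codepoint U+0301
under all four forms, with no space in front of it.  A few combining
characters do have canonical decompositions of their own, so the
sketch says nothing about those beyond what it tests.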

> Gah.  Encodings, normalization forms, endianness, and all the
> rest of it.  When you want to write a "character" any of a dozen
> things can happen.

Blurring significant distinctions that have taken a long time to
nail down isn't very conducive to clear thinking.

--
Not to perambulate                 John Cowan <xxxxxx@reutershealth.com>
    the corridors                  http://www.reutershealth.com
during the hours of repose         http://www.ccil.org/~cowan
    in the boots of ascension.       --Sign in Austrian ski-resort hotel