the discussion so far, Matthew Flatt (16 Jul 2005 12:41 UTC)
Re: the discussion so far, bear (20 Jul 2005 02:45 UTC)
Re: the discussion so far, John.Cowan (20 Jul 2005 03:56 UTC)
Re: the discussion so far, Alex Shinn (20 Jul 2005 02:50 UTC)
Re: the discussion so far, Thomas Bushnell BSG (20 Jul 2005 02:56 UTC)
Re: the discussion so far, Alex Shinn (20 Jul 2005 03:15 UTC)
Re: the discussion so far, Thomas Bushnell BSG (20 Jul 2005 03:24 UTC)
Re: the discussion so far, Alex Shinn (20 Jul 2005 03:38 UTC)
Re: the discussion so far, Thomas Bushnell BSG (20 Jul 2005 03:49 UTC)
Re: the discussion so far, John.Cowan (20 Jul 2005 04:24 UTC)
Re: the discussion so far, Thomas Bushnell BSG (20 Jul 2005 04:27 UTC)
Re: the discussion so far, John.Cowan (20 Jul 2005 04:58 UTC)
Re: the discussion so far, Thomas Bushnell BSG (20 Jul 2005 05:04 UTC)
Re: the discussion so far, Jorgen Schaefer (16 Jul 2005 13:05 UTC)
Re: the discussion so far, Matthew Flatt (16 Jul 2005 13:21 UTC)
Re: the discussion so far, Jorgen Schaefer (16 Jul 2005 13:58 UTC)
Re: the discussion so far, Thomas Bushnell BSG (17 Jul 2005 02:42 UTC)
Re: the discussion so far, Thomas Bushnell BSG (17 Jul 2005 02:57 UTC)
Re: the discussion so far, Jorgen Schaefer (17 Jul 2005 03:33 UTC)
Re: the discussion so far, bear (16 Jul 2005 18:07 UTC)
Re: the discussion so far, John.Cowan (17 Jul 2005 04:49 UTC)
Re: the discussion so far, Thomas Bushnell BSG (17 Jul 2005 02:40 UTC)

bear scripsit:
> The particular example I'm thinking of is splitting strings
> between base codepoint and combining codepoint. You get two
> substrings, and the second one is syntactically invalid.
Please point to a place in the Unicode Standard where any sequence
of Unicode scalar values is said to be "syntactically invalid".
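To make the point concrete, here is a minimal sketch in Python (an assumption; nobody in the thread posted code) that splits a string just before its first combining mark. The helper name `split_before_combining` is hypothetical, and it treats any character whose General_Category starts with "M" (Mn, Mc, Me) as a combining mark. Both resulting pieces are ordinary sequences of Unicode scalar values, and each encodes to UTF-8 without error.

```python
import unicodedata

def split_before_combining(s: str) -> tuple[str, str]:
    """Hypothetical helper: split s just before its first combining mark."""
    for i, ch in enumerate(s):
        # General_Category Mn, Mc, or Me identifies a combining mark.
        if i > 0 and unicodedata.category(ch).startswith("M"):
            return s[:i], s[i:]
    return s, ""

# 'e' followed by U+0301 COMBINING ACUTE ACCENT
head, tail = split_before_combining("e\u0301")

# Each piece is still a valid sequence of scalar values and encodes cleanly.
print(head.encode("utf-8"), tail.encode("utf-8"))
```

Nothing in the encoding step rejects the lone mark; whether it "looks right" on screen is a rendering question, not a validity question.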
> If you print the first substring and then the second, the
> combining codepoint is usually printed as though it modified
> a space character that isn't actually there.
That's one possibility; it can also be rendered on top of
a dotted circle, which is what is done in the Unicode charts.
In any case, glyph rendering is not part of the Standard.
> If something
> normalizes the substrings first, the space may actually be
> added, although it wasn't present in the original string.
That turns out not to be the case. The normalized form of
a string consisting of one combining character is itself.
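This can be checked with Python's standard `unicodedata` module (a sketch; Python is an assumption, not something used in the thread). A lone U+0301 is unchanged by all four normalization forms, so no space is introduced. The sketch also shows that normalizing the two halves of "e\u0301" separately, joining them, and normalizing again gives the same NFC result as normalizing the whole string.

```python
import unicodedata

lone_mark = "\u0301"  # COMBINING ACUTE ACCENT on its own

# Every normalization form leaves a lone combining mark unchanged.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert unicodedata.normalize(form, lone_mark) == lone_mark

# Normalize the halves separately, rejoin, and normalize again.
left = unicodedata.normalize("NFC", "e")
right = unicodedata.normalize("NFC", lone_mark)
rejoined = unicodedata.normalize("NFC", left + right)

whole = unicodedata.normalize("NFC", "e\u0301")
print(whole == rejoined, whole == "\u00e9")  # True True
```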
> Gah. Encodings, normalization forms, endianness, and all the
> rest of it. When you want to write a "character" any of a dozen
> things can happen.
Blurring significant distinctions that have taken a long time to
nail down isn't very conducive to clear thinking.
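As an illustration of those distinctions, the following Python sketch (an assumption; the thread contains no code) takes one user-perceived character, "é", and varies one layer at a time: the normalization form changes the sequence of scalar values, the encoding form changes the code units, and the byte order changes only the serialization of the same code unit.

```python
import unicodedata

precomposed = "\u00e9"  # one scalar value: LATIN SMALL LETTER E WITH ACUTE
decomposed = unicodedata.normalize("NFD", precomposed)  # 'e' + U+0301

# Normalization form: same abstract text, different scalar-value sequences.
print(len(precomposed), len(decomposed))  # 1 2

# Encoding form: the same scalar value as UTF-8 bytes vs. one UTF-16 code unit.
print(precomposed.encode("utf-8"))      # b'\xc3\xa9'
print(precomposed.encode("utf-16-be"))  # b'\x00\xe9'

# Byte order: the same UTF-16 code unit serialized big- vs. little-endian.
print(precomposed.encode("utf-16-le"))  # b'\xe9\x00'
```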
--
John Cowan <xxxxxx@reutershealth.com>
http://www.reutershealth.com
http://www.ccil.org/~cowan
Not to perambulate the corridors during the hours of repose
in the boots of ascension. --Sign in Austrian ski-resort hotel