Re: Surrogates and character representation

Re: Surrogates and character representation Per Bothner 24 Jul 2005 18:25 UTC

Tom Emerson wrote:
> Representing strings internally in UTF-8 is a loss though, since you
> lose random access to the string.

Random access to a previously accessed position works just fine - just
use the byte offset.

Random access to a position in a string that has not been previously
accessed is not in itself useful.
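
To illustrate the point about byte offsets, here is a minimal sketch in Python, treating a `bytes` object as the UTF-8 internal representation. The helper names (`utf8_char_len`, `char_at`, `find_char`) are hypothetical and chosen only for this example. A sequential scan yields a byte offset, and that saved offset later gives constant-time access to the same character.

```python
def utf8_char_len(lead_byte: int) -> int:
    """Length in bytes of a UTF-8 sequence, judged from its lead byte."""
    if lead_byte < 0x80:
        return 1
    if lead_byte >> 5 == 0b110:
        return 2
    if lead_byte >> 4 == 0b1110:
        return 3
    if lead_byte >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")


def char_at(buf: bytes, offset: int) -> str:
    """Decode the single character that starts at a known byte offset."""
    n = utf8_char_len(buf[offset])
    return buf[offset:offset + n].decode("utf-8")


def find_char(buf: bytes, target: str) -> int:
    """Sequential scan; returns the byte offset of the first match, or -1."""
    offset = 0
    while offset < len(buf):
        if char_at(buf, offset) == target:
            return offset
        offset += utf8_char_len(buf[offset])
    return -1


text = "naïve 𝄞 clef".encode("utf-8")
saved = find_char(text, "𝄞")   # found once, by scanning
again = char_at(text, saved)   # revisited directly via the saved offset
```

The sketch assumes the buffer holds well-formed UTF-8 and that saved offsets always point at a lead byte, which holds when they come from a scan like the one above.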

> For some applications this isn't a big deal, but in general using UTF-8
> as an internal representation is a bad idea.

It's the other way round.  Using UTF-8 as an internal representation is
just fine for *applications*.  The problem is that certain *API*s have a
concept of indexing into a string, and unfortunately R5RS is one of
them.  In itself indexing of strings is a useless feature, as it can be
replaced by a sequential-access cursor/iterator API - but historically
the Scheme cursor/iterator API uses integers for the "cursor".  And
existing code moves the "cursor" forwards by adding 1.
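
The contrast between a cursor API and integer indexing can be sketched as follows, again in Python over a `bytes` buffer. The `cursor_*` functions and `string_ref` are hypothetical names for this illustration, not an API from R5RS or any SRFI. The cursor is a byte offset that callers treat as opaque and advance with `cursor_next` rather than by adding 1; emulating an integer character index on top of it means walking from the start each time.

```python
def cursor_start(buf: bytes) -> int:
    """Cursor for the first character (a byte offset, opaque to callers)."""
    return 0


def cursor_at_end(buf: bytes, cur: int) -> bool:
    return cur >= len(buf)


def cursor_next(buf: bytes, cur: int) -> int:
    """Advance past the current character by skipping continuation bytes."""
    cur += 1
    while cur < len(buf) and (buf[cur] & 0xC0) == 0x80:
        cur += 1
    return cur


def cursor_ref(buf: bytes, cur: int) -> str:
    """Decode the character under the cursor."""
    return buf[cur:cursor_next(buf, cur)].decode("utf-8")


def string_ref(buf: bytes, index: int) -> str:
    """Integer character indexing emulated on UTF-8: O(index) per call."""
    cur = cursor_start(buf)
    for _ in range(index):
        if cursor_at_end(buf, cur):
            raise IndexError("string index out of range")
        cur = cursor_next(buf, cur)
    if cursor_at_end(buf, cur):
        raise IndexError("string index out of range")
    return cursor_ref(buf, cur)


buf = "aé𝄞z".encode("utf-8")

# Sequential traversal with the cursor: each step is cheap.
chars = []
cur = cursor_start(buf)
while not cursor_at_end(buf, cur):
    chars.append(cursor_ref(buf, cur))
    cur = cursor_next(buf, cur)
```

Code that computes the next position as `cur + 1` would land in the middle of a multi-byte sequence here, which is the compatibility problem with existing integer-index code described above.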
--
	--Per Bothner
xxxxxx@bothner.com   http://per.bothner.com/