Re: Surrogates and character representation

Re: Surrogates and character representation Per Bothner 24 Jul 2005 23:26 UTC

John.Cowan wrote:
> Per Bothner scripsit:
>
>
>>It's the other way round.  Using UTF-8 as an internal representation is
>>just fine for *applications*.  The problem is that certain *API*s have a
>>concept of indexing into a string, and unfortunately R5RS is one of
>>them.  In itself indexing of strings is a useless feature, as it can be
>>replaced by a sequential-access cursor/iterator API - but historically
>>the Scheme cursor/iterator API uses integers for the "cursor".  And
>>existing code moves the "cursor" forwards by adding 1.
>
>
> By the same token, random-access disks are a useless feature, for they
> can be replaced by sequential-access DECtapes that can be rewound and
> selectively rewritten.  But at a price.

You're misunderstanding my point, perhaps because I was unclear.  There
are very few applications where you want to get "the N'th record of a
file", in the sense that N is semantically meaningful.  There are lots of
applications where you want to get to a record fast, using random-access
given a "cookie": i.e. some way that the implementation can efficiently
map the cookie into the disk location of the record.  The cookie may be
the disk address of the record, or its offset in a file, which may not
have any direct relationship to N, especially if you have
variable-length records.
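
A minimal sketch of the cookie idea, assuming a hypothetical length-prefixed
record format (the names write_records and read_at are illustrative only and
do not come from any proposal in this thread).  The cookie is simply the byte
offset at which a record starts, so it bears no fixed relationship to the
record number N once records vary in length:

```python
import io

def write_records(records):
    """Write length-prefixed records; return the buffer and one cookie per record."""
    buf = io.BytesIO()
    cookies = []
    for rec in records:
        data = rec.encode("utf-8")
        cookies.append(buf.tell())               # cookie = byte offset of this record
        buf.write(len(data).to_bytes(4, "big"))  # 4-byte length prefix (an assumption)
        buf.write(data)
    return buf, cookies

def read_at(buf, cookie):
    """Jump straight to a record via its cookie, without scanning earlier records."""
    buf.seek(cookie)
    n = int.from_bytes(buf.read(4), "big")
    return buf.read(n).decode("utf-8")

buf, cookies = write_records(["a", "héllo", "variable-length"])
third = read_at(buf, cookies[2])
```

Access through the cookie is a single seek, whereas finding "record 2" by
number would require either fixed-length records or a separate index.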

Similarly, it is often useful to have random access in a long string,
perhaps one representing an emacs buffer.  However, you want to
efficiently access sub-strings, not characters.  Furthermore, you're
interested in substrings defined in terms of previously-seen positions -
or "marks" in the Emacs sense, not character indexes.  E.g. the
substring matching a regexp.
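
As a rough illustration, assuming the string is held as UTF-8 bytes and that a
"mark" is nothing more than a byte offset handed back by an earlier operation
(both are assumptions made for this sketch, not part of R5RS or the SRFI
draft), a regexp match yields marks that can be used to extract the substring
without ever counting characters:

```python
import re

text = "naïve café: 日本語 text"
buf = text.encode("utf-8")           # internal representation: UTF-8 bytes

# Matching over the bytes yields byte offsets, which serve as marks.
m = re.search("日本語".encode("utf-8"), buf)
start_mark, end_mark = m.span()

# The substring between two marks is a plain slice of the byte buffer.
sub = buf[start_mark:end_mark].decode("utf-8")

# The character index of the same position differs from the mark.
char_index = text.index("日本語")
```

Here the mark (a byte offset) and the character index differ because of the
multi-byte characters earlier in the string, yet only the mark is needed to
get at the substring.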

Specifically, can you think of any application where this suggestion
would lead to performance problems:
http://srfi.schemers.org/srfi-75/mail-archive/msg00050.html
--
	--Per Bothner
xxxxxx@bothner.com   http://per.bothner.com/