Re: Surrogates and character representation
Per Bothner 24 Jul 2005 23:26 UTC
John.Cowan wrote:
> Per Bothner scripsit:
>
>
>>It's the other way round. Using UTF-8 as in internal representation is
>>just fine for *applications*. The problem is that certain *API*s have a
>>concept of indexing into a string, and unfortunately R5RS is one of
>>them. In itself indexing of strings is a useless feature, as it can be
>>replaced by a sequential-access cursor/iterator API - but historically
>>the Scheme cursor/iterator API uses integers for the "cursor". And
>>existing code moves the "cursor" forwards by adding 1.
>
>
> By the same token, random-access disks are a useless feature, for they
> can be replaced by sequential-access DECtapes that can be rewound and
> selectively rewritten. But at a price.
You're misunderstanding my point, perhaps because I was unclear. There
are very few applications where you want to "get the N'th record of a
file", in the sense that N is semantically meaningful. There are lots of
applications where you want to get to a record fast, using random access
given a "cookie": i.e. some way that the implementation can efficiently
map the cookie into the disk location of the record. The cookie may be
the disk address of the record, or its offset in a file, which may not
have any direct relationship to N, especially if you have
variable-length records.
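
To make the distinction concrete, here is a minimal sketch, assuming a made-up file format of length-prefixed records. The format and the names write_records, read_at and read_nth are hypothetical, chosen only for illustration. The cookie is simply the byte offset at which a record starts; fetching by cookie is one seek, while fetching "the N'th record" has to walk past every earlier record.

```python
import io
import struct

def write_records(stream, records):
    """Append length-prefixed records; return one cookie (byte offset) per record."""
    cookies = []
    for payload in records:
        cookies.append(stream.tell())                  # cookie = where this record starts
        stream.write(struct.pack(">I", len(payload)))  # 4-byte big-endian length prefix
        stream.write(payload)
    return cookies

def read_at(stream, cookie):
    """Random access by cookie: a single seek, independent of the record's ordinal."""
    stream.seek(cookie)
    (length,) = struct.unpack(">I", stream.read(4))
    return stream.read(length)

def read_nth(stream, n):
    """Access by ordinal N: must skip over the n preceding variable-length records."""
    stream.seek(0)
    for _ in range(n):
        (length,) = struct.unpack(">I", stream.read(4))
        stream.seek(length, io.SEEK_CUR)
    (length,) = struct.unpack(">I", stream.read(4))
    return stream.read(length)

# An in-memory stand-in for the disk file.
disk = io.BytesIO()
cookies = write_records(disk, [b"a", b"bcd", b"ef"])
```

Note that the cookies bear no simple relationship to the ordinals 0, 1, 2, because the records differ in length; a caller that kept the cookie from an earlier visit never needs N at all.
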
Similarly, it is often useful to have random access in a long string,
perhaps one representing an Emacs buffer. However, you want to
efficiently access substrings, not characters. Furthermore, you're
interested in substrings defined in terms of previously-seen positions
(or "marks" in the Emacs sense), not character indexes. E.g. the
substring matching a regexp.
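
The same idea for strings, as a rough sketch: the buffer is held as UTF-8 bytes, and a regexp match over those bytes yields a pair of marks (byte offsets here, which is an assumption of this sketch, not a claim about how any Scheme implementation or Emacs represents marks). The helper name substring_between is hypothetical. Extracting the substring between two marks needs no character counting.

```python
import re

# A buffer stored in a variable-width encoding (UTF-8).
text = "na\u00efve caf\u00e9, \u03bb-calculus"
buf = text.encode("utf-8")

def substring_between(buffer, start_mark, end_mark):
    """Extract the text between two previously obtained marks (byte offsets)."""
    return buffer[start_mark:end_mark].decode("utf-8")

# The regexp runs directly over the bytes; its span is a pair of marks.
match = re.search("caf\u00e9".encode("utf-8"), buf)
start_mark, end_mark = match.span()
word = substring_between(buf, start_mark, end_mark)

# For comparison only: the character index of the same word in the decoded text.
char_index = text.index("caf\u00e9")
```

Here the mark and the character index disagree, because an earlier character occupies two bytes, yet the code that uses the marks never has to know or care.
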
Specifically, can you think of any application where this suggestion
would lead to performance problems:
http://srfi.schemers.org/srfi-75/mail-archive/msg00050.html
--
--Per Bothner
xxxxxx@bothner.com http://per.bothner.com/