Re: constant-time access to variable-width encodings
Shiro Kawai 13 Jul 2005 20:15 UTC
From: Per Bothner <xxxxxx@bothner.com>
Subject: constant-time access to variable-width encodings
Date: Wed, 13 Jul 2005 11:12:57 -0700
> The proposal is to allow string-ref to return #\partial for some indexes
> representing non-initial bytes or low-surrogate values.
Interesting proposal, and I agree with the need for length-changing
mutation (see my other post).
I feel a bit uncomfortable, though, with the fact that indexes and
string-length would vary across implementations, or even within the
same implementation under different character encodings. That makes,
for example, a data structure that holds a string together with
indexes into it non-portable.
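To make the portability concern concrete, here is a small Python sketch (not part of the original discussion) that models string-length and string-ref over code units, returning a stand-in for #\partial at non-initial units, as in the quoted proposal. The names PARTIAL, UNIT_WIDTH, char_starts, string_length, string_ref and index_of are hypothetical, and the three encodings are assumed internal representations.

```python
# Sketch of the #\partial model: indexes count code units, not characters.
PARTIAL = object()  # hypothetical stand-in for #\partial

# Code units per character for each (assumed) internal encoding.
UNIT_WIDTH = {
    "utf-8": lambda c: len(c.encode("utf-8")),
    "utf-16": lambda c: len(c.encode("utf-16-le")) // 2,
    "utf-32": lambda c: 1,
}

def char_starts(s, encoding):
    """Return ({offset_of_first_unit: char}, total_units) for s."""
    width = UNIT_WIDTH[encoding]
    starts, off = {}, 0
    for c in s:
        starts[off] = c
        off += width(c)
    return starts, off

def string_length(s, encoding):
    # Length in code units, which is what the proposal would report.
    return char_starts(s, encoding)[1]

def string_ref(s, i, encoding):
    """Character starting at unit i, or PARTIAL for a non-initial unit."""
    starts, n = char_starts(s, encoding)
    if not 0 <= i < n:
        raise IndexError(i)
    return starts.get(i, PARTIAL)

def index_of(s, ch, encoding):
    """Unit index of the first occurrence of ch, or -1."""
    for off, c in char_starts(s, encoding)[0].items():
        if c == ch:
            return off
    return -1

s = "é😀x"
for enc in ("utf-8", "utf-16", "utf-32"):
    print(enc, string_length(s, enc), index_of(s, "x", enc))
```

This prints lengths 7, 4 and 3 and places "x" at index 6, 3 and 2 respectively. A saved index of 6 that means "x" under UTF-8 is past the end of the same string under UTF-16, and string_ref(s, 1, "utf-8") yields PARTIAL because byte 1 is the second byte of "é".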
I'd agree with the proposal if it introduced a separate means of
indexing, distinct from the character count used by string-ref. Call
it 'offset' for now: string-offset-ref, substring-offset, etc. would
provide offset-based operations, while string-ref, substring, etc.
would keep working on character indexes. That may be too cumbersome
for the core language, though. Also, such an API is too centered on
variable-length characters; implementations with fixed-width
characters, or ones using other representations (such as a tree of
segments), would have little use for it.
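A rough Python sketch of the offset/character split suggested above. Only the names string-offset-ref and substring-offset come from the post (rendered here as string_offset_ref and substring_offset); the UTF-8 backing store, the Utf8String class and the remaining helper names are assumptions for illustration. Offset lookups decode just one character, while character-index lookups must scan from the start.

```python
class Utf8String:
    """Hypothetical string stored as UTF-8 bytes (an assumption of this sketch)."""

    def __init__(self, text: str):
        self.data = text.encode("utf-8")

    @staticmethod
    def _is_lead(b: int) -> bool:
        # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
        return b & 0xC0 != 0x80

    def _check_boundary(self, off: int) -> None:
        # The end of the data counts as a boundary (useful for substring ends).
        if off == len(self.data):
            return
        if not (0 <= off < len(self.data) and self._is_lead(self.data[off])):
            raise IndexError(f"offset {off} is not a character boundary")

    def _char_at(self, off: int) -> str:
        """Decode the character whose first byte is at byte offset off."""
        if off == len(self.data):
            raise IndexError(f"offset {off} is past the last character")
        self._check_boundary(off)
        end = off + 1
        while end < len(self.data) and not self._is_lead(self.data[end]):
            end += 1
        return self.data[off:end].decode("utf-8")

    # Offset-based operations: no scan from the start of the string.
    def string_offset_ref(self, off: int) -> str:
        return self._char_at(off)

    def substring_offset(self, start: int, end: int) -> str:
        self._check_boundary(start)
        self._check_boundary(end)
        return self.data[start:end].decode("utf-8")

    # Character-based operations: O(n) scan to turn an index into an offset.
    def _offset_of(self, k: int) -> int:
        count = 0
        for off, b in enumerate(self.data):
            if self._is_lead(b):
                if count == k:
                    return off
                count += 1
        if count == k:  # one past the last character
            return len(self.data)
        raise IndexError(k)

    def string_length(self) -> int:
        return sum(1 for b in self.data if self._is_lead(b))

    def string_ref(self, k: int) -> str:
        return self._char_at(self._offset_of(k))

    def substring(self, start: int, end: int) -> str:
        return self.substring_offset(self._offset_of(start), self._offset_of(end))


s = Utf8String("é😀x")
print(s.string_length())                         # 3
print(s.string_ref(2), s.string_offset_ref(6))   # x x
```

Character indexes stay meaningful across representations, while offsets are valid only for this particular UTF-8 layout; that asymmetry is why a structure storing offsets would still be non-portable.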
--shiro