Re: constant-time access to variable-width encodings


constant-time access to variable-width encodings Per Bothner (13 Jul 2005 18:13 UTC)

Re: constant-time access to variable-width encodings Ray Blaak (13 Jul 2005 18:48 UTC)

Re: constant-time access to variable-width encodings Shiro Kawai (13 Jul 2005 20:16 UTC)

Re: constant-time access to variable-width encodings Per Bothner (13 Jul 2005 20:36 UTC)

Re: constant-time access to variable-width encodings Shiro Kawai (13 Jul 2005 23:07 UTC)

Re: constant-time access to variable-width encodings bear (14 Jul 2005 00:23 UTC)

Re: constant-time access to variable-width encodings Per Bothner (14 Jul 2005 00:39 UTC)

Re: constant-time access to variable-width encodings bear (14 Jul 2005 01:52 UTC)

Re: constant-time access to variable-width encodings Thomas Bushnell BSG (14 Jul 2005 07:18 UTC)

Re: constant-time access to variable-width encodings Thomas Bushnell BSG (14 Jul 2005 07:16 UTC)

Re: constant-time access to variable-width encodings Shiro Kawai 13 Jul 2005 20:15 UTC

From: Per Bothner <xxxxxx@bothner.com>
Subject: constant-time access to variable-width encodings
Date: Wed, 13 Jul 2005 11:12:57 -0700

> The proposal is to allow string-ref to return #\partial for some indexes
> representing non-initial bytes or low-surrogate values.
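A minimal sketch of the quoted idea, assuming a UTF-8 byte buffer indexed by byte position. `PARTIAL` is a hypothetical sentinel standing in for the proposed #\partial, and `string_ref` is an illustrative Python rendering, not an actual Scheme API. The same approach would apply to UTF-16, returning the sentinel at low-surrogate code units.

```python
# Hypothetical sentinel standing in for the proposed #\partial character.
PARTIAL = object()

def is_continuation(byte: int) -> bool:
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return byte & 0xC0 == 0x80

def string_ref(buf: bytes, index: int):
    """Constant-time ref into a UTF-8 buffer indexed by byte offset.

    Returns the character that starts at `index`, or PARTIAL when
    `index` falls on a non-initial (continuation) byte.
    """
    lead = buf[index]
    if is_continuation(lead):
        return PARTIAL
    # The lead byte encodes the length of the whole sequence.
    if lead < 0x80:
        width = 1
    elif lead >> 5 == 0b110:
        width = 2
    elif lead >> 4 == 0b1110:
        width = 3
    else:
        width = 4
    # Decoding at most 4 bytes keeps this O(1).
    return buf[index:index + width].decode("utf-8")

buf = "aé€".encode("utf-8")  # 6 bytes: 61 C3 A9 E2 82 AC
results = [string_ref(buf, i) for i in range(len(buf))]
# results: 'a', 'é', PARTIAL, '€', PARTIAL, PARTIAL
```

Every index is valid and O(1), but a loop over indexes must skip the sentinel values to visit each character exactly once.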

Interesting proposal, and I agree with the need for length-changing
mutation (see my other post).

I feel a bit uncomfortable, though, with the fact that indexes and
string-length would differ between implementations, or even within
the same implementation under different character encodings.  That
makes a data structure that holds a string and its indexes
non-portable, for example.
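To illustrate the portability concern, here is a small sketch assuming that an implementation's string indexes are code-unit positions in its internal encoding. The helper names are hypothetical. The same five-character string has a different length, and its last character sits at a different index, depending on the encoding.

```python
def code_unit_length(s: str, encoding: str) -> int:
    """Length of `s` measured in code units of the given encoding."""
    unit = {"utf-8": 1, "utf-16-le": 2, "utf-32-le": 4}[encoding]
    return len(s.encode(encoding)) // unit

def code_unit_index(s: str, char_index: int, encoding: str) -> int:
    """Code-unit index at which character number `char_index` begins."""
    return code_unit_length(s[:char_index], encoding)

s = "aé€😀b"  # 5 characters; the emoji needs a surrogate pair in UTF-16
for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    print(enc, code_unit_length(s, enc), code_unit_index(s, 4, enc))
# utf-8 11 10
# utf-16-le 6 5
# utf-32-le 5 4
```

A saved pair such as (s, 10) would refer to "b" only under UTF-8 byte indexing, which is why such a structure would not travel between implementations.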

I'd agree with the proposal if it introduced a separate means of
indexing, distinct from the character count used by string-ref.  Call
it 'offset' for now.  string-offset-ref, substring-offset, etc. would
provide offset-based operations, while string-ref, substring, etc.
would remain character-based.  That may be too cumbersome for the
core language, though.  It is also an API centered too much on
variable-length characters, which fixed-length character
implementations, or other implementations (such as a tree of
segments), wouldn't care much about.
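A rough sketch of the two index spaces suggested above, assuming a UTF-8 backing store. The method names mirror string-offset-ref and substring-offset from the post but are hypothetical Python renderings. Offset operations are O(1) on byte positions, while character operations must scan from the start.

```python
class Utf8String:
    """A string stored as UTF-8 bytes, with offset-based and
    character-based operations (a hypothetical illustration)."""

    def __init__(self, text: str):
        self.buf = text.encode("utf-8")

    def next_offset(self, offset: int) -> int:
        # Skip the lead byte, then any continuation bytes (10xxxxxx).
        # At most 3 continuation bytes follow, so this is O(1).
        end = offset + 1
        while end < len(self.buf) and self.buf[end] & 0xC0 == 0x80:
            end += 1
        return end

    # Offset-based operations: O(1); offsets are byte positions and
    # must lie on character boundaries, otherwise decoding fails.
    def string_offset_ref(self, offset: int) -> str:
        return self.buf[offset:self.next_offset(offset)].decode("utf-8")

    def substring_offset(self, start: int, end: int) -> str:
        return self.buf[start:end].decode("utf-8")

    # Character-based operations: O(n) scan to locate the offset.
    def char_to_offset(self, index: int) -> int:
        offset = 0
        for _ in range(index):
            offset = self.next_offset(offset)
        return offset

    def string_ref(self, index: int) -> str:
        return self.string_offset_ref(self.char_to_offset(index))

    def substring(self, start: int, end: int) -> str:
        return self.substring_offset(self.char_to_offset(start),
                                     self.char_to_offset(end))

s = Utf8String("aé€😀b")
print(s.string_ref(4), s.string_offset_ref(10))  # b b
print(s.substring(1, 3), s.substring_offset(1, 6))  # é€ é€
```

Only the character-based calls give the same answers across encodings; the offset values (10, 1, 6) are specific to this UTF-8 layout, which matches the concern that such an API mostly serves variable-length representations.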

--shiro