Re: New draft of SRFI 130: Cursor-based string library

Re: New draft of SRFI 130: Cursor-based string library John Cowan 23 May 2016 13:19 UTC

William D Clinger scripsit:

> It's just a matter of pre-computing an index table that maps every
> character index that's a multiple of (for example) 80 to the
> corresponding bytevector index of a UTF-8 representation.

I don't see how this can be plausibly implemented on top of a Scheme's
conventional strings: they would need to be used with an altogether
disjoint data type, and providing both strings and spans was immediately
rejected by the community.  The tables would have to be created and
garbage collected under the covers at appropriate times, and it's not
clear when those are, nor how to maintain the necessary relationship
given that most Schemes don't have weak tables.
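
For concreteness, the index table in the quoted proposal might look
something like the sketch below.  It is in Python rather than Scheme
purely for illustration; build_index_table, char_to_byte_offset and
STRIDE are hypothetical names, and the input is assumed to be valid
UTF-8.  It says nothing about when such a table would be created or
collected, which is the difficulty raised above.

```python
STRIDE = 80  # the example stride from the quoted message


def build_index_table(utf8, stride=STRIDE):
    """Return the byte offset of every character whose index is a multiple of stride."""
    table = []
    char_index = 0
    for byte_offset, byte in enumerate(utf8):
        if byte & 0xC0 != 0x80:  # not a continuation byte, so a character starts here
            if char_index % stride == 0:
                table.append(byte_offset)
            char_index += 1
    return table


def char_to_byte_offset(utf8, table, index, stride=STRIDE):
    """Map a valid character index to its byte offset.

    Jump to the nearest table entry at or below index, then step forward
    over at most stride - 1 characters.
    """
    byte_offset = table[index // stride]
    for _ in range(index % stride):
        byte_offset += 1  # move past the lead byte
        while byte_offset < len(utf8) and utf8[byte_offset] & 0xC0 == 0x80:
            byte_offset += 1  # skip continuation bytes
    return byte_offset
```

A lookup therefore costs one table access plus a bounded scan, at the
price of one extra table entry per 80 characters.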

> It isn't as fast as using four bytes per character,

Alternatively, one can use single-byte strings for the Latin-1
repertoire, double-byte for the BMP repertoire, and quadruple-byte for
the full repertoire.  This is indexable very quickly.  The only trouble
with it is that it's hard to interchange strings with the surrounding C
or Java or C# environment, which is why people have favored a UTF lately
despite its inefficiency for pure Scheme operations.
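
A minimal sketch of that variable-width scheme, again in Python for
illustration only: pack_fixed_width and char_ref are hypothetical
names, and the little-endian byte order is an arbitrary assumption.

```python
def pack_fixed_width(text):
    """Store text with 1, 2 or 4 bytes per character, chosen by its largest code point."""
    largest = max(map(ord, text), default=0)
    if largest <= 0xFF:        # Latin-1 repertoire
        width = 1
    elif largest <= 0xFFFF:    # BMP repertoire
        width = 2
    else:                      # full repertoire
        width = 4
    data = b"".join(ord(ch).to_bytes(width, "little") for ch in text)
    return width, data


def char_ref(packed, index):
    """Return the character at index with a single multiplication and slice."""
    width, data = packed
    if not 0 <= index < len(data) // width:
        raise IndexError("character index out of range")
    start = index * width
    return chr(int.from_bytes(data[start:start + width], "little"))
```

Indexing never scans, but a string holding even one character outside
the BMP pays four bytes for every character, and none of the three
layouts is UTF-8, which is the interchange cost mentioned above.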

> I'll note that none of the sample implementations for SRFI 130
> actually work if cursors are distinct from indexes,

The foof implementation is straight from Chibi with the addition of
foof-shim, which provides cursors-as-indexes.  If you take that out and
provide native cursors (which in Chibi are specially tagged immediates)
it does work.

> the basic operations accept both indexes and cursors, often
> as optional arguments so there's a combinatorial explosion of
> possibilities,

It's an error to mix cursors and indexes in the same call, so there are
really only two possibilities.
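
To illustrate, a procedure taking a start and an end only has to
distinguish two cases.  The sketch below is in Python and everything
in it is hypothetical: Cursor stands in for an opaque cursor type and
position_kind for the dispatch a procedure might do.  Raising an
exception on a mixed call is one possible choice made here for
clarity; "it is an error" does not oblige an implementation to detect
it.

```python
class Cursor:
    """Hypothetical opaque cursor wrapping a byte offset."""

    def __init__(self, byte_offset):
        self.byte_offset = byte_offset


def position_kind(start, end):
    """Classify a start/end pair as all cursors or all indexes."""
    start_is_cursor = isinstance(start, Cursor)
    end_is_cursor = isinstance(end, Cursor)
    if start_is_cursor != end_is_cursor:
        # A mixed call: this sketch chooses to reject it outright.
        raise TypeError("cannot mix cursors and indexes in one call")
    return "cursor" if start_is_cursor else "index"
```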

--
John Cowan          http://www.ccil.org/~cowan        xxxxxx@ccil.org
We pledge allegiance to the penguin and to the intellectual property
regime for which he stands, one world under Linux, with free music
and open source software for all.  --Julian Dibbell on Brazil, edited