Re: New draft of SRFI 130: Cursor-based string library

Show/hide message thread

New draft of SRFI 130: Cursor-based string library Arthur A. Gleckler (14 May 2016 16:07 UTC)

Re: New draft of SRFI 130: Cursor-based string library Alex Shinn (14 May 2016 22:44 UTC)

Re: New draft of SRFI 130: Cursor-based string library William D Clinger (21 May 2016 06:53 UTC)

Re: New draft of SRFI 130: Cursor-based string library Alex Shinn (21 May 2016 16:38 UTC)

Re: New draft of SRFI 130: Cursor-based string library John Cowan (21 May 2016 17:01 UTC)

Re: New draft of SRFI 130: Cursor-based string library William D Clinger (21 May 2016 17:36 UTC)

Re: New draft of SRFI 130: Cursor-based string library John Cowan (22 May 2016 04:23 UTC)

Re: New draft of SRFI 130: Cursor-based string library William D Clinger (21 May 2016 17:23 UTC)

Re: New draft of SRFI 130: Cursor-based string library John Cowan (22 May 2016 06:38 UTC)

Re: New draft of SRFI 130: Cursor-based string library Alex Shinn (23 May 2016 02:49 UTC)

Re: New draft of SRFI 130: Cursor-based string library John Cowan (23 May 2016 03:50 UTC)

Re: New draft of SRFI 130: Cursor-based string library William D Clinger (23 May 2016 04:30 UTC)

Re: New draft of SRFI 130: Cursor-based string library Alex Shinn (23 May 2016 04:56 UTC)

Re: New draft of SRFI 130: Cursor-based string library John Cowan (23 May 2016 13:19 UTC)

Re: New draft of SRFI 130: Cursor-based string library William D Clinger (23 May 2016 15:45 UTC)

Re: New draft of SRFI 130: Cursor-based string library John Cowan (23 May 2016 16:52 UTC)

Re: New draft of SRFI 130: Cursor-based string library William D Clinger (23 May 2016 18:01 UTC)

Re: New draft of SRFI 130: Cursor-based string library John Cowan (23 May 2016 20:32 UTC)

Re: New draft of SRFI 130: Cursor-based string library William D Clinger 23 May 2016 18:01 UTC

John Cowan wrote:

> > Yes, texts would be a new and disjoint data type.  That's why I called
> > them texts instead of strings.
>
> So not merely disjoint, but separately built from the ground up.  My
> spans were layered on top of strings.

That may be how you thought of them, but I think history has shown
that explaining them as such was a marketing mistake.  The winning
implementations of spans built them from scratch.

To make it possible for Alex and others to follow our little discussion
here, here are comments extracted from the four implementations of spans
I built last year, starting with the best representation:

Representation d:

;;; The representation-dependent part of an implementation of
;;; character spans that represents character spans by records
;;; encapsulating a UTF-8 representation (as a bytevector) along
;;; with a vector mapping character indexes to bytevector indexes,
;;; the character length of the span (not the bytevector), and
;;; substring bounds (as bytevector indexes).  Cursors are exact
;;; integers.  This implementation differs from spans.body1c.scm
;;; because cursors are indexes into the base bytevector, not
;;; artificially biased offsets from the beginning of the first
;;; character in the span.
;;;
;;; This implementation is suitable for systems that support bytevectors,
;;; Unicode characters, and Unicode strings.  It provides O(1) indexing
;;; of spans even if string-ref does not run in constant time.

Representation c:

;;; The representation-dependent part of an implementation of
;;; character spans that represents character spans by records
;;; encapsulating a UTF-8 representation (as a bytevector) along
;;; with a vector mapping character indexes to bytevector indexes,
;;; the character length of the span (not the bytevector), and
;;; substring bounds (as bytevector indexes).  Cursors are exact
;;; integers.
;;;
;;; This implementation is suitable for systems that support bytevectors,
;;; Unicode characters, and Unicode strings.  It provides O(1) indexing
;;; of spans even if string-ref does not run in constant time.

Representation b:

;;; The representation-dependent part of an implementation of
;;; character spans that represents character spans by records
;;; encapsulating a string and substring bounds and represents
;;; cursors by exact integers.
;;;
;;; This implementation is suitable for systems in which
;;; string-ref runs in constant time.

Representation a:

;;; The representation-dependent part of an implementation of
;;; character spans that represents character spans by strings
;;; and cursors by exact integers.
;;;
;;; This implementation might be suitable for systems that
;;; don't allow non-Ascii characters in strings, don't support
;;; bytevectors, and don't much care about performance.

;;; FIXME:  So far as I can see, nothing in the specification
;;; of character spans precludes their representation as strings.
;;; To provide a small degree of sanity checking, however, this
;;; implementation uses an uncommon Ascii character to tag the
;;; strings that represent character spans.

As you can see from that FIXME comment, strings and spans were
effectively distinct in all four of those implementations.

Furthermore string cursors, string indexes, and span cursors
were three distinct types in all of those representations, even
though all were represented by exact integers.  Using a fairly
large negative offset for string cursors and a very different
offset for span cursors made it easy for even cursory testing
to find inappropriate uses.

> > Once the community understands the advantages of immutable texts
> > with O(1) random access via character indexes, and how well they
> > interoperate with mutable strings (because you can use the same
> > indexes into each), I think the community would be willing to allow
> > someone like you or me to write a new SRFI specifying immutable texts.
>
> Maybe.  It won't be me: I have other plans, as noted in my previous
> email.

In that case, may I have your permission to use your spans proposal
and SRFI 130 as guides to the operations that should be provided by
that hypothetical SRFI specifying immutable texts?

> > SRFI 130 says it's "an error unless start and end are both cursors or
> > both indexes", but SRFI 130 doesn't say the four optional parameters
> > of (say) string-prefix-length have to be all cursors or all indexes.
>
> That's an oversight I will fix.

That would be great.  Thanks.

Will