John Cowan wrote: > > Yes, texts would be a new and disjoint data type. That's why I called > > them texts instead of strings. > > So not merely disjoint, but separately built from the ground up. My > spans were layered on top of strings. That may be how you thought of them, but I think history has shown that explaining them as such was a marketing mistake. The winning implementations of spans built them from scratch. To make it possible for Alex and others to follow our little discussion here, here are comments extracted from the four implementations of spans I built last year, starting with the best representation: Representation d: ;;; The representation-dependent part of an implementation of ;;; character spans that represents character spans by records ;;; encapsulating a UTF-8 representation (as a bytevector) along ;;; with a vector mapping character indexes to bytevector indexes, ;;; the character length of the span (not the bytevector), and ;;; substring bounds (as bytevector indexes). Cursors are exact ;;; integers. This implementation differs from spans.body1c.scm ;;; because cursors are indexes into the base bytevector, not ;;; artificially biased offsets from the beginning of the first ;;; character in the span. ;;; ;;; This implementation is suitable for systems that support bytevectors, ;;; Unicode characters, and Unicode strings. It provides O(1) indexing ;;; of spans even if string-ref does not run in constant time. Representation c: ;;; The representation-dependent part of an implementation of ;;; character spans that represents character spans by records ;;; encapsulating a UTF-8 representation (as a bytevector) along ;;; with a vector mapping character indexes to bytevector indexes, ;;; the character length of the span (not the bytevector), and ;;; substring bounds (as bytevector indexes). Cursors are exact ;;; integers. ;;; ;;; This implementation is suitable for systems that support bytevectors, ;;; Unicode characters, and Unicode strings. It provides O(1) indexing ;;; of spans even if string-ref does not run in constant time. Representation b: ;;; The representation-dependent part of an implementation of ;;; character spans that represents character spans by records ;;; encapsulating a string and substring bounds and represents ;;; cursors by exact integers. ;;; ;;; This implementation is suitable for systems in which ;;; string-ref runs in constant time. Representation a: ;;; The representation-dependent part of an implementation of ;;; character spans that represents character spans by strings ;;; and cursors by exact integers. ;;; ;;; This implementation might be suitable for systems that ;;; don't allow non-Ascii characters in strings, don't support ;;; bytevectors, and don't much care about performance. ;;; FIXME: So far as I can see, nothing in the specification ;;; of character spans precludes their representation as strings. ;;; To provide a small degree of sanity checking, however, this ;;; implementation uses an uncommon Ascii character to tag the ;;; strings that represent character spans. As you can see from that FIXME comment, strings and spans were effectively distinct in all four of those implementations. Furthermore string cursors, string indexes, and span cursors were three distinct types in all of those representations, even though all were represented by exact integers. Using a fairly large negative offset for string cursors and a very different offset for span cursors made it easy for even cursory testing to find inappropriate uses. > > Once the community understands the advantages of immutable texts > > with O(1) random access via character indexes, and how well they > > interoperate with mutable strings (because you can use the same > > indexes into each), I think the community would be willing to allow > > someone like you or me to write a new SRFI specifying immutable texts. > > Maybe. It won't be me: I have other plans, as noted in my previous > email. In that case, may I have your permission to use your spans proposal and SRFI 130 as guides to the operations that should be provided by that hypothetical SRFI specifying immutable texts? > > SRFI 130 says it's "an error unless start and end are both cursors or > > both indexes", but SRFI 130 doesn't say the four optional parameters > > of (say) string-prefix-length have to be all cursors or all indexes. > > That's an oversight I will fix. That would be great. Thanks. Will