Gauche's strings are immutable internally.  A substring shares the string
body with its parent (except when you take a small portion out of a long
string; in that case we copy that portion to avoid retaining the entire
string unnecessarily).

String mutation is "emulated" with an extra indirection (create a new
immutable string, then swap the pointer), only to comply with RnRS.
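
To make the above concrete, here is a toy model in Python.  It is only a
sketch of the idea, not Gauche's actual C code: the names StringBody,
SchemeString, make_string, substring and string_set are made up for
illustration, and COPY_RATIO is an invented threshold, not the real
heuristic.

```python
from dataclasses import dataclass

COPY_RATIO = 4  # made-up threshold, not Gauche's real heuristic


@dataclass(frozen=True)
class StringBody:
    """Immutable body: a window [start, end) into text that is never modified."""
    data: str
    start: int
    end: int

    def text(self) -> str:
        return self.data[self.start:self.end]


class SchemeString:
    """What user code holds: just a pointer to an immutable body."""

    def __init__(self, body: StringBody):
        self.body = body


def make_string(text: str) -> SchemeString:
    return SchemeString(StringBody(text, 0, len(text)))


def substring(s: SchemeString, start: int, end: int) -> SchemeString:
    body = s.body  # read the pointer once; the body itself never changes
    n = end - start
    if n * COPY_RATIO < len(body.data):
        # Small piece of a long string: copy it so the big buffer is not retained.
        return make_string(body.data[body.start + start:body.start + end])
    # Otherwise share the parent's data and only narrow the window.
    return SchemeString(StringBody(body.data, body.start + start, body.start + end))


def string_set(s: SchemeString, k: int, ch: str) -> None:
    """Emulated string-set!: build a new immutable body, then swap the pointer."""
    old = s.body.text()
    new_text = old[:k] + ch + old[k + 1:]
    s.body = StringBody(new_text, 0, len(new_text))  # the only mutation
```

A reader that picked up the old body before the swap keeps seeing a
complete, consistent string, and a substring taken earlier is unaffected
by a later string_set on its parent.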

With preemptive threads and a multibyte internal character representation,
I don't know an efficient way to implement mutable strings; that is, to
guarantee that other threads never see an intermediate state of a mutation
without requiring a mutex.
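
The difficulty shows up as soon as a replacement character has a different
encoded length.  The following Python sketch assumes UTF-8 purely for
illustration, and char_offset and set_char_steps are hypothetical helpers.
It performs an in-place update in two steps and records what the buffer
looks like after each one; the first snapshot is what an unsynchronized
reader in another thread could observe.

```python
def char_offset(buf: bytearray, k: int) -> int:
    """Byte offset of the k-th character (linear scan over UTF-8 lead bytes)."""
    i = 0
    for _ in range(k):
        i += 1
        # Skip continuation bytes (bit pattern 10xxxxxx).
        while i < len(buf) and (buf[i] & 0xC0) == 0x80:
            i += 1
    return i


def set_char_steps(buf: bytearray, k: int, ch: str) -> list:
    """Replace the k-th character in place, returning a snapshot after each step."""
    new = ch.encode("utf-8")
    start = char_offset(buf, k)
    end = char_offset(buf, k + 1)
    snapshots = []
    # Step 1: move the tail so the gap has the right size.  The zero bytes
    # stand in for whatever happens to be in the gap at that moment.
    buf[start:end] = bytes(len(new))
    snapshots.append(bytes(buf))
    # Step 2: write the new encoding into the gap.
    buf[start:start + len(new)] = new
    snapshots.append(bytes(buf))
    return snapshots


buf = bytearray(b"abc")
snaps = set_char_steps(buf, 1, "\u00e9")  # 1-byte 'b' becomes a 2-byte character
```

Here snaps[0] is b"a\x00\x00c", which is neither the old string nor the new
one; only snaps[1] decodes to the intended result.  Hiding that intermediate
state from other threads is exactly what would need a lock.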

I do see the motivation behind this SRFI, but I can't help feeling that it's
building a superfluous feature to cover up a fundamental issue.

The SRFI does state that this is for moving away from mutable strings.  But
once we have another set of APIs, it'll stick; we won't be able to
drop it after we have moved to immutable strings, because of backward
compatibility, and we'll end up with two almost duplicate sets of APIs on
the same kind of objects.

On Fri, Dec 4, 2015 at 5:53 AM, John Cowan <xxxxxx@mercury.ccil.org> wrote:
Per Bothner scripsit:

> I really dislike the habit of having separate names for different functions
> that abstractly do the same thing - I think it is very user-unfriendly.
> If a "span" is a sequence of characters, call it what it is: a string.

But "string" already means something in Scheme: a *mutable* sequence
of characters addressable by indexes (such that adding 1 to the index
gets the next character and subtracting 1 gets the previous character
modulo boundary cases).  I'd be happy to rename these functions to
"istring-" on the analogy of SRFI 117 ilists and forthcoming ideques,
isets, and imaps.  Spans are immutable primarily so that they can share
storage with other spans, which is not true of strings as actually
implemented (except on Guile, which has copy-on-write strings).

However, I don't see how you can avoid having separate functions
(or polymorphic functions such that cursors cannot be exact integers)
for the other span operations, which means that instead of span-ref or
istring-ref, you have string-ref/cursors: is that really an improvement?

> (1) a portable implementation that matches more-or-less the current proposal

Will's code actually provides for four implementations, to which I plan
to add a fifth:

(a) spans are strings (no sharing, poor performance)

(b) spans are records with a string and two indexes
    (performs well if strings are simple arrays of characters)

(c) spans are records with a bytevector representing UTF-8 and two
    bytevector indexes (generally best performance/safety tradeoff)

(d) same as (c) but without sanity checks, so performs a bit better

(e) spans are records with a string whose characters represent individual
    UTF-8 bytes via the natural mapping and two bytevector indexes
    (suitable for systems like Chicken where strings are single-byte)

> (2) a portable library that wraps (1) *and* native strings.

It's easy to provide that yourself, given (1).  Scheme is relentlessly
monomorphic[*], partly so as not to hard-code specific type relationships
into the libraries as most languages do.  If you want a library like the
CL sequence library that hides the difference between lists and vectors,
you can easily have one, but the language standards don't compel such
a relationship to exist.

[*] We do have universal case-by-case polymorphism for things like `write`,
and generic arithmetic is polymorphic over exact and inexact numbers.

--
John Cowan          http://www.ccil.org/~cowan        xxxxxx@ccil.org
Mr. Henry James writes fiction as if it were a painful duty.  --Oscar Wilde