A simple-minded approach is to copy the string body on any mutation,
which is what Gauche does: (string-set! foo ...) actually copies the
1000000-character string content, so the sharing is broken at that point.
I'm betting that in the future string-set! will fade out and it won't be an issue.

One possible trick is to flag the string body if it is ever shared.
It works like 1-bit reference counting; if it is flagged, it *may* be shared,
so we copy.  The assumption is that the majority of strings, especially transient
ones, aren't shared at all, so copying is avoided for them.
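
To make the flag idea concrete, here is a minimal sketch in portable R7RS
Scheme.  It is not how Gauche implements strings; the <sstr> record and all
the sstr-* names are made up for illustration, and index range checks are
omitted.

```scheme
(import (scheme base))

;; Hypothetical string-like record: a body (a plain Scheme string), a
;; window [start, end) into it, and a one-bit "may be shared" flag.
(define-record-type <sstr>
  (make-sstr body start end shared?)
  sstr?
  (body sstr-body set-sstr-body!)
  (start sstr-start set-sstr-start!)
  (end sstr-end set-sstr-end!)
  (shared? sstr-shared? set-sstr-shared!))

;; Wrap a fresh, private copy of S; nothing is shared yet.
(define (string->sstr s)
  (make-sstr (string-copy s) 0 (string-length s) #f))

;; Extract the visible window as a newly allocated plain string.
(define (sstr->string s)
  (substring (sstr-body s) (sstr-start s) (sstr-end s)))

;; Substring without copying: reuse the body and flag both sides.
(define (sstr-substring s start end)
  (set-sstr-shared! s #t)
  (make-sstr (sstr-body s)
             (+ (sstr-start s) start)
             (+ (sstr-start s) end)
             #t))

(define (sstr-ref s k)
  (string-ref (sstr-body s) (+ (sstr-start s) k)))

;; Mutation copies the body only when the flag says it may be shared.
(define (sstr-set! s k ch)
  (when (sstr-shared? s)
    (set-sstr-body! s (sstr->string s))              ; private copy of our window
    (set-sstr-end! s (- (sstr-end s) (sstr-start s)))
    (set-sstr-start! s 0)
    (set-sstr-shared! s #f))
  (string-set! (sstr-body s) (+ (sstr-start s) k) ch))

;; Small-scale version of the foo/bar example from the quoted message.
(define foo (string->sstr (make-string 10 #\z)))
(define bar (sstr-substring foo 1 9))
(sstr-set! foo 5 #\Z)      ; foo is flagged, so it copies before mutating
```

Since the flag is a single bit, bar stays flagged even though nobody shares
its body any more, so its first mutation makes one unnecessary copy.  That
is the price of not keeping a real reference count.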

Of course, if we use rope-like structures, the amount that needs to be copied
is reduced.
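
As a rough illustration of why, here is a sketch that uses a flat vector of
fixed-size chunks instead of a real balanced rope.  The chunk-size value and
the chunks-* names are hypothetical, and for simplicity a mutation always
copies the one chunk it touches.

```scheme
(import (scheme base))

(define chunk-size 4)   ; unrealistically small, for illustration only

;; Split a plain string into a vector of chunks (the "spine").
(define (string->chunks s)
  (let* ((n (string-length s))
         (count (quotient (+ n chunk-size -1) chunk-size))
         (v (make-vector count)))
    (do ((i 0 (+ i 1)))
        ((= i count) v)
      (vector-set! v i
                   (string-copy s
                                (* i chunk-size)
                                (min n (* (+ i 1) chunk-size)))))))

;; Sharing copy: a new spine that points at the same chunk bodies.
(define (chunks-share v) (vector-copy v))

(define (chunks-ref v k)
  (string-ref (vector-ref v (quotient k chunk-size))
              (remainder k chunk-size)))

;; Mutation copies only the chunk containing index K, not the whole text.
(define (chunks-set! v k ch)
  (let* ((i (quotient k chunk-size))
         (fresh (string-copy (vector-ref v i))))
    (string-set! fresh (remainder k chunk-size) ch)
    (vector-set! v i fresh)))

(define foo (string->chunks (make-string 10 #\z)))
(define bar (chunks-share foo))
(chunks-set! foo 5 #\Z)    ; copies one 4-character chunk
```

After the mutation foo and bar still share every chunk except the one that
was written to, so the copying cost is bounded by the chunk size rather than
by the length of the string.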

The cost of mutation versus the cost of substring with sharing is a trade-off.  An implementation
may choose its own optimization strategy, but usually you'll optimize for typical cases.
In this discussion, though, the real issue is which style we want to promote:
the more libraries consistently use substrings and discourage mutation, the more
likely it is that implementations will eventually optimize for that style.

The span approach lets us experiment with string representation strategies in the Scheme
layer without touching the underlying string representation.  That is
technically good, but I'm afraid that it leaves too much cruft for the future.  On some
implementations a span may be just a string, because substring is efficient.
But portable code must still use spans, whose operations mostly duplicate those
of strings.  Having multiple string-like thingies in one language has long
repercussions; in C++ I still deal with multiple string classes, and Python 2, with its
ascii and unicode strings, dies hard.  I'd like to avoid that kind of situation.


On Sat, Dec 5, 2015 at 6:51 AM, John Cowan <xxxxxx@mercury.ccil.org> wrote:
Shiro Kawai scripsit:

> Gauche's GC doesn't move objects, but for copying GCs, one way is to keep
> the pointer to the head of the original string and the offset.

Sorry, I wasn't clear.  Consider this:

gosh> (define foo (make-string 1000000 #\z))
foo
gosh> (define bar (substring foo 1 999998)) ; presumably shares with foo
bar
gosh> (begin (string-set! foo 500000 #\Z) 42) ; suppresses lengthy output of string-set!
42
gosh> (string-ref bar 499999)
#\z

Now when the string-set! was executed, the sharing between foo and bar
strings had to be broken somehow, because we see that mutating foo did
not affect bar.  How does that work?

--
John Cowan          http://www.ccil.org/~cowan        xxxxxx@ccil.org
But the next day there came no dawn, and the Grey Company passed on
into the darkness of the Storm of Mordor and were lost to mortal sight;
but the Dead followed them.          --"The Passing of the Grey Company"