Issues with Unicode Jonathan S. Shapiro (23 Apr 2006 08:55 UTC)
Re: Issues with Unicode Marc Feeley (23 Apr 2006 13:26 UTC)
Re: Issues with Unicode Shiro Kawai (26 Apr 2006 06:27 UTC)
Re: Issues with Unicode Taylor R. Campbell (26 Apr 2006 07:50 UTC)
Re: Issues with Unicode Shiro Kawai (26 Apr 2006 22:21 UTC)
Re: Issues with Unicode Jorgen Schaefer (26 Apr 2006 23:40 UTC)

Re: Issues with Unicode Shiro Kawai 26 Apr 2006 06:27 UTC

From: "Jonathan S. Shapiro" <xxxxxx@eros-os.org>
Subject: Issues with Unicode
Date: Sun, 23 Apr 2006 10:54:55 +0200

> 11. Strings now, more than ever, are not just vectors of characters
> (though this should be a feasible implementation). There is *excellent*
> discussion of the issues in the libicu documentation, and I strongly
> recommend reading that.

Alternative implementations of strings have been discussed on
this list, and in some threads on comp.lang.scheme, I think.
I'd like to draw attention to one point which hasn't been
raised, IIRC.  (Maybe it is too trivial and everybody knows
about it; if so, sorry for the noise.)

Some of the fancier implementations might not work well with
preemptive multithreading: if mutating a string touches more
than one place in the string object, it creates a race hazard.

Generally it is unacceptable to lock on every string access, so
the practical solution is to split the string structure into a
"header" and a mutable body.  If you want to change the body
of a string in a way that is not thread-safe, you allocate a
fresh body, set it up with the desired modifications, and swap
the pointer in the "header" to the new body.  As long as pointer
assignment is atomic, this is safe.
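
To make the header/body split concrete, here is a minimal sketch
in Python.  The class name MutableString and its methods are
made up for illustration, not taken from any implementation.  It
assumes that rebinding an attribute is a single atomic reference
store, which holds in CPython but is an assumption elsewhere.

```python
class MutableString:
    """Hypothetical "header": holds one reference to an immutable body."""

    def __init__(self, text=""):
        self._body = text          # Python str is immutable, so it serves as the body

    def ref(self, k):
        body = self._body          # take one snapshot, then index only the snapshot
        return body[k]

    def set(self, k, ch):
        old = self._body
        if not 0 <= k < len(old):
            raise IndexError(k)
        new = old[:k] + ch + old[k + 1:]   # build a fresh body off to the side
        self._body = new                   # publish it with a single reference swap

    def __str__(self):
        return self._body


s = MutableString("hello")
s.set(0, "j")
print(s)   # the header now points at the new body
```

A reader sees either the old body or the new one, never a
half-updated mix.  Two concurrent writers can still overwrite
each other's update, since the last swap wins; the sketch only
protects the consistency of each body.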

Although this workaround is trivial, I cannot help wondering
how much having string-set! is really worth.  With this
workaround we almost have an immutable string (the body), and
are emulating mutable strings through the header.  Wouldn't it
be more natural to make strings immutable, and have a separate
object for constructing a string?  For sequential construction
we have string ports; for random-access construction, it can be
a vector, or we could have a vector-of-characters in the spirit
of srfi-4, and convert it to an immutable string once we finish
building it.
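
As a rough illustration of the two construction styles, here is a
sketch in Python, where strings are already immutable.
io.StringIO stands in for a string port and an array of code
points stands in for a srfi-4-style character vector; the helper
names are hypothetical.

```python
import io
from array import array


def build_sequentially(pieces):
    """Sequential construction: write to a port, then extract the string."""
    port = io.StringIO()
    for piece in pieces:
        port.write(piece)
    return port.getvalue()


def make_char_vector(length, fill=" "):
    """Random-access construction: a homogeneous vector of code points."""
    return array("L", [ord(fill)]) * length


def char_vector_to_string(vec):
    """Freeze the finished vector into an immutable string."""
    return "".join(map(chr, vec))


vec = make_char_vector(5, "-")
vec[0] = ord("a")      # mutate the vector freely while building
vec[4] = ord("z")
print(char_vector_to_string(vec))
print(build_sequentially(["foo", "bar"]))
```

All mutation happens on the port or the vector, which a single
thread owns while building; only the finished, immutable string
is shared.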

(It might be convenient to have mutable strings for editor-like
applications, which would also allow length-changing mutation.
I'd rather think of that as another type of object that can be
built on top of immutable strings, e.g. a buffer object realized
as a balanced tree of string segments.)
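
A buffer of this kind can be sketched as a tree whose leaves are
immutable string segments.  The Python below is only an
illustration: the names Leaf, Node, insert and delete are
hypothetical, and rebalancing is left out to keep it short, so
the tree is not actually balanced.

```python
class Leaf:
    def __init__(self, text):
        self.text = text               # immutable string segment
        self.length = len(text)


class Node:
    def __init__(self, left, right):
        self.left, self.right = left, right
        self.length = left.length + right.length


def concat(a, b):
    """Join two trees, dropping empty sides."""
    if a.length == 0:
        return b
    if b.length == 0:
        return a
    return Node(a, b)


def split(tree, k):
    """Return (first k characters, rest), sharing untouched subtrees."""
    if isinstance(tree, Leaf):
        return Leaf(tree.text[:k]), Leaf(tree.text[k:])
    if k <= tree.left.length:
        a, b = split(tree.left, k)
        return a, concat(b, tree.right)
    a, b = split(tree.right, k - tree.left.length)
    return concat(tree.left, a), b


def insert(tree, k, text):
    """Length-changing edit: returns a new tree, the old one is untouched."""
    a, b = split(tree, k)
    return concat(concat(a, Leaf(text)), b)


def delete(tree, start, end):
    a, rest = split(tree, start)
    _, b = split(rest, end - start)
    return concat(a, b)


def to_string(tree):
    if isinstance(tree, Leaf):
        return tree.text
    return to_string(tree.left) + to_string(tree.right)


buf = Leaf("hello world")
buf2 = insert(buf, 5, ",")
print(to_string(buf2))
```

Every edit returns a new tree and leaves the old one intact, so
a buffer header can publish an edit with the same single pointer
swap described above.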

--shiro