Re: Issues with Unicode bear 10 May 2006 17:53 UTC

On Wed, 10 May 2006, John Cowan wrote:

>bear scripsit:
>> Immutable strings - With Unicode and threads, it's the only viable
>> implementation strategy.  [...]  Once you've done the legwork for
>> immutable strings, providing string-set!  and similar is a very short
>> further trip.
>Part of the contract for string-set! is that it mutates its
>first argument.

Right.  The "short further trip" of course is to provide
some kind of string-head which is effectively like
half a cons.  Then string-set! can be implemented by
creating the new string-body using the functional
underpinnings, and updating the string-head to point at
the new string-body.  Now you just use the string-head
as your string representation, and you've provided
string-set!, with its current contract, limiting the
mutation to a single point.
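
A minimal sketch of the string-head idea, written in Python rather than Scheme and not taken from any actual implementation.  The names StringHead, ref, set and value are hypothetical.  Python's built-in str is already immutable, so it stands in for the string-body; the lock is one assumed way to make the single mutation atomic with respect to other writers.

```python
import threading


class StringHead:
    """Mutable handle ("string-head") that points at an immutable string body."""

    def __init__(self, body: str):
        self._body = body              # immutable body; never modified in place
        self._lock = threading.Lock()  # serializes writers (an assumption of this sketch)

    def value(self) -> str:
        # Readers take a snapshot of the current body; it can never change under them.
        return self._body

    def ref(self, k: int) -> str:
        return self._body[k]

    def set(self, k: int, ch: str) -> None:
        """Analogue of string-set!: build a new body, then repoint the head."""
        if len(ch) != 1:
            raise ValueError("expected a single character")
        with self._lock:
            old = self._body
            if not 0 <= k < len(old):
                raise IndexError(k)
            # Functional update: the old body is left untouched.
            new = old[:k] + ch + old[k + 1:]
            # The only mutation: one reference assignment in the head.
            self._body = new


s = StringHead("hello")
alias = s                 # a second reference to the same string-head
snapshot = s.value()      # a reader holding the old body
s.set(0, "j")
print(alias.value())      # the alias sees the change, as string-set! requires
print(snapshot)           # the old body is unchanged
```

Everything that holds the head sees the update, which preserves the string-set! contract, while anything that already took a body keeps a consistent value.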

>> Removing string-set! would be way too much of a flag-day for
>> existing scheme code.
>Can't have it both ways.  It will also be a flag day
>to replace string-set! with string-update or some similar
>functional equivalent.

Didn't want it both ways.  String-set!, with unchanged contract,
can be implemented on top of purely functional methods for
manipulating string bodies and an atomic single mutation for
manipulating the string head.

>> Regarding what ought to be legal as an identifier: I think
>> control characters, whitespace (properties Zs, Zl, Zp) and
>> delimiters (properties Ps, and Pc) ought not appear in
>> identifiers.  I wouldn't be at all upset if a standard also
>> forbade combining characters; after all, identifiers and
>> symbol names don't need the full functionality of strings.
>In tht cs thy cnnt be ntrl-lngg trms in an of the lrg set
>of lnggs tht us cmbnng chrctrs fr vwls.

Okay, good point.  And a good reason to allow combining
characters in identifiers.  I'm not upset either way.
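
As a rough illustration of the identifier rule quoted above, here is a sketch in Python using the standard unicodedata module.  The function names and the allow_combining switch are hypothetical, the category list (Cc for control characters, plus Zs, Zl, Zp, Ps, Pc) is taken literally from the proposal, and the Unicode version consulted is whatever ships with the Python interpreter, not necessarily 4.1.0.

```python
import unicodedata

# Categories the proposal would keep out of identifiers, taken literally
# from the message: control characters, whitespace, and "delimiters".
EXCLUDED_CATEGORIES = {"Cc", "Zs", "Zl", "Zp", "Ps", "Pc"}

# Combining marks; whether to forbid them is the open question in the thread.
COMBINING_CATEGORIES = {"Mn", "Mc", "Me"}


def is_identifier_char(ch: str, allow_combining: bool = True) -> bool:
    """Return True if ch would be permitted in an identifier under this sketch."""
    cat = unicodedata.category(ch)
    if cat in EXCLUDED_CATEGORIES:
        return False
    if not allow_combining and cat in COMBINING_CATEGORIES:
        return False
    return True


def is_identifier(name: str, allow_combining: bool = True) -> bool:
    """A non-empty name whose characters all pass is_identifier_char."""
    return len(name) > 0 and all(
        is_identifier_char(c, allow_combining) for c in name
    )


print(is_identifier("string-set!"))
print(is_identifier("foo bar"))
print(is_identifier("e\u0301", allow_combining=False))
```

With allow_combining set to False, a name spelled with a base letter plus a combining accent is rejected, which is exactly the cost John points out for languages that write vowels as combining characters.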

>> I wouldn't be at all upset if a standard also forbade all
>> characters not yet assigned as of Unicode 4.1.0, with
>> the implication that this forbidding would be permanent
>> across Scheme report revisions, even though later Unicode
>> versions doubtless will come along.
>Which amounts to saying that programmers who use some
>languages get to use meaningful identifiers and others don't.
>That's manifestly unfair.

Hah?  Unicode already encompasses, I believe, every living
language with a writing system.  If you mean that there are
programmers who can't get meaningful identifiers using the
character set defined as of Unicode 4.1.0, I want to know
who those programmers are.

Meanwhile, allowing identifier syntax to shift with every
version of Unicode creates the potential for version