the discussion so far Matthew Flatt (16 Jul 2005 12:41 UTC)

Thanks to everyone who has contributed to this discussion. It has moved
so quickly that I have little hope of responding to everything, but
I've found many of the comments to be helpful.

The biggest piece of implicit feedback is that the SRFI does not really
make the editors' goal clear. The goal is not to finally get strings
"right", or even to be Unicode-compliant. The goal is simply to make
Scheme programs more portable.

My impression of the editors: we're not going to standardize anything
less than a specific set of characters. There is a consensus that the
current weak standard causes too many portability problems, and that
the solution is to pin down precisely the meaning of "character".

Meanwhile, implementations and libraries are certainly free ---
encouraged, even --- to define other datatypes and other operations on
"character" and "string". Those other datatypes and operations will be
better than the standard (otherwise there would have been no point for
the implementor), and as experience develops, something will likely
replace whatever appears in R6RS.

For an R6RS definition of "character", I think the editors would like
to include most things that people wish to write within an identifier
or string constant. Among the well understood and widely implemented
definitions of character, the only candidates seem to be UTF-16 code
units and Unicode scalar values. As far as we can tell, best practice
currently points to scalar values.
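
To make the difference between the two candidates concrete, here is a
minimal sketch. Python is used only for illustration, and the helper
names `utf16_code_units' and `is_scalar_value' are hypothetical, not
part of the SRFI. A character outside the Basic Multilingual Plane is a
single scalar value but takes two 16-bit UTF-16 code units (a surrogate
pair), and the surrogates themselves are not scalar values.

```python
def utf16_code_units(s):
    """Return the 16-bit UTF-16 code units that encode the string s."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_scalar_value(n):
    """A Unicode scalar value is any code point except the surrogates."""
    return 0 <= n <= 0x10FFFF and not (0xD800 <= n <= 0xDFFF)


g_clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP

print(len(g_clef))                                 # 1 (one scalar value)
print([hex(u) for u in utf16_code_units(g_clef)])  # ['0xd834', '0xdd1e']
print(is_scalar_value(ord(g_clef)))                # True
print(is_scalar_value(0xD834))                     # False (a surrogate)
```

If "character" meant a UTF-16 code unit, a string holding this one
symbol would have length 2, and each element would be meaningless on
its own.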

Keeping in mind that the goal is portability, the question with respect
to `char-upcase', `string-ci=?', etc. is not whether they do the "right"
thing with respect to Unicode or natural language, but whether they are
needed to write portable programs, whether they are so common that we
should give them names to avoid gratuitous incompatibility, whether
they are sufficiently simple to implement that we should impose them as
a requirement on all Scheme systems, and whether the set of standardized
operations is reasonably consistent.

I am personally convinced (by this discussion and by past experience)
that `string-ci=?' as defined in the SRFI is not what you really want
under most circumstances. But it's often a good approximation. I think
that Scheme needs at least an operation like `string-ci=?' for portable
programs, something like it will exist in most implementations, it's
simple to implement, and it's consistent with the rest of the proposal
--- so it still seems right to me to put it in the SRFI, despite its
many flaws.
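
As a rough illustration of "a good approximation", the sketch below
contrasts a character-by-character case-insensitive comparison with one
based on full case folding. It is written in Python only for
illustration; both function names are hypothetical, and the first
function is an assumed stand-in for a simple per-character definition,
not a transcription of the SRFI's text.

```python
def string_ci_eq_simple(a, b):
    """Compare character by character after lowercasing each character.

    Because it pairs characters one-to-one, it can never equate strings
    of different lengths.
    """
    return len(a) == len(b) and all(x.lower() == y.lower() for x, y in zip(a, b))


def string_ci_eq_full(a, b):
    """Compare using full case folding, which may change string length."""
    return a.casefold() == b.casefold()


print(string_ci_eq_simple("hello", "HELLO"))         # True
print(string_ci_eq_simple("Stra\u00DFe", "STRASSE"))  # False
print(string_ci_eq_full("Stra\u00DFe", "STRASSE"))    # True
```

The simple version agrees with the fuller one on common input and is
only a few lines long, but it misses cases such as the German sharp s,
where folding maps one character to two.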

A similar line of reasoning applies to the other operations. In
contrast, a `string-ci=?' based on the Unicode collation algorithm,
while certainly a better approximation, seems like too much of an
implementation burden to be in the SRFI. (Many posts on this list
address exactly the issues of usefulness and complexity for various
operations, and I find those posts particularly helpful.)

The above does not begin to cover many other points raised in the
discussion, and even for what it says, there are plenty of arguments to
the contrary already on the list. Hopefully, though, it helps clarify
the goal of the current SRFI as discussion continues.

Thanks, again, to everyone,
Matthew