on waste-of-time arguments....
Thomas Lord 19 Jul 2005 23:52 UTC
John Cowan's misunderstanding is critical (and TB's reply is well
wide of the mark). John wrote:
> SRFI-75 in no way prevents that. It simply says what string<? and
> its friends mean. You can still provide string-uca-simple<? and
> string-uca-locale<? if you want.
No, SRFI-75 does not "simply" do any such thing. Unfortunately.
There are logical entanglements within the standard that mean
you can't tweak the definition of string functions without
simultaneously tweaking things like surface syntax.
The code-point-wise and CaseMapping.txt-wise character and string
functions are perfectly well defined --- perfect additions to the
Scheme diamond. They are low level, true. They aren't the best way
to process natural language Unicode text, sure. But they are also
foundational to perfect Unicode support. That much is just fine
and John is right about that.
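To make "low level" concrete, here is a minimal sketch of what a
code-point-wise ordering does to ordinary words. It is written in Python
only because Python's built-in string comparison happens to be
code-point-wise; the word list is made up for illustration and nothing
here is taken from the SRFI text.

```python
# Python compares strings code point by code point, so sorted() here
# behaves like a code-point-wise string<? would.
words = ["zebra", "Zulu", "apple", "\u00e9clair"]  # last word is "eclair" with e-acute

by_code_point = sorted(words)

# Spelling out the comparison key makes the rule explicit:
# each string is ordered by its sequence of code point numbers.
by_explicit_key = sorted(words, key=lambda w: [ord(c) for c in w])

# 'Z' (U+005A) < 'a' (U+0061) < 'z' (U+007A) < e-acute (U+00E9)
print(by_code_point)
```

The result puts "Zulu" before "apple" and the accented word after
"zebra". That is well defined and useful as a foundation, but it is not
the order a dictionary in any particular language would use.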
The problem that T.B. should have called out there, but didn't
(though I'm pretty sure he knows about it), is that the character
and string primitives are not really orthogonal to either the
surface syntax of the language or to communication between Scheme
programs using the standard READ and WRITE procedures.
To be more explicit:
a) The definitions of the character and string functions must be
consistent with the surface syntax of the language. (The
language in the standard is a little weaselly on this point
but that is the simplest interpretation and the one most consistent
with Scheme's heritage of meta-circular programming techniques.)
Therefore, if the character and string functions are "crude"
with respect to natural language, then an implementation
*cannot* (cleanly, simply) allow identifier names which are
globally-natural-language-friendly except in a crude way.
We should have the goal of implementations which are not culturally
biased -- implementations which support all languages equally or
at least certain non-English languages perfectly. If we
force identifiers to be crude in that way today, we cannot achieve
the larger goal tomorrow without breaking things.
b) An analogous argument applies to the streams emitted and consumed
by READ and WRITE. (This isn't *really* a separate point from (a)
but people commonly treat it that way.)
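One way to picture the entanglement in (a) and (b) is the following
sketch. It assumes, purely for illustration, a reader that decides
whether two identifiers are the same by comparing their names with the
string equality function. Both function names below are hypothetical,
and Python's unicodedata module stands in for whatever a Scheme
implementation would actually use.

```python
import unicodedata

# Two spellings that display identically as "cafe" with an acute accent:
precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

def same_identifier_code_point(a: str, b: str) -> bool:
    """Hypothetical rule: identifiers match iff their code points match."""
    return a == b

def same_identifier_normalized(a: str, b: str) -> bool:
    """Hypothetical friendlier rule: compare after NFC normalization."""
    nfc = lambda s: unicodedata.normalize("NFC", s)
    return nfc(a) == nfc(b)

print(same_identifier_code_point(precomposed, decomposed))   # distinct names
print(same_identifier_normalized(precomposed, decomposed))   # one name
```

Under the first rule the two spellings are different identifiers; under
the second they are the same one. A program, or a data stream passed
between WRITE and READ, that relies on them being distinct would change
meaning if the rule were later switched, which is the kind of breakage
the argument above is worried about.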
What is in peril is the surface syntax, not any vague notion of what is
"encouraged" or of the right name for this or that function.
It doesn't matter that programs using the standard string procedures
currently specified aren't right for some natural language applications
-- as John says, that's what new functions are for. The only problem
with SRFI-75 lies in the (one hopes unintended) implications it has for all
future upward-compatible syntaxes (for code and data, if you regard
them as separate).
-t