text processes vs. string procedures
shivers@xxxxxx 24 Jan 2000 21:52 UTC
I agree with almost all of Sergei's msg.
- Basic string procs should *not* require textual well-formedness in a Unicode
world. A string full of accents and umlauts and cedillas with no preceding
base or start character is still a legal string.
- Full Unicode support will certainly require other procedures not in the
SRFI-13 spec. Sergei's examples of canonical & compatibility decomposition
and composition are good ones. These should go in a Unicode-specific
library, which is not the goal of SRFI-13.
- We also certainly need to do a new char library. Or perhaps a pair of them:
one generic one, and one for Unicode-specific things.
- However, I think case-mapping and string-comparison are basic things, and
they can be given a generic, portable definition independent of the
underlying character encoding. Case-mapping does *not* require strings to be
well-formed text. ASCII, Latin-1 and Unicode all provide clear,
language-independent definitions of this operation.
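A rough illustration of the first and last points, written in Python purely as a stand-in for Scheme. The name `upcase` is hypothetical and is not taken from SRFI-13. It shows that a string made only of combining marks is still a legal string, and that per-character case-mapping passes it through untouched.

```python
import unicodedata

# Three combining marks with no preceding base character:
# COMBINING ACUTE ACCENT, COMBINING DIAERESIS, COMBINING CEDILLA.
marks_only = "\u0301\u0308\u0327"


def upcase(s: str) -> str:
    """Hypothetical stand-in for a basic string-upcase procedure.

    It never inspects whether the string is well-formed text; it only
    maps each character that has an uppercase form.
    """
    return s.upper()


# Not well-formed text, but still a perfectly legal three-character string.
assert len(marks_only) == 3
assert all(unicodedata.combining(c) != 0 for c in marks_only)

# Characters without case are left alone; cased ones are mapped.
assert upcase(marks_only) == marks_only
assert upcase("abc" + marks_only) == "ABC" + marks_only
```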
I don't want the string library to be minimal. I want it to be useful.
People -- many of whom currently program with Latin-1 or ASCII Schemes --
case-map and compare strings frequently. These operations can be provided
with an API which is portable across ASCII, Latin-1 and Unicode. So there's
no barrier here.
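
A minimal sketch of what such a portable API might look like, again in Python as a stand-in. The names `char_downcase` and `string_ci_equal` are hypothetical, loosely modeled on R5RS `string-ci=?`, and the one-to-one case-mapping rule is an assumption made for this sketch, not something SRFI-13 specifies.

```python
def char_downcase(c: str) -> str:
    """Map one character to lowercase, keeping the mapping one-to-one.

    Assumption for this sketch: if the full lowercase mapping would
    expand to more than one character, leave the character unchanged.
    """
    lowered = c.lower()
    return lowered if len(lowered) == 1 else c


def string_ci_equal(a: str, b: str) -> bool:
    """Case-insensitive equality defined purely per character."""
    if len(a) != len(b):
        return False
    return all(char_downcase(x) == char_downcase(y) for x, y in zip(a, b))


# The same call works whether the characters come from ASCII,
# Latin-1, or the wider Unicode range.
assert string_ci_equal("Scheme", "SCHEME")                      # ASCII
assert string_ci_equal("\u00c9COLE", "\u00e9cole")              # Latin-1
assert string_ci_equal("\u0391\u0392\u0393", "\u03b1\u03b2\u03b3")  # Greek
assert not string_ci_equal("abc", "abd")
```

The caller never mentions an encoding. Only the per-character mapping table differs between an ASCII, a Latin-1 and a Unicode implementation.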