text processes vs. string procedures

text processes vs. string procedures shivers@xxxxxx 24 Jan 2000 21:52 UTC

I agree with almost all of Sergei's msg.

- Basic string procs should *not* require textual well-formedness in a Unicode
  world. A string full of accents and umlauts and cedillas with no preceding
  base or start character is still a legal string.

- Full Unicode support will certainly require other procedures not in the
  SRFI-13 spec. Sergei's examples of canonical & compatibility decomposition
  and composition are good ones. These should go in a Unicode-specific
  library, which is not the goal of SRFI-13.

- We also certainly need to do a new char library. Or perhaps a pair of them:
  one generic one, and one for Unicode-specific things.

- However, I think case-mapping and string-comparison are basic things, and
  they can be given a generic, portable definition independent of the
  underlying character encoding. Case-mapping does *not* require strings to be
  well-formed text. ASCII, Latin-1 and Unicode all provide clear,
  language-independent definitions of this operation.

  I don't want the string library to be minimal. I want it to be useful.
  People -- many of whom currently program with Latin-1 or ASCII Schemes --
  case-map and compare strings frequently. These operations can be provided
  with an API which is portable across ASCII, Latin-1 and Unicode. So there's
  no barrier here.
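
The points above can be illustrated with a small sketch. It is written in Python rather than Scheme only so that it is easy to run; it is not SRFI-13 code. The helper name `ci_equal` and the sample strings are assumptions made up for the illustration. The sketch shows that case-mapping works on a string that is not well-formed text (combining marks with no base character), that the same calls behave the same on ASCII and Latin-1 data, and that decomposition and composition come from a separate Unicode-specific module.

```python
import unicodedata

# Not well-formed text: a combining acute accent (U+0301) and a combining
# cedilla (U+0327) with no preceding base character. Still a legal string.
orphan_marks = "\u0301\u0327abc"

# Case-mapping works character by character: the marks have no case and
# pass through unchanged, and the letters are mapped.
upcased = orphan_marks.upper()

# The same operation applies to plain ASCII and to Latin-1 characters.
ascii_up = "hello".upper()
latin1_up = "\u00e9lan".upper()      # U+00E9 is e with acute accent


def ci_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings while ignoring case."""
    return a.casefold() == b.casefold()


# Decomposition and composition are a separate concern. Here they come
# from a Unicode-specific module, not from the basic string operations.
decomposed = unicodedata.normalize("NFD", "\u00e9")    # canonical decomposition
composed = unicodedata.normalize("NFC", "e\u0301")     # canonical composition
compat = unicodedata.normalize("NFKD", "\ufb01")       # compatibility decomposition
```

Nothing here depends on the string being sensible text; the orphaned accents are simply carried through by the case-mapping.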