introduction Tom Lord 10 Feb 2004 21:06 UTC


Welcome to the SRFI-52 mailing list.

Let me begin by pointing out that while the current official draft
of SRFI-52 is at the usual place (srfi.schemers.org), between
submissions of drafts I will be keeping more recent text available
at:

  http://regexps.srparish.net/srfi-drafts/permitting-unicode.srfi

The changes so far are quite minor.   As additional changes are made
I'll announce those on this list.

There is a short story behind this SRFI.  I'm currently helping to
design and build a new implementation called Pika Scheme.  Among the
goals for this implementation is that it have good support for
Unicode.  For example, I want to be able to represent Unicode
characters[*] as CHAR? values, Unicode strings as STRING? values, and
allow programmers to write identifiers in their programs using Unicode
characters.

While working on that I noticed (as have some others) that some of the
requirements of R5RS cannot be satisfied in a natural way by a
Unicode implementation.   The largest source of problems concerns case
mapping:  R5RS has a relatively naive view of the nature of upper and
lowercase characters -- a view that does not model Unicode well.
These problematic R5RS requirements affect some standard procedures
and the definition of "identifier equivalence".   Their effect is
particularly noticeable in portable Scheme programs that attempt to
process Scheme s-expression syntax.
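
To make the case-mapping mismatch concrete, here is a small sketch in
Python (not Scheme, and not part of the SRFI text).  It relies only on
Python's built-in string methods, which follow the Unicode case
mappings; the helper names are hypothetical and chosen for
illustration.  It shows three characters for which a "one character
in, one character out, and back again" model of case breaks down.

```python
# Illustrative sketch only: helper names are hypothetical, and Python's
# str.upper()/str.lower() stand in for a Unicode-aware case mapping.

def upcase_is_single_char(ch: str) -> bool:
    """True if upper-casing one character yields exactly one character."""
    return len(ch.upper()) == 1

def case_round_trips(ch: str) -> bool:
    """True if upper-casing and then lower-casing gives back the input."""
    return ch.upper().lower() == ch

sharp_s = "\u00df"   # LATIN SMALL LETTER SHARP S
long_s = "\u017f"    # LATIN SMALL LETTER LONG S
dz_title = "\u01c5"  # LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON

# Sharp s upper-cases to the two-character string "SS", so a procedure
# that must return a single character has no natural answer here.
print(upcase_is_single_char(sharp_s))

# Long s upper-cases to "S", which lower-cases to the ordinary "s".
print(case_round_trips(long_s))

# The titlecase digraph is neither its own uppercase nor its own
# lowercase form, so a strict upper/lower split does not describe it.
print(dz_title.upper() != dz_title and dz_title.lower() != dz_title)
```

A reader or symbol table that folds identifiers one character at a
time would have to pick some answer for each of these cases, which is
where "identifier equivalence" becomes awkward.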

So, I became interested in looking for ways to "fix" R5RS -- to change
the requirements so that Unicode implementations could more easily
comply.   In the spirit of the Revised Report, I want my proposed
changes to be small, to not be specific to Unicode, to not complicate
the report needlessly, to be as upwards compatible as possible, and
ideally to be easy to implement in an implementation which already
conforms to R5RS.

Concurrently with working on that, I've also been working on
specifying the particular CHAR? and STRING? type that will be used in
Pika Scheme.   To provide some context, I've made available some of
those other specifications at:

      http://regexps.srparish.net/srfi-drafts/INDEX.html

I've written the other specifications in the _form_ of draft SRFIs;
however, it is currently far too early to submit any of them to the
SRFI process.

-t

[*] What exactly is a "Unicode character"?  The answer can vary
    depending on context.  In some contexts it might mean a Unicode
    abstract character -- the kind of value to which a codepoint
    (an integer in the range 0..10FFFF, hexadecimal) is assigned.  In other contexts,
    it may mean certain kinds of sequences of abstract characters.

    One goal for SRFI-52 is to remain agnostic about the answer
    to that question.