> From: Ken Dickey <xxxxxx@allvantage.com>
> On Thursday 12 February 2004 06:45 pm, bear wrote:
> > > Defining valid identifier syntax such that case folding of
> > > (unnormalized) identifier literals should be sufficient.
> > > What am I missing?
> > You're missing all the tools and utilities out there that are
> > programmed with the expectation and requirement that they can
> > arbitrarily impose or change normalization forms without changing the
> > text of the documents they handle.  There is no escaping this; even
> > Emacs and Notepad do it.
> Ah! So a broken language (huge tables and complex processing) must be defined
> to deal with broken tools which do not write out Unicode data in a canonical
> format.
> What about creating a tool which reads bizarre Unicode and writes it out in a
> canonical format? Then requiring portable Scheme programs to pass through
> it?
> Sounds like a service to the entire Unicode community. It could be written in
> portable Scheme and serve as a (presumably good) advertisement for the
> language.
> Don't complexify the implementation, simplify the problem!
There's a distinction and separation-of-concerns to make here. And
there's some compiler-perspective bigotry to undo. Finally, let me
try to give a new perspective on my cumulative project here.
First: let's not forget that SRFI-52 most explicitly does _not_
require _any_ degree of Unicode support from implementations. The
_only_ thing it does is to tweak the language spec in some minor ways
that are needed so that the R6RS doesn't _preclude_ a conforming
implementation from supporting Unicode. Much of the discussion that
has taken place during my absence is not really focused on SRFI-52
issues -- but on issues raised in the "preview proto-SRFIs" that I've
published at the same time.
(It's _fine_ (good even) to host that discussion here. Very
appropriate. But let's not conflate the proposals of those other
proto-SRFIs with the very conservative content of (real-)SRFI-52.)
Second: it's just not realistic to punt the complexities of Unicode by
saying that Scheme code needs to pass through a canonicalizing filter.
There's the question of READ and its correlates -- considering source
code alone is not sufficient.  S-expressions have to grow up to
be a real exchange format or else Scheme (and lisp generally) sucks.
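(To make the READ point concrete, here is a minimal sketch.  It assumes
an implementation with Unicode-capable strings and some "\x...;"
hex-escape syntax for string literals -- that escape syntax, and the
Unicode support itself, are assumptions about such an implementation,
not anything SRFI-52 requires:

    ;; The same word spelled with precomposed U+00E9 versus "e"
    ;; followed by combining acute U+0301.  Both display as "café",
    ;; but without normalization the reader/interner sees two
    ;; different spellings, hence two different symbols.
    (define precomposed (string->symbol "caf\xE9;"))    ; NFC form
    (define decomposed  (string->symbol "cafe\x301;"))  ; NFD form

    (eq? precomposed decomposed)  ; => #f unless something normalizes

A source-only canonicalizing filter never touches data like this when
it arrives at READ from a port at run time, which is why filtering
source text alone doesn't settle the question.)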
Third -- the project here: R6RS is not going to be "Unicode Scheme",
in my opinion.  Nor should any R^NRS be, for any value of N.  There ought
to be a "Unicode Scheme Standard" -- to facilitate both data and code
exchange -- but it should be layered. Human language is not
essential to computing: not a topic for R^NRS, ever.
(A small subset of ASCII, on the other hand, is "of the essence" :-)
-t
p.s.: it is naive to believe that the Unicode community is suffering
for the lack of canonicalization filters. At the same time, it is a
healthy example of "philology recapitulates..." that we've arrived at
wondering if and how we want one in this context.