Re: Encodings. - Simplelists

Re: Encodings. Tom Lord 20 Feb 2004 19:23 UTC

    > From: Ken Dickey <xxxxxx@allvantage.com>

    > On Thursday 12 February 2004 06:45 pm, bear wrote:
    > > > Defining valid identifier syntax such that case folding of
    > > > (unnormalized) identifier literals should be sufficient.

    > > > What am I missing?

    > > You're missing all the tools and utilities out there that are
    > > programmed with the expectation and requirement that they can
    > > arbitrarily impose or change normalization forms without changing the
    > > text of the documents they handle.  There is no escaping this; even
    > > Emacs and Notepad do it.

    > Ah!  So a broken language (huge tables and complex processing) must be defined
    > to deal with broken tools which do not write out Unicode data in a canonical
    > format.

    > What about creating a tool which reads bizarre Unicode and writes it out in a
    > canonical format?  Then requiring portable Scheme programs to pass through
    > it?

    > Sounds like a service to the entire Unicode community.  It could be written in
    > portable Scheme and serve as a (presumably good) advertisement for the
    > language.

    > Don't complexify the implementation, simplify the problem!

There's a distinction and separation-of-concerns to make here.  And
there's some compiler-perspective bigotry to undo.  Finally, let me
try to give a new perspective on my cumulative project here.

First: let's not forget that SRFI-52 most explicitly does _not_
require _any_ degree of Unicode support from implementations.  The
_only_ thing it does is to tweak the language spec in some minor ways
that are needed so that the R6RS doesn't _preclude_ a conforming
implementation from supporting Unicode.   Much of the discussion that
has taken place during my absence is not really focused on SRFI-52
issues -- but on issues raised in the "preview proto-SRFIs" that I've
published at the same time.

(It's _fine_ (good even) to host that discussion here.  Very
appropriate.  But let's not conflate the proposals of those other
proto-SRFIs with the very conservative content of (real-)SRFI-52.)

Second: it's just not realistic to punt the complexities of Unicode by
saying that Scheme code needs to pass through a canonicalizing filter.
There's the question of READ and its correlates -- consideration of
source code only is not sufficient.  S-expressions have to grow up to
be a real exchange format or else Scheme (and Lisp generally) sucks.
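
To make the READ point concrete, here is a rough sketch in Python rather
than Scheme, using only the standard unicodedata module.  The helper name
canonical_identifier is hypothetical, and "NFC then case-fold" is just one
plausible definition of identifier equivalence, not anything SRFI-52
specifies.  It shows two spellings of the same identifier that differ as
code point sequences and only compare equal once canonicalized:

```python
import unicodedata


def canonical_identifier(text: str) -> str:
    """Hypothetical helper: NFC-normalize, then case-fold.

    This is an assumed definition of identifier equivalence for
    illustration only; a real standard would need more care.
    """
    return unicodedata.normalize("NFC", text).casefold()


precomposed = "caf\u00e9"   # 'e with acute' as a single code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

# The raw strings differ, so a naive symbol table sees two identifiers.
print(precomposed == decomposed)  # False

# After canonicalization they are the same identifier.
print(canonical_identifier(precomposed) == canonical_identifier(decomposed))  # True
```

A filter run over source files applies this step once, ahead of time.  Data
that arrives at run time through something like READ never passes through
that filter, so the same step would have to live inside the reader itself.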

Third -- the project here:  R6RS is not going to be "Unicode Scheme",
in my opinion.   Nor should any R^NRS for any value of N.  There ought
to be a "Unicode Scheme Standard" -- to facilitate both data and code
exchange -- but it should be layered.   Human language is not
essential to computing: not a topic for R^NRS, ever.

(A small subset of ASCII, on the other hand, is "of the essence" :-)

-t

p.s.: it is naive to believe that the Unicode community is suffering
for the lack of canonicalization filters.   At the same time, it is a
healthy example of "philology recapitulates..." that we've arrived at
wondering if and how we want one in this context.