terminology  Tom Lord (10 Feb 2004 20:50 UTC)
Re: Encodings.  Tom Lord (20 Feb 2004 19:04 UTC)
Re: terminology  Tom Lord (20 Feb 2004 21:32 UTC)
Re: terminology  bear (21 Feb 2004 01:49 UTC)
Re: terminology  Tom Lord (21 Feb 2004 02:38 UTC)

> From: Ken Dickey <xxxxxx@allvantage.com>
>
> On Thursday 12 February 2004 06:45 pm, bear wrote:
> > > Defining valid identifier syntax such that case folding of
> > > (unnormalized) identifier literals should be sufficient.
> > > What am I missing?
> > You're missing all the tools and utilities out there that are
> > programmed with the expectation and requirement that they can
> > arbitrarily impose or change normalization forms without changing the
> > text of the documents they handle.  There is no escaping this; even
> > Emacs and Notepad do it.
> Ah! So a broken language (huge tables and complex processing) must be defined
> to deal with broken tools which do not write out Unicode data in a canonical
> format.
> What about creating a tool which reads bizarre Unicode and writes it out in a
> canonical format? Then requiring portable Scheme programs to pass through
> it?
> Sounds like a service to the entire Unicode community. It could be written in
> portable Scheme and serve as a (presumably good) advertisement for the
> language.
> Don't complexify the implementation, simplify the problem!

There's a distinction and separation-of-concerns to make here. And
there's some compiler-perspective bigotry to undo. Finally, let me try
to give a new perspective on my cumulative project here.

First: let's not forget that SRFI-52 most explicitly does _not_ require
_any_ degree of Unicode support from implementations. The _only_ thing
it does is to tweak the language spec in some minor ways that are needed
so that the R6RS doesn't _preclude_ a conforming implementation from
supporting Unicode. Much of the discussion that has taken place during
my absence is not really focused on SRFI-52 issues -- but on issues
raised in the "preview proto-SRFIs" that I've published at the same
time. (It's _fine_ (good even) to host that discussion here. Very
appropriate. But let's not conflate the proposals of those other
proto-SRFIs with the very conservative content of (real-)SRFI-52.)

Second: it's just not realistic to punt the complexities of Unicode by
saying that Scheme code needs to pass through a canonicalizing filter.
There's the question of READ and its correlates -- consideration of
source code only is not sufficient. S-expressions have to grow up to be
a real exchange format or else Scheme (and Lisp generally) sucks.

Third -- the project here: R6RS is not going to be "Unicode Scheme", in
my opinion. Nor should any R^NRS for any value of N. There ought to be a
"Unicode Scheme Standard" -- to facilitate both data and code exchange
-- but it should be layered. Human language is not essential to
computing: not a topic for R^NRS, ever. (A small subset of ASCII, on
the other hand, is "of the essence" :-)

-t

p.s.: it is naive to believe that the Unicode community is suffering
for the lack of canonicalization filters. At the same time, it is a
healthy example of "philology recapitulates..." that we've arrived at
wondering if and how we want one in this context.
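
To make the normalization point from the quoted exchange concrete, here is a minimal sketch in Python (not Scheme) of why case folding alone does not reconcile two spellings of the same identifier. It assumes only Python's standard `unicodedata` module. The helper name `canonical_identifier` is hypothetical and is not taken from SRFI-52 or any proto-SRFI. Normalizing to NFC before folding is a simplification; the Unicode Standard's canonical caseless matching involves more steps.

```python
import unicodedata

# Two spellings of the same identifier text: precomposed vs. decomposed "e-acute".
composed = "caf\u00e9"       # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"    # "e" followed by U+0301 COMBINING ACUTE ACCENT


def canonical_identifier(name: str) -> str:
    """Hypothetical helper: normalize to NFC, then case-fold.

    This is an illustrative simplification, not a rule from any Scheme report.
    """
    return unicodedata.normalize("NFC", name).casefold()


# Case folding alone leaves the two spellings as different code point sequences.
folding_alone_matches = composed.casefold() == decomposed.casefold()

# Normalizing first makes them compare equal.
normalized_matches = (
    canonical_identifier(composed) == canonical_identifier(decomposed)
)

print(folding_alone_matches, normalized_matches)
```

A tool that silently switches between the two spellings would therefore change identifier identity under a folding-only rule. Under the assumptions above, the same step would have to be applied to data consumed by READ at run time, not just to source files, which is the limitation of a source-only filter raised in the message.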