Re: Encodings. Bradd W. Szonye 12 Feb 2004 22:07 UTC

> On Thursday 12 February 2004 06:45 pm, bear wrote:
>> You're missing all the tools and utilities out there that are
>> programmed with the expectation and requirement that they can
>> arbitrarily impose or change normalization forms without changing the
>> text of the documents they handle.  There is no escaping this; even
>> Emacs and Notepad do it.

On Thu, Feb 12, 2004 at 02:10:18PM +0100, Ken Dickey wrote:
> Ah!  So a broken language (huge tables and complex processing) must be
> defined to deal with broken tools which do not write out Unicode data
> in a canonical format.

Storing data in non-canonical form is not "broken." Also, there's more
than one canonical form. The "C" forms (NFC, NFKC) compose characters
into precomposed code points wherever possible. The "D" forms (NFD,
NFKD) decompose them into base characters followed by combining marks.
Programs that disagree on the form of their I/O will need to translate
between the two.
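
To make the C/D distinction concrete, here is a minimal sketch. Python
and its standard unicodedata module are my choice for illustration, not
something anyone in this thread is using, and the code_points helper is
hypothetical. It takes one accented letter and shows it in both forms:

```python
import unicodedata

# The same user-visible text, "e with acute accent", spelled two ways.
composed = "\u00e9"      # one precomposed code point
decomposed = "e\u0301"   # base letter + COMBINING ACUTE ACCENT

# Translate each spelling into the other canonical form.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)

def code_points(s):
    """Hypothetical helper: list the code points of s as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in s]

print(code_points(nfc))   # one code point in the composed form
print(code_points(nfd))   # two code points in the decomposed form
```

Neither spelling is wrong. A naive code-point comparison of the two
strings fails, though, which is why programs that disagree on the form
have to translate.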

> What about creating a tool which reads bizarre Unicode and writes it
> out in a canonical format?  Then requiring portable Scheme programs to
> pass through it?

That wouldn't help unless all the tools agree to write the *same*
canonical format. Besides, this just splits part of the reader's job
off into an external program, and in an error-prone way.
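
A rough sketch of the problem, again in Python purely as an assumption
for illustration (read_identifier is a hypothetical stand-in for
whatever the reader does, not a proposal for the SRFI). Two tools each
write valid canonical output, yet the results only compare equal once
the reader itself picks a form:

```python
import unicodedata

def read_identifier(raw, form="NFC"):
    """Hypothetical reader step: normalize an identifier as it is read."""
    return unicodedata.normalize(form, raw)

# Two tools save the same identifier, each in a valid canonical form.
from_tool_a = unicodedata.normalize("NFC", "caf\u00e9")
from_tool_b = unicodedata.normalize("NFD", "caf\u00e9")

# Both outputs are canonical, but they are different code-point strings.
differ_on_disk = from_tool_a != from_tool_b

# Only normalizing inside the reader makes them the same identifier.
same_after_reading = read_identifier(from_tool_a) == read_identifier(from_tool_b)

print(differ_on_disk, same_after_reading)
```

The choice of NFC as the default here is arbitrary; the point is only
that the reader has to choose one, whatever the external tools did.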

> Sounds like a service to the entire Unicode community.  It could be
> written in portable Scheme and serve as a (presumably good)
> advertisement for the language.

Not really.
--
Bradd W. Szonye
http://www.szonye.com/bradd