Re: Encodings. Paul Schlie 12 Feb 2004 21:09 UTC

> bear <xxxxxx@sonic.net>:
>
>> On Thu, 12 Feb 2004, Ken Dickey wrote:
>>
>> I assume that it is useful to distinguish the two goals of extending
>> programming language identifiers and processing Unicode data.
>
> For temporary solutions and bandaids, yes.  But scheme is a lisp, and
> our code is data and our data is code.  Our identifier-naming rules,
> ultimately, *can* affect our program behavior, where with C and similar
> languages, it cannot.
>
> Every implementation that deals with Unicode at all seriously is going
> to have to create rules for distinguishing Unicode identifiers, and to
> the extent that they adopt *different* rules, there will be enduring
> and sometimes very subtle portability problems, and bugs where code
> works slightly differently on one system than it does on another

As Ken properly pointed out, and as should be abundantly clear to most by
now, attempting to enable scheme to more conveniently process text encoded
in an arbitrary character set is distinctly different from attempting to
enable scheme to utilize arbitrary characters within its program
identifier/comment definitions.

While the first is arguably noble, the second would be clearly a mistake.

Since scheme's presently required character set (not encoding) is, by
design, already a subset of the most broadly utilized character sets,
programs (including identifier and comment definitions) are easily and
unambiguously transcodable between any of these character sets, which is
what makes scheme program code "portable".

Attempting to enable scheme programs to utilize characters within their
identifier and comment definitions which are not themselves within the
common subset of the most broadly utilized character sets will enable the
specification of scheme programs which are not easily and unambiguously
transcodable between arbitrary broadly utilized character sets, and which
are therefore "not portable"; that doesn't seem too clever or noble.
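
To make the portability point concrete, here is a rough sketch (not part
of any proposal) that checks whether a piece of program text survives a
round trip through several character encodings. Python is used only
because it ships with codecs; the function name, the sample definitions,
and the particular list of encodings are all assumptions chosen for
illustration.

```python
# Hypothetical sample of "broadly utilized" encodings, chosen for illustration.
ENCODINGS = ["ascii", "latin-1", "koi8-r", "iso8859-7", "utf-8"]


def transcodes_everywhere(source, encodings=ENCODINGS):
    """Return True if `source` round-trips unchanged through every encoding."""
    for enc in encodings:
        try:
            if source.encode(enc).decode(enc) != source:
                return False
        except UnicodeError:
            # At least one character has no representation in this encoding.
            return False
    return True


# Program text restricted to the common subset of characters.
portable = "(define (square x) (* x x))"

# Program text with an identifier outside that common subset.
extended = "(define (carré x) (* x x))"

print(transcodes_everywhere(portable))
print(transcodes_everywhere(extended))
```

The first definition can be represented in every listed encoding; the
second cannot, since its identifier has no representation in, for
example, ASCII or KOI8-R.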

If this distinction is understood and taken to heart, most of the
discussions revolving around ambiguities associated with the potential use
of arbitrary Unicode characters within scheme program text disappear. That
in turn enables discussion to focus on the potential extension of scheme to
more conveniently express algorithms which process text composed of
arbitrary character-set characters, beyond those of which portable scheme
programs may themselves be composed.

(Actually, it seems that specifying anything beyond the trivially enabled
use of extended character sets is likely premature, given what appears to
be limited practical experience with potential solutions within the
community. Maybe a few straw-man solutions, at least somewhat "wrung out"
through trial application code development, need to come first?)

-paul-