Re: Encodings. Paul Schlie 13 Feb 2004 01:40 UTC

Hi Robby,

Sorry, but I'm afraid I'm missing your point. It seems to me that if we
continue to restrict scheme code to the "truly portable" character subset
that exists today, the only thing left unspecified is how to express and
display characters and strings composed of characters beyond scheme's
portable character set. (Scheme programs written in portable characters
can already be converted, displayed, and edited on most known present and
future hosts.)

For all practical purposes that doesn't seem like much of a problem, as
scheme already enables the numerical expression/display of arbitrary
character/byte encoded values. These values are of course specific to
the character encoding an implementation chooses to utilize, which is
fortunately not specified by the standard. That lets a scheme
implementation adopt the character encoding assumed by its host
environment, and so assume that character and raw data byte storage and
I/O sequences are equivalent. Character/byte strings and ports can then
store data and interface with the environment utilizing any data encoding
format that may be required for any arbitrary purpose. (Actually a fairly
flexible scheme.)
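
A minimal sketch of the numerical expression mentioned above, assuming
only the R5RS procedures char->integer and integer->char. The integer
printed for a given character depends on the implementation's encoding,
so no particular value is assumed here:

```scheme
;; Print the implementation-specific integer behind a character.
;; R5RS does not fix this value; it follows the implementation's encoding.
(display (char->integer #\A))
(newline)

;; R5RS does guarantee the round trip for any character.
(display (char=? #\A (integer->char (char->integer #\A)))) ; prints #t
(newline)
```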

For the sake of argument, in circumstances where it may be desirable to
support the expression of extended character-set characters that are not
portably displayable, why not simply define their names spelled within
scheme's portable character set, just as #\space is spelled out?

i.e.: #\uc:ezet, #\uc:some-chinese-character-name, #\uc:pi, etc.

  or: (uc 'ezet), (uc 'some-chinese-character-name), (uc 'pi), etc.

I believe these have already been named/spelled in Unicode's documentation.
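
As a rough sketch of the (uc 'name) form, assuming a small hand-written
table: the names uc-table and uc-code are hypothetical, the two entries
are illustrative only, and the final conversion to a character assumes an
implementation whose integer->char accepts Unicode code points, which
R5RS does not guarantee:

```scheme
;; Hypothetical table mapping portable names to Unicode code points.
;; The names follow the examples above; only two entries are shown.
(define uc-table
  '((ezet . 223)     ; U+00DF LATIN SMALL LETTER SHARP S
    (pi   . 960)))   ; U+03C0 GREEK SMALL LETTER PI

;; Return the code point for NAME, or #f if the name is unknown.
(define (uc-code name)
  (let ((entry (assq name uc-table)))
    (and entry (cdr entry))))

;; Return the character for NAME, or #f if the name is unknown.
;; Assumes integer->char accepts the code point (not guaranteed by R5RS).
(define (uc name)
  (let ((code (uc-code name)))
    (and code (integer->char code))))

(uc-code 'ezet)     ; => 223
(uc-code 'no-such)  ; => #f
```

The program text itself stays within the portable character set; only
the table carries knowledge of the extended set.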

Otherwise, complications arise when one tries to:

- specify the character encoding format utilized by scheme, as it then
may force characters to be translated between scheme's character encoding
and that presumed by the host's environment, thereby preventing the use
of scheme's character strings and ports for arbitrarily encoded data,
for which scheme presently specifies no alternative facility.

- specify the use of a specific extended character-set for both scheme
program code and arbitrary character data, which may not easily be
unambiguously translated between arbitrary other character-sets, and/or
possibly not even displayable or easily editable on arbitrary platforms.

Incidentally, while admitting to likely being somewhat culturally and/or
historically biased, I know I have no interest (even if Unicode were
ubiquitous) in trying to decipher a program composed of mixed Chinese,
Japanese, English, French, Greek, Slavic, etc. identifiers and comments.
If programs like this were allowed to be produced, they would be basically
unsupportable, and the industry would collapse upon itself. It already
has enough problems trying to maintain code written in a single relatively
restricted language and character set, which for good or bad, folks within
the computing world have had to become reasonably familiar with, thereby
unifying programmers' ability to develop, debug, and share common code.
Otherwise we'll end up with a heterogeneous language code base such that
rather than (+ 1 1 1) -> 3, we'll end up with (+ 1 1 1) -> 1, to no one's
true benefit.

-paul-

> From: Robby Findler <xxxxxx@cs.uchicago.edu>
>> At Thu, 12 Feb 2004 16:23:17 -0500, Paul Schlie wrote:
>> As Ken properly pointed out, and which should be abundantly clear to most
>> by now; attempting to enable scheme to more conveniently process text
>> encoded in an arbitrary character set, is distinctly different than
>> attempting to enable scheme to utilize arbitrary characters within its
>> program identifier/comment definitions.
>>
>> While the first is arguably noble, the second would be clearly a mistake.
>
> I think you're missing one of the real virtues of Scheme (LISP,
> originally). As someone has already pointed out, Scheme's data is a
> very good representation for Scheme's code.
>
> Indeed, Schemers can exploit this to tremendous advantage. For example,
> imagine you wanted to write a test suite for a macro you had written
> and in particular wanted to test that syntax errors are raised properly
> for bad inputs. In Scheme, this is merely an additional 2 lines in your
> testing infrastructure (one to call `expand' and one to catch the
> exception). You do not need to step out of the language or start
> scripting another instance of your compiler.
>
> Going even further, consider DrScheme. DrScheme only has one virtual
> machine that runs DrScheme itself's code and simultaneously runs the
> user's program. Scheme's code as data is one piece of the puzzle that
> makes this work so well.
>
> Robby