Re: Encodings. Paul Schlie 13 Feb 2004 05:09 UTC
Thanks, I think I understand your point, and I agree that Scheme's present
specification equates the character set which may compose programs with the
characters which may symbolically compose strings, and provides a facility
to specify, query, and display the numerical 8-bit value equivalent of any
arbitrary character, without bias or assumption of any particular encoding,
or of its membership in Scheme's character set; although only a subset of
the characters within the specified character set may compose identifier
names.

However, I arrive at a very different conclusion about how to maintain the
spirit of Scheme's presently specified character set while enabling Scheme
to process text in extended character sets. As I interpret it, Scheme has
already implied its intent, and its solution, by specifying an
encoding-neutral portable character set of only ~96 of 256 elements:
although a character may have any of 256 (8-bit) values, only 96 of them may
be used to symbolically compose program and string text. When it is desired
to specify a character value which does not correspond to a member of the
specified character set, it may be specified and displayed numerically
(where its representation is constrained to standard Scheme character-set
members). This has enabled the development and distribution of Scheme
programs that can be encoded unambiguously and portably in the wide variety
of character-set specifications required by various platforms.
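
A minimal sketch of the idea in the paragraph above, written in Python only
so that it runs as-is. It is an illustration under stated assumptions, not
anything the Scheme report specifies: printable ASCII (95 characters) stands
in for the "~96" portable characters, and the names PORTABLE and render are
hypothetical. The only Scheme procedure it mentions, integer->char, is the
standard one.

```python
# Assumption: printable ASCII (space through tilde) approximates the portable
# character set; the Scheme report does not define this exact set.
PORTABLE = {chr(v) for v in range(0x20, 0x7F)}


def render(data: bytes) -> str:
    """Spell out each 8-bit value using portable characters only.

    Portable values become Scheme-style character literals; every other
    value is written numerically as an (integer->char N) expression.
    """
    parts = []
    for value in data:
        ch = chr(value)
        if ch == " ":
            parts.append("#\\space")  # named literal for the space character
        elif ch in PORTABLE:
            parts.append("#\\" + ch)
        else:
            parts.append("(integer->char %d)" % value)
    return " ".join(parts)


if __name__ == "__main__":
    # 0xE9 lies outside the assumed portable set, so it is written numerically.
    print(render(b"caf\xe9"))  # prints: #\c #\a #\f (integer->char 233)
```

Whatever byte values go in, the rendered text itself contains only portable
characters, which is the property the argument relies on.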

Therefore, by analogy, I see no reason to fundamentally change anything with
respect to Scheme's portable character-set specification, as implementations
are already free to encode the Scheme character set as they see fit within
Scheme's (presently implied) 8-bit character values, and to explicitly
specify, query, and display any character value, or sequence of values, as
an encoded numerical equivalent using any encoding format desired, while
still restricting their expression within Scheme code to be composed of
portable Scheme characters (otherwise you haven't got portable code).

If there is a fundamental desire to extend the abstraction of Scheme's
character, string, and port types beyond their present, strongly implied
binary, byte-oriented basis, then I see no alternative but to co-specify an
alternative baseline binary port interface and data types, as Scheme
requires encoding-agnostic data and I/O facilities on top of which more
abstract and encoding-specific data types may be supported.

WRT: learning English to read/write Scheme code; although it is an
interesting implied topic, I suspect the challenge lies less with the
adoption of a "universal character set", and more with specifying the
linguistic equivalence and automatic translation of arbitrarily specified
symbolic names, and of prose as may exist in comments, from/to arbitrary
languages, and likely correspondingly arbitrary native character sets, as
local platforms may still require.

Thanks again for your time and thoughts, -paul-

> From: Robby Findler <xxxxxx@cs.uchicago.edu>
>
> All I'm saying is that whatever restrictions you make on the
> "composition of Scheme code" you are also making on the values in the
> language, since Scheme program text and literals (aka string and symbol
> values (and some others)) are all the same things. By definition.
>
> It's true, you could come up with some language that allowed values
> representing unicode strings but didn't allow program text in unicode.
> It just wouldn't be faithful to Scheme. In other words, if the
> technical problems you suggest make it impossible to achieve this, to
> me that means we've failed.
>
> I have no comment on whether or not everyone should learn English to be
> able to read Scheme code -- that seems outside the scope of this forum :).
>
> Robby
>