Re: Parsing Scheme [was Re: strings draft] tb@xxxxxx 24 Jan 2004 19:34 UTC

Tom Lord <xxxxxx@emf.net> writes:

> We should also point readers in general to:
>
>   http://www.unicode.org/reports/tr15/#Programming_Language_Identifiers
>
> which is Annex 7 ("Programming Language Identifiers") of Unicode
> Technical Report 15 ("Unicode Normalization Forms").

Yes.  I think the Unicode suggestions for programming language
identifiers are good ones, and we should both point to them and
strongly suggest their use.  I'm not quite prepared to say that we
should standardize Scheme to require it (even on Unicode-based systems).

> * (identifier? s) => <bool>

This is fine.  An implementation should be allowed to always return #t
from this function, even though not every such string could be parsed
as an identifier by the reader.  (This is for the sake of eval, at least.)

>      The definition of FOLD-IDENTIFIER must be consistent with the
>      recommendations of Annex 7 ("Programming Language Identifiers") of
>      Unicode Technical Report 15 for identifier names comprised
>      entirely of Unicode characters.

Again, I would suggest that we merely advocate this, but not require it.

>      For this purpose, the characters
>      of the portable Scheme character set are considered to be Unicode
>      characters.  (A short summary of the implications of this
>      requirement for portable identifiers is that given a portable
>      identifier, FOLD-IDENTIFIER must map #\A..#\Z to #\a..#\z.)

On the other hand, we should certainly specify exactly the behavior of
the function for the required character set, agreed.

>      (FOLD-IDENTIFIER is preferable to STRING-ID=? because it
>      produces a canonical form of each identifier explicitly
>      rather than implicitly.   The canonical form is useful because
>      it can be hashed, stored in a trie, etc.   It would be
>      impractical to implement, for example, a symbol table in a
>      compiler given only STRING-ID=?.)

I think my worry is that it is not obvious that an implementation even
has an implicit folding available, at least, not cheaply.  There
should perhaps be a hash function to go with string-id=? to help.

Many implementations will of course implement these things by
folding.  But if you really think that string-id=? should be allowed
to implement arbitrary equivalence classes (provided that the standard
character set works right), it isn't obvious to me that
fold-identifier can be cheap; it might well be more expensive than
whatever straightforward test is used.
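
To make the trade-off concrete, here is a minimal sketch in Python
rather than Scheme.  It assumes, purely for illustration, that the
folding is NFKC normalization plus Unicode case folding; the draft
does not fix that choice.  The names fold_identifier, string_id_eq,
string_id_hash and intern are hypothetical stand-ins.  In this
sketch the equality test and the hash are both derived from the
fold, which is exactly the situation where a fold is cheap.  An
implementation whose equivalence classes were not defined by a fold
would have to supply the hash some other way.

```python
import unicodedata

def fold_identifier(name: str) -> str:
    """Return a canonical spelling of NAME (assumed folding, see lead-in)."""
    # Case folding can produce unnormalized text, so normalize again after it.
    folded = unicodedata.normalize("NFKC", name).casefold()
    return unicodedata.normalize("NFKC", folded)

def string_id_eq(a: str, b: str) -> bool:
    """Equivalence test defined in terms of the fold."""
    return fold_identifier(a) == fold_identifier(b)

def string_id_hash(name: str) -> int:
    """Hash that agrees with string_id_eq: equivalent names hash alike."""
    return hash(fold_identifier(name))

# A symbol table keyed by the canonical form.
symbol_table: dict[str, str] = {}

def intern(name: str) -> str:
    """Return the one stored spelling for every name equivalent to NAME."""
    key = fold_identifier(name)
    return symbol_table.setdefault(key, key)
```

With only string_id_eq and no hash or canonical form, the table
above would degrade to a linear search over the stored names.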

> * (concatenate-identifiers s0 s1 ...) => id
>
>      Return a string ID, containing an identifier name which
>      is the concatenation of the arguments which must themselves
>      be identifier names.

>      (As nearly as I can tell, CONCATENATE-IDENTIFIERS is needed
>      because IDENTIFIER? won't be closed under STRING-APPEND -- but
>      I could be mistaken about that.  More research is needed.)

In the cases where identifier? isn't closed under string-append,
concatenate-identifiers might need to do more work than just
concatenate.  (What does "the concatenation of the arguments" mean, if
not string-append?)
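
One way closure could fail, sketched in Python under a loud
assumption: suppose identifier names were required to be in
Normalization Form C.  Nothing in the draft says so; this is only
one possible reason.  Two strings that are each in NFC can
concatenate to a string that is not, so a hypothetical
concatenate_identifiers would have to renormalize after appending.
The helper names below are made up for the example.

```python
import unicodedata

def is_nfc(s: str) -> bool:
    """True if S is already in Normalization Form C."""
    return unicodedata.normalize("NFC", s) == s

def concatenate_identifiers(*parts: str) -> str:
    """Hypothetical: append the parts, then restore NFC."""
    return unicodedata.normalize("NFC", "".join(parts))

# Two conjoining Hangul jamo; each is a letter and is in NFC on its own.
LEAD = "\u1100"   # HANGUL CHOSEONG KIYEOK
VOWEL = "\u1161"  # HANGUL JUNGSEONG A

plain = LEAD + VOWEL                          # plain string-append
joined = concatenate_identifiers(LEAD, VOWEL)

print(is_nfc(LEAD), is_nfc(VOWEL))  # both parts are normalized
print(is_nfc(plain))                # the plain concatenation is not
print(len(plain), len(joined))      # renormalizing composes one syllable
```

Here the result of concatenate_identifiers is the single
precomposed syllable U+AC00, which differs from the code point
sequence that string-append would produce.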

> * (char-id-start? c) => <bool>
>   Return #t if C is a valid first character in an identifier.
>
> * (char-id-extend? c) => <bool>
>   Return #t if C is a valid non-first character in an identifier.

These may be contextual.  A character may be allowed at the beginning
of an identifier, but only if something else is true later on.
(Consider the "if it's not a number, it's an identifier" rule of the
current standard.)
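
A small Python sketch of why a per-character test is awkward under
that rule.  The number syntax here is a deliberately simplified
decimal subset (an assumption for the example, not the full Scheme
numeric grammar), and is_number and is_identifier are hypothetical
helpers that expect an already delimited token.

```python
import re

# Simplified decimal numbers only: optional sign, digits with optional
# fraction (or a leading-dot fraction), optional exponent.
NUMBER = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")

def is_number(token: str) -> bool:
    """True if the whole token matches the simplified number syntax."""
    return NUMBER.fullmatch(token) is not None

def is_identifier(token: str) -> bool:
    """'If it's not a number, it's an identifier', excluding '' and '.'."""
    return token not in ("", ".") and not is_number(token)

for token in ["+", "+5", "1+", "1e3", "1e", "...", ".5"]:
    kind = "identifier" if is_identifier(token) else "number"
    print(f"{token!r}: {kind}")
```

Under this rule "+" and "1+" are identifiers while "+5" and "1e3"
are numbers, so whether #\+ or #\1 may start an identifier depends
on the rest of the token, not on the character alone.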

Perhaps a system might want to have functions like this, but I'd like
to see more experience before standardizing something.

> What about case independent character ordering (e.g., CHAR-CI<? and
> STRING-CI<?)?  I see no compelling reason to eliminate them at this
> stage -- they're still useful.  I think they should be specified to be
> consistent with the single-character default case foldings of Unicode,
> where the portable character set is considered to consist of Unicode
> characters.  This will allow portable Scheme programs to use these
> procedures to write programs which accurately manipulate Scheme
> programs that use nothing but the portable character set.

string-ci<? is fine, but must have a locale argument.  If you want to
have a standardly specified "default case foldings of Unicode" locale,
that's fine with me.  Ditto for char-ci<?.

> What about case mappings (CHAR-UPCASE and CHAR-DOWNCASE)?  Again:
> retain them;  specify them as using the Unicode single character
> mappings; permit implementations to add parameters or new procedures
> -- the result allows portable Scheme programs to handle portable
> Scheme program texts and captures a useful Unicode text process.

No, no, no.  Don't make functions that are known to be wrong.  This is
a bad idea.  It's like requiring < to work for complex numbers, and
then comparing magnitude, and saying "well, that's close enough".
It's not.

You can case map strings, and this should certainly be allowed.  It
should also have a locale argument.

You cannot sensibly case-map characters except in the "unicode single
character mappings" locale; and why should we have special privileged
functions there?  It will only encourage people to *use* the
functions, and their code will then be non-portable precisely when it
matters.

At the very least, allow char-upcase to simply fail to
give any answer, and provide a locale argument.  Or allow char-upcase
to return a string.

> A final note: the desirability of the -CI, -UPCASE, and -DOWNCASE
> procedures hinges on the assumption that the portable Scheme character
> set is a proper subset of Unicode.

I'm assuming that (or at least, I want to make it possible), but I do
*not* think that char-upcase and char-downcase are good ideas.

string-upcase and string-downcase, by contrast, are unobjectionable,
provided they get a locale argument.

Thomas