Re: Identifiers
Bradd W. Szonye, 13 Feb 2004 06:39 UTC

bear wrote:
> There are some appropriate restrictions [on codepoints in
> identifiers], I think; identifiers should not begin with:
>
>  * a combining character
>  * a non-character codepoint
>  * a whitespace character
>  * a control character
>  * characters which can begin syntactically valid numbers
>       (digits, sign, point)
>  * a delimiter (parens, at least)

Agreed. (The 5th point, symbol/number ambiguity, isn't too hard to deal
with, and it's a popular extension to allow ids like "1+".)
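The "should not begin with" list above can be sketched as a per-character check. This is a minimal Python sketch, not any implementation's actual reader. It assumes that "combining character" means the Unicode categories Mn/Mc/Me, that "sign, point" means `+`, `-` and `.`, and that the delimiter set is the one R5RS gives in section 7.1.1 (whitespace plus `( ) " ;`). The name `may_begin_identifier` is hypothetical.

```python
import unicodedata

# R5RS (section 7.1.1) lists these delimiters besides whitespace.
DELIMITERS = frozenset('()";')

# Assumption: "sign, point" means '+', '-' and '.'; digits are checked
# below via the Unicode category Nd (decimal digit).
NUMBER_STARTERS = frozenset("+-.")


def is_noncharacter(ch: str) -> bool:
    """Unicode noncharacters: U+FDD0..U+FDEF and every U+xxFFFE / U+xxFFFF."""
    cp = ord(ch)
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def may_begin_identifier(ch: str) -> bool:
    """Hypothetical helper: apply the 'should not begin with' list to one char."""
    category = unicodedata.category(ch)
    if category in ("Mn", "Mc", "Me"):  # combining marks
        return False
    if is_noncharacter(ch):
        return False
    if ch.isspace() or category == "Cc":  # whitespace, control characters
        return False
    if category == "Nd" or ch in NUMBER_STARTERS:  # could start a number
        return False
    if ch in DELIMITERS:
        return False
    return True
```

Read this strictly and it also rejects the R5RS peculiar identifiers `+`, `-` and `...`, so a reader would have to special-case those (and extensions such as "1+") before applying the check.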

> Identifiers should not contain:
>   * whitespace
>   * delimiters
>   * non-character codepoints
>   * control characters
>   * invalid sequences

Agreed.
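The "should not contain" list can be checked the same way over a whole identifier. This sketch assumes that source text arrives as UTF-8 bytes, so an "invalid sequence" is anything a strict UTF-8 decoder rejects. It also reuses the R5RS delimiter set. `valid_identifier_body` is a hypothetical name.

```python
import unicodedata

# R5RS (section 7.1.1) delimiters other than whitespace.
DELIMITERS = frozenset('()";')


def is_noncharacter(cp: int) -> bool:
    """Unicode noncharacters: U+FDD0..U+FDEF and every U+xxFFFE / U+xxFFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def valid_identifier_body(raw: bytes) -> bool:
    """Hypothetical helper: check the 'should not contain' list.

    Assumption: identifiers are read from UTF-8 encoded source.
    """
    try:
        text = raw.decode("utf-8")  # strict mode: malformed UTF-8 raises
    except UnicodeDecodeError:
        return False  # "invalid sequences"
    if not text:
        return False
    for ch in text:
        if ch.isspace() or ch in DELIMITERS:
            return False
        if is_noncharacter(ord(ch)) or unicodedata.category(ch) == "Cc":
            return False
    return True
```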

> The minimum requirement for case insensitivity as defined by
> R5RS gives another rule:
>
>   * no character in an identifier ought to be automatically
>     converted to the implementation's preferred case (and no
>     identifier differing only by that character versus another
>     ought to be considered the same identifier) unless it is
>     part of a one-to-one reciprocal pair of upper and lower case
>     characters as identified by char-upcase, char-downcase, and
>     char-ci=?. This, finally, is the property that is required
>     for the char-alphabetic? characters in the portable character
>     set: R5RS does not say so specifically, but it is not possible
>     to comply with R5RS without meeting this requirement.

Hm. Makes sense.
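One way to test the "one-to-one reciprocal pair" condition is shown below. This is a Python sketch in which `str.upper` and `str.lower` stand in for char-upcase and char-downcase. That is an assumption: Python applies full Unicode case mappings, which are not the same as any given Scheme's. A character may be folded only if it maps to exactly one other character and that character maps straight back. Characters such as U+212A KELVIN SIGN, U+017F LONG S, dotless i and final sigma then stay unfolded. `may_case_fold` and `fold_identifier` are hypothetical names, and lower case is assumed to be the preferred case.

```python
def may_case_fold(ch: str) -> bool:
    """True if ch belongs to a one-to-one reciprocal upper/lower pair.

    Assumption: str.upper/str.lower stand in for char-upcase/char-downcase.
    """
    up, low = ch.upper(), ch.lower()
    if len(up) != 1 or len(low) != 1 or up == low:
        return False  # uncased, or maps to a multi-character string
    # ch must be one side of the pair, and each side must map to the other.
    return ch in (up, low) and up.lower() == low and low.upper() == up


def fold_identifier(name: str) -> str:
    """Fold to lower case only the characters that satisfy the rule above."""
    return "".join(ch.lower() if may_case_fold(ch) else ch for ch in name)
```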

> Note that R5RS permits some rules-lawyering with respect to this
> requirement: an implementation of R5RS is fairly easy to write if no
> characters other than a ... z and A ... Z are case-folded in
> case-insensitive identifiers and char-alphabetic? returns #t for only
> those characters.

Heh. I don't think that would be desirable.
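The minimal reading described in the quote can be sketched as follows. It uses ASCII-only folding and an ASCII-only char-alphabetic?. A..Z and a..z form one-to-one pairs, so the reciprocal-pair rule is met trivially. The catch is that identifiers in any other script never fold, so "Ωmega" and "ωmega" stay distinct. Both function names are hypothetical.

```python
def ascii_char_alphabetic(ch: str) -> bool:
    """Hypothetical minimal char-alphabetic?: true only for a..z and A..Z."""
    return len(ch) == 1 and ("a" <= ch <= "z" or "A" <= ch <= "Z")


def ascii_fold(name: str) -> str:
    """Fold only A..Z to a..z; every other code point passes through as-is."""
    return "".join(ch.lower() if "A" <= ch <= "Z" else ch for ch in name)
```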
--
Bradd W. Szonye
http://www.szonye.com/bradd