bear wrote:
> There are some appropriate restrictions [on codepoints in
> identifiers], I think; identifiers should not begin with:
>
> * a combining character
> * a non-character codepoint
> * a whitespace character
> * a control character
> * characters which can begin syntactically valid numbers
> (digits, sign, point)
> * a delimiter (parens, at least)
Agreed. (The fifth point, symbol/number ambiguity, isn't too hard to
deal with, and allowing ids like "1+" is a popular extension.)
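To make the list above concrete, here is a rough sketch in Python (not Scheme, only because its unicodedata module makes the category tests short). The names DELIMITERS, may_start_identifier, and classify are made up for illustration, and the delimiter set and number syntax are simplifying assumptions, not anything R5RS specifies. The strict flag models the quoted rule; turning it off models the "1+" extension, where a token counts as a number only if the whole thing parses as one.

```python
import re
import unicodedata

# Assumption: a minimal delimiter set; a real reader would have more.
DELIMITERS = set('()";')

# Assumption: a deliberately simplified decimal syntax (no exponents, radixes, etc.).
NUMBER_RE = re.compile(r'[+-]?(\d+(\.\d*)?|\.\d+)')

def is_noncharacter(ch):
    """True for the Unicode noncharacters (U+FDD0..U+FDEF and U+xxFFFE/U+xxFFFF)."""
    cp = ord(ch)
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def may_start_identifier(ch, strict=True):
    """Apply the 'should not begin with' rules to a single character."""
    cat = unicodedata.category(ch)
    if cat in ("Mn", "Mc", "Me"):      # combining characters
        return False
    if is_noncharacter(ch):            # non-character codepoints
        return False
    if ch.isspace() or cat == "Cc":    # whitespace and control characters
        return False
    if ch in DELIMITERS:               # delimiters
        return False
    if strict and (cat == "Nd" or ch in "+-."):
        return False                   # could begin a number
    return True

def classify(token):
    """Resolve the symbol/number ambiguity: numbers win only on a full match."""
    if NUMBER_RE.fullmatch(token):
        return "number"
    if token and may_start_identifier(token[0], strict=False):
        return "identifier"
    return "invalid"
```

With this, classify("1+") gives "identifier" while classify("42") and classify("-1.5") give "number", which is all the extension really needs.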
> Identifiers should not contain:
> * whitespace
> * delimiters
> * non-character codepoints
> * control characters
> * invalid sequences
Agreed.
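A similar sketch for the "should not contain" list, again in Python and again with made-up names (DELIMITERS, identifier_body_ok). It assumes the source text arrives as UTF-8 bytes, so "invalid sequences" is read here as "bytes that do not decode as UTF-8"; other encodings would need their own check.

```python
import unicodedata

# Assumption: a minimal delimiter set; a real reader would have more.
DELIMITERS = set('()";')

def identifier_body_ok(raw: bytes) -> bool:
    """Check the 'should not contain' rules on a UTF-8 encoded token."""
    try:
        text = raw.decode("utf-8")     # strict decoding rejects invalid sequences
    except UnicodeDecodeError:
        return False
    if not text:                       # an empty token is not an identifier
        return False
    for ch in text:
        cp = ord(ch)
        if ch.isspace() or ch in DELIMITERS:
            return False               # whitespace or delimiter
        if unicodedata.category(ch) == "Cc":
            return False               # control character
        if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
            return False               # non-character codepoint
    return True
```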
> The minimum requirement for case insensitivity as defined by
> R5RS gives another rule:
>
> * no character in an identifier ought to be automatically
> converted to the implementation's preferred case (and no
> identifier differing only by that character versus another
> ought to be considered the same identifier) unless it is
> part of a one-to-one reciprocal pair of upper and lower case
> characters as identified by char-upcase, char-downcase, and
> char-ci=?. This finally is the property that is required
> for the char-alphabetic? characters in the portable character
> set: R5RS does not say so specifically but it is not possible
> to comply with R5RS without meeting this requirement.
Hm. Makes sense.
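For what it's worth, the "one-to-one reciprocal pair" test is easy to state in code. The sketch below uses Python's str.upper and str.lower as stand-ins for char-upcase and char-downcase (an assumption; a Scheme implementation's tables may differ), and the function names are hypothetical. A character is folded only if its upper and lower forms are single characters that map back to each other and the character is itself one of the two.

```python
def has_reciprocal_case(ch):
    """True if ch belongs to a one-to-one upper/lower pair that round-trips."""
    lo, up = ch.lower(), ch.upper()
    return (len(lo) == 1 and len(up) == 1   # no one-to-many mappings
            and lo != up                    # the character actually has case
            and ch in (lo, up)              # excludes titlecase forms
            and lo.upper() == up            # lower maps back to upper
            and up.lower() == lo)           # upper maps back to lower

def fold_identifier(name):
    """Lower-case only the characters that pass the reciprocal-pair test."""
    return "".join(ch.lower() if has_reciprocal_case(ch) else ch for ch in name)
```

Under this test ordinary letters fold as expected, while characters such as sharp s (whose upper-case form is two characters) and dotless i (whose upper-case form lower-cases to a different letter) are left alone and stay distinct.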
> Note that R5RS permits 'rules lawyering' with respect to this requirement:
> an implementation of R5RS is fairly easy if no characters other than
> a ... z and A ... Z are case-folded in case insensitive identifiers
> and char-alphabetic? returns #t for only those characters.
Heh. I don't think that would be desirable.
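To show what that minimal reading amounts to, here is a small Python sketch of ASCII-only folding. The names ascii_fold and ascii_alphabetic are hypothetical; ascii_alphabetic stands in for a char-alphabetic? that answers #t only for a..z and A..Z.

```python
import string

# Translation table that lower-cases A..Z and touches nothing else.
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

def ascii_fold(name):
    """Fold case for A..Z only; every other character passes through unchanged."""
    return name.translate(_ASCII_FOLD)

def ascii_alphabetic(ch):
    """Stand-in for a char-alphabetic? restricted to the ASCII letters."""
    return len(ch) == 1 and ch in string.ascii_letters
```

The cost is that identifiers differing only in the case of a non-ASCII letter remain different identifiers, and letters such as e-acute are not considered alphabetic at all.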
--
Bradd W. Szonye
http://www.szonye.com/bradd