> {[, ], {, |, }}
> & (Pattern_Syntax - (ASCII|Sc|Sm|So))
> & Pattern_Whitespace

I like that this choice of delimiter characters provides another argument against the over-broad identifier syntax in the current draft. I'm not sure that definition is right for Scheme though. It's especially not clear to me that *all* of those symbols (Sc, Sm, So) should be allowed as identifier constituents -- some of them might be better as delimiters (if they are permitted in source texts at all).

(All of this is another argument that identifier syntax liberalization is premature. Since it isn't needed to get the ball rolling in terms of portable Unicode-happy programs, skip it for now.)

-t

Since Unicode provides 97,655 characters to play with (as of Unicode 4.1), it may be time to add some characters to the current list of five reserved syntax characters ([, ], {, }, |). That would bar the use of these characters in identifiers, and allow them to be used by any Scheme system that has redefinable read syntax for whatever purpose.

Unicode defines two non-normative classes for the purpose, Pattern_Syntax and Pattern_Whitespace. The intention is that neither may be used in identifiers. (The Unicode classes relating to identifiers are too restrictive for Scheme, and are intended for languages in which identifiers can't contain symbol characters.)

I propose that the following set of characters be disallowed in identifiers:

    {[, ], {, |, }}
    & (Pattern_Syntax - (ASCII|Sc|Sm|So))
    & Pattern_Whitespace

Excluding ASCII characters from Pattern_Syntax permits all of our existing ASCII identifier characters, regardless of their Unicode status. The various S codes represent various mathematical and non-mathematical operator and symbol characters that might plausibly see use in identifiers. Pattern_Whitespace is a small set of control/whitespace characters: TAB, LF, VT, FF, CR, SP, plus the new NEL, LRM, RLM, LS, and PS.

Here's the full list of 186 reserved syntax characters that I'm proposing. They should be more than enough for even the most grandiose read-syntax extensions.

005B LEFT SQUARE BRACKET
005D RIGHT SQUARE BRACKET
007B LEFT CURLY BRACKET
007C VERTICAL LINE
007D RIGHT CURLY BRACKET
00A1 INVERTED EXCLAMATION MARK
00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
00BF INVERTED QUESTION MARK
2010 HYPHEN
2011 NON-BREAKING HYPHEN
2012 FIGURE DASH
2013 EN DASH
2014 EM DASH
2015 HORIZONTAL BAR
2016 DOUBLE VERTICAL LINE
2017 DOUBLE LOW LINE
2018 LEFT SINGLE QUOTATION MARK
2019 RIGHT SINGLE QUOTATION MARK
201A SINGLE LOW-9 QUOTATION MARK
201B SINGLE HIGH-REVERSED-9 QUOTATION MARK
201C LEFT DOUBLE QUOTATION MARK
201D RIGHT DOUBLE QUOTATION MARK
201E DOUBLE LOW-9 QUOTATION MARK
201F DOUBLE HIGH-REVERSED-9 QUOTATION MARK
2020 DAGGER
2021 DOUBLE DAGGER
2022 BULLET
2023 TRIANGULAR BULLET
2024 ONE DOT LEADER
2025 TWO DOT LEADER
2026 HORIZONTAL ELLIPSIS
2027 HYPHENATION POINT
2030 PER MILLE SIGN
2031 PER TEN THOUSAND SIGN
2032 PRIME
2033 DOUBLE PRIME
2034 TRIPLE PRIME
2035 REVERSED PRIME
2036 REVERSED DOUBLE PRIME
2037 REVERSED TRIPLE PRIME
2038 CARET
2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK
203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
203B REFERENCE MARK
203C DOUBLE EXCLAMATION MARK
203D INTERROBANG
203E OVERLINE
2041 CARET INSERTION POINT
2042 ASTERISM
2043 HYPHEN BULLET
2045 LEFT SQUARE BRACKET WITH QUILL
2046 RIGHT SQUARE BRACKET WITH QUILL
2047 DOUBLE QUESTION MARK
2048 QUESTION EXCLAMATION MARK
2049 EXCLAMATION QUESTION MARK
204A TIRONIAN SIGN ET
204B REVERSED PILCROW SIGN
204C BLACK LEFTWARDS BULLET
204D BLACK RIGHTWARDS BULLET
204E LOW ASTERISK
204F REVERSED SEMICOLON
2050 CLOSE UP
2051 TWO ASTERISKS ALIGNED VERTICALLY
2053 SWUNG DASH
2055 FLOWER PUNCTUATION MARK
2056 THREE DOT PUNCTUATION
2057 QUADRUPLE PRIME
2058 FOUR DOT PUNCTUATION
2059 FIVE DOT PUNCTUATION
205A TWO DOT PUNCTUATION
205B FOUR DOT MARK
205C DOTTED CROSS
205D TRICOLON
205E VERTICAL FOUR DOTS
2329 LEFT-POINTING ANGLE BRACKET
232A RIGHT-POINTING ANGLE BRACKET
23B4 TOP SQUARE BRACKET
23B5 BOTTOM SQUARE BRACKET
23B6 BOTTOM SQUARE BRACKET OVER TOP SQUARE BRACKET
2768 MEDIUM LEFT PARENTHESIS ORNAMENT
2769 MEDIUM RIGHT PARENTHESIS ORNAMENT
276A MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT
276B MEDIUM FLATTENED RIGHT PARENTHESIS ORNAMENT
276C MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT
276D MEDIUM RIGHT-POINTING ANGLE BRACKET ORNAMENT
276E HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT
276F HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT
2770 HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT
2771 HEAVY RIGHT-POINTING ANGLE BRACKET ORNAMENT
2772 LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT
2773 LIGHT RIGHT TORTOISE SHELL BRACKET ORNAMENT
2774 MEDIUM LEFT CURLY BRACKET ORNAMENT
2775 MEDIUM RIGHT CURLY BRACKET ORNAMENT
27C5 LEFT S-SHAPED BAG DELIMITER
27C6 RIGHT S-SHAPED BAG DELIMITER
27E6 MATHEMATICAL LEFT WHITE SQUARE BRACKET
27E7 MATHEMATICAL RIGHT WHITE SQUARE BRACKET
27E8 MATHEMATICAL LEFT ANGLE BRACKET
27E9 MATHEMATICAL RIGHT ANGLE BRACKET
27EA MATHEMATICAL LEFT DOUBLE ANGLE BRACKET
27EB MATHEMATICAL RIGHT DOUBLE ANGLE BRACKET
2983 LEFT WHITE CURLY BRACKET
2984 RIGHT WHITE CURLY BRACKET
2985 LEFT WHITE PARENTHESIS
2986 RIGHT WHITE PARENTHESIS
2987 Z NOTATION LEFT IMAGE BRACKET
2988 Z NOTATION RIGHT IMAGE BRACKET
2989 Z NOTATION LEFT BINDING BRACKET
298A Z NOTATION RIGHT BINDING BRACKET
298B LEFT SQUARE BRACKET WITH UNDERBAR
298C RIGHT SQUARE BRACKET WITH UNDERBAR
298D LEFT SQUARE BRACKET WITH TICK IN TOP CORNER
298E RIGHT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
298F LEFT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
2990 RIGHT SQUARE BRACKET WITH TICK IN TOP CORNER
2991 LEFT ANGLE BRACKET WITH DOT
2992 RIGHT ANGLE BRACKET WITH DOT
2993 LEFT ARC LESS-THAN BRACKET
2994 RIGHT ARC GREATER-THAN BRACKET
2995 DOUBLE LEFT ARC GREATER-THAN BRACKET
2996 DOUBLE RIGHT ARC LESS-THAN BRACKET
2997 LEFT BLACK TORTOISE SHELL BRACKET
2998 RIGHT BLACK TORTOISE SHELL BRACKET
29D8 LEFT WIGGLY FENCE
29D9 RIGHT WIGGLY FENCE
29DA LEFT DOUBLE WIGGLY FENCE
29DB RIGHT DOUBLE WIGGLY FENCE
29FC LEFT-POINTING CURVED ANGLE BRACKET
29FD RIGHT-POINTING CURVED ANGLE BRACKET
2E00 RIGHT ANGLE SUBSTITUTION MARKER
2E01 RIGHT ANGLE DOTTED SUBSTITUTION MARKER
2E02 LEFT SUBSTITUTION BRACKET
2E03 RIGHT SUBSTITUTION BRACKET
2E04 LEFT DOTTED SUBSTITUTION BRACKET
2E05 RIGHT DOTTED SUBSTITUTION BRACKET
2E06 RAISED INTERPOLATION MARKER
2E07 RAISED DOTTED INTERPOLATION MARKER
2E08 DOTTED TRANSPOSITION MARKER
2E09 LEFT TRANSPOSITION BRACKET
2E0A RIGHT TRANSPOSITION BRACKET
2E0B RAISED SQUARE
2E0C LEFT RAISED OMISSION BRACKET
2E0D RIGHT RAISED OMISSION BRACKET
2E0E EDITORIAL CORONIS
2E0F PARAGRAPHOS
2E10 FORKED PARAGRAPHOS
2E11 REVERSED FORKED PARAGRAPHOS
2E12 HYPODIASTOLE
2E13 DOTTED OBELOS
2E14 DOWNWARDS ANCORA
2E15 UPWARDS ANCORA
2E16 DOTTED RIGHT-POINTING ANGLE
2E17 DOUBLE OBLIQUE HYPHEN
2E1C LEFT LOW PARAPHRASE BRACKET
2E1D RIGHT LOW PARAPHRASE BRACKET
3001 IDEOGRAPHIC COMMA
3002 IDEOGRAPHIC FULL STOP
3003 DITTO MARK
3008 LEFT ANGLE BRACKET
3009 RIGHT ANGLE BRACKET
300A LEFT DOUBLE ANGLE BRACKET
300B RIGHT DOUBLE ANGLE BRACKET
300C LEFT CORNER BRACKET
300D RIGHT CORNER BRACKET
300E LEFT WHITE CORNER BRACKET
300F RIGHT WHITE CORNER BRACKET
3010 LEFT BLACK LENTICULAR BRACKET
3011 RIGHT BLACK LENTICULAR BRACKET
3014 LEFT TORTOISE SHELL BRACKET
3015 RIGHT TORTOISE SHELL BRACKET
3016 LEFT WHITE LENTICULAR BRACKET
3017 RIGHT WHITE LENTICULAR BRACKET
3018 LEFT WHITE TORTOISE SHELL BRACKET
3019 RIGHT WHITE TORTOISE SHELL BRACKET
301A LEFT WHITE SQUARE BRACKET
301B RIGHT WHITE SQUARE BRACKET
301C WAVE DASH
301D REVERSED DOUBLE PRIME QUOTATION MARK
301E DOUBLE PRIME QUOTATION MARK
301F LOW DOUBLE PRIME QUOTATION MARK
3030 WAVY DASH
FD3E ORNATE LEFT PARENTHESIS
FD3F ORNATE RIGHT PARENTHESIS
FE45 SESAME DOT
FE46 WHITE SESAME DOT

-- 
John Cowan  xxxxxx@reutershealth.com
May the hair on your toes never fall out!
        --Thorin Oakenshield (to Bilbo)
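
For anyone who wants to experiment with the proposed rule, here is a rough Python sketch of the membership test. It reads the `&` in the set expression as "together with" (set union), which is an interpretation rather than something the message states outright; a literal intersection would be empty. The names `is_reserved`, `ASCII_RESERVED`, `PATTERN_WHITESPACE`, and `PATTERN_SYNTAX_SAMPLE` are hypothetical. As far as I know, Python's `unicodedata` module does not expose Pattern_Syntax, so the sample below is only a hand-picked handful of code points standing in for the real property. The general categories also come from whatever Unicode version your Python ships with, not from Unicode 4.1, so results may differ from the list above for a few characters.

```python
import unicodedata

# The five ASCII characters that are already reserved.
ASCII_RESERVED = set("[]{|}")

# Pattern_Whitespace as enumerated in the message:
# TAB, LF, VT, FF, CR, SP, NEL, LRM, RLM, LS, PS.
PATTERN_WHITESPACE = set("\t\n\v\f\r \u0085\u200e\u200f\u2028\u2029")

# ASSUMPTION: a tiny illustrative stand-in for Pattern_Syntax, NOT the
# real property. A serious version would load the full set from the
# Unicode data files.
PATTERN_SYNTAX_SAMPLE = set("+[\u00a1\u00a2\u00a9\u00ab\u00ac\u2010\u2190")

# General categories the proposal keeps available for identifiers.
EXCLUDED_CATEGORIES = {"Sc", "Sm", "So"}


def is_reserved(ch, pattern_syntax=PATTERN_SYNTAX_SAMPLE):
    """Return True if ch would be barred from identifiers under the
    proposal, with '&' read as set union (an assumption)."""
    if ch in ASCII_RESERVED or ch in PATTERN_WHITESPACE:
        return True
    if ch not in pattern_syntax:
        return False
    if ord(ch) < 0x80:
        # ASCII is subtracted from Pattern_Syntax, so '+' and friends
        # stay legal in identifiers.
        return False
    # Currency, math, and other symbols are subtracted as well.
    return unicodedata.category(ch) not in EXCLUDED_CATEGORIES


if __name__ == "__main__":
    for ch in "[+\u00ab\u00ac\u2028":
        print(f"U+{ord(ch):04X}", is_reserved(ch))
```

The test is written as a predicate over a supplied set rather than as a precomputed table, so the same function can be rerun against a full Pattern_Syntax set, or against a variant that moves some of the Sc/Sm/So characters back into the delimiter camp.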