ASCII character classification Lassi Kortela 22 Nov 2019 13:33 UTC
I propose the following as final:

(ascii-control? char) -- #x00..#x1f and #x7f, no space

(ascii-graphic? char) -- all non-control ascii chars, including space. no other whitespace characters.

(ascii-printing? char) -- no such procedure in the SRFI

(ascii-display? char) -- no such procedure in the SRFI

(ascii-space-or-tab? char) -- does what it says on the tin. alternative names: `ascii-blank?` (too ambiguous) and `ascii-horizontal-whitespace?` (clear, but too long).

(ascii-punctuation? char) -- all punctuation and "symbol" characters, as the distinction between those is completely arbitrary to a layperson.

(ascii-whitespace? char) -- #x09 (tab), #x0a (line feed), #x0b (vertical tab), #x0c (form feed), #x0d (carriage return), #x20 (space)

Also:

(ascii-control->graphic char)

Convert #x00..#x1f and #x7f to @ A B C ... X Y Z [ \ ] ^ _ and ?

(ascii-graphic->control char)

Convert in the other direction.

The above would match Common Lisp, the Unicode standard and as best I can tell, the spirit of the ASCII standard.

It would diverge from SRFI 14 (where "graphic" excludes space, "printing" is graphic + whitespace, the bogus distinction between "punctuation" and "symbol" characters is honored, and the ambiguous term "blank" is used for horizontal whitespace).

It would also diverge from the Unix/C standard library (which makes a distinction between "graphic" (no whitespace) and "printing" (graphic + space)).

I would prefer the proposal in this mail because it aligns with the character set standards, and aligns with the Common Lisp standard + implementations which IMHO are simpler and more principled in this matter than SRFI 14 (which may or may not be replaced for R7RS-large) and Unix/C.

OK?
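To make the proposed ranges concrete, here is a rough sketch of how the procedures listed above could be written in portable R7RS Scheme. It is one possible reading of the proposal, not a reference implementation. The helper `ascii-codepoint` is hypothetical, and two behaviours are assumptions the proposal does not settle: non-ASCII characters yield #f, and the conversion procedures return #f for characters that have no counterpart. The punctuation ranges assume space is not counted as punctuation.

```scheme
;; Sketch only, under the assumptions stated above.

;; Hypothetical helper: the code point if CHAR is ASCII, else #f.
(define (ascii-codepoint char)
  (let ((n (char->integer char)))
    (and (< n #x80) n)))

;; #x00..#x1f and #x7f; space is not a control character.
(define (ascii-control? char)
  (let ((n (ascii-codepoint char)))
    (and n (or (< n #x20) (= n #x7f)))))

;; Every non-control ASCII character, including space.
(define (ascii-graphic? char)
  (let ((n (ascii-codepoint char)))
    (and n (<= #x20 n #x7e))))

;; Horizontal whitespace only.
(define (ascii-space-or-tab? char)
  (let ((n (ascii-codepoint char)))
    (and n (or (= n #x20) (= n #x09)))))

;; Tab, line feed, vertical tab, form feed, carriage return, space.
(define (ascii-whitespace? char)
  (let ((n (ascii-codepoint char)))
    (and n (or (<= #x09 n #x0d) (= n #x20)))))

;; Punctuation and "symbol" characters treated as one class.
(define (ascii-punctuation? char)
  (let ((n (ascii-codepoint char)))
    (and n
         (or (<= #x21 n #x2f)      ; ! through /
             (<= #x3a n #x40)      ; : through @
             (<= #x5b n #x60)      ; [ through `
             (<= #x7b n #x7e)))))  ; { through ~

;; Both directions amount to toggling the #x40 bit; they are written
;; with + and - here to stay within R7RS-small.
(define (ascii-control->graphic char)
  (let ((n (ascii-codepoint char)))
    (cond ((not n) #f)
          ((< n #x20) (integer->char (+ n #x40)))  ; #x00 -> @, #x01 -> A, ...
          ((= n #x7f) #\?)                         ; DEL -> ?
          (else #f))))                             ; assumption: no mapping

(define (ascii-graphic->control char)
  (let ((n (ascii-codepoint char)))
    (cond ((not n) #f)
          ((<= #x40 n #x5f) (integer->char (- n #x40)))  ; @ -> #x00, A -> #x01, ...
          ((= n #x3f) (integer->char #x7f))              ; ? -> DEL
          (else #f))))                                   ; assumption: no mapping
```

With these definitions, (ascii-graphic? #\space) is #t while (ascii-control? #\space) is #f, and (ascii-control->graphic (integer->char 1)) gives #\A, which is where the proposal differs from the SRFI 14 and Unix/C notions of "graphic".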