ASCII character classification
Lassi Kortela 22 Nov 2019 13:33 UTC
I propose the following as final:
(ascii-control? char) -- #x00..#x1f and #x7f, no space
(ascii-graphic? char) -- all non-control ASCII chars, including space
but no other whitespace characters.
(ascii-printing? char) -- no such procedure in the SRFI
(ascii-display? char) -- no such procedure in the SRFI
(ascii-space-or-tab? char) -- does what it says on the tin. Alternative
names: `ascii-blank?` (too ambiguous) and `ascii-horizontal-whitespace?`
(clear, but too long).
(ascii-punctuation? char) -- all punctuation and "symbol" characters, as
the distinction between those is completely arbitrary to a layperson.
(ascii-whitespace? char) -- #x09 (tab), #x0a (line feed), #x0b
(vertical tab), #x0c (form feed), #x0d (carriage return), #x20 (space)
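To make the ranges concrete, here is a minimal sketch of the predicates
listed above, not a reference implementation. It assumes R7RS (scheme
base) and that arguments are always characters. The exact ranges used for
`ascii-punctuation?` are my assumption (every graphic ASCII character
that is not a letter, digit or space), since the list above does not
spell them out.

```scheme
;; Sketch only: arguments are assumed to be characters.

(define (ascii-control? char)
  ;; #x00..#x1f and #x7f (DEL); space is not a control character.
  (let ((cc (char->integer char)))
    (or (<= #x00 cc #x1f) (= cc #x7f))))

(define (ascii-graphic? char)
  ;; Every non-control ASCII character, including space (#x20).
  (<= #x20 (char->integer char) #x7e))

(define (ascii-space-or-tab? char)
  ;; Horizontal whitespace only: tab (#x09) and space (#x20).
  (let ((cc (char->integer char)))
    (or (= cc #x09) (= cc #x20))))

(define (ascii-whitespace? char)
  ;; Tab, line feed, vertical tab, form feed, carriage return, space.
  (let ((cc (char->integer char)))
    (or (<= #x09 cc #x0d) (= cc #x20))))

(define (ascii-punctuation? char)
  ;; Assumed ranges: ! .. /   : .. @   [ .. `   { .. ~
  (let ((cc (char->integer char)))
    (or (<= #x21 cc #x2f)
        (<= #x3a cc #x40)
        (<= #x5b cc #x60)
        (<= #x7b cc #x7e))))
```

Under these definitions space satisfies both `ascii-graphic?` and
`ascii-whitespace?`, while tab satisfies only the latter.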
Also:
(ascii-control->graphic char)
Convert #x00..#x1f and #x7f to @ A B C ... X Y Z [ \ ] ^ _ and ?
(ascii-graphic->control char)
Convert in the other direction.
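As a rough illustration of the two conversions, the mapping amounts to
adding or subtracting #x40, with DEL (#x7f) paired with `?` as a special
case. This is a sketch assuming R7RS (scheme base) and character
arguments. Returning #f for characters outside the mapped ranges is my
assumption and is not something settled above.

```scheme
;; Sketch only: returns #f for input outside the mapped range (an
;; assumption made for illustration).

(define (ascii-control->graphic char)
  ;; #x00..#x1f -> @ A B ... Z [ \ ] ^ _   and   #x7f -> ?
  (let ((cc (char->integer char)))
    (cond ((<= #x00 cc #x1f) (integer->char (+ cc #x40)))
          ((= cc #x7f) #\?)
          (else #f))))

(define (ascii-graphic->control char)
  ;; @ A B ... Z [ \ ] ^ _ -> #x00..#x1f   and   ? -> #x7f
  (let ((cc (char->integer char)))
    (cond ((<= #x40 cc #x5f) (integer->char (- cc #x40)))
          ((= cc #x3f) (integer->char #x7f))
          (else #f))))

;; (ascii-control->graphic (integer->char #x03))  ; => #\C
;; (char->integer (ascii-graphic->control #\[))   ; => 27 (escape)
```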
The above would match Common Lisp, the Unicode standard and, as best I
can tell, the spirit of the ASCII standard.
It would diverge from SRFI 14 (where "graphic" excludes space,
"printing" is graphic + whitespace, the bogus distinction between
"punctuation" and "symbol" characters is honored, and the ambiguous term
"blank" is used for horizontal whitespace).
It would also diverge from the Unix/C standard library (which makes a
distinction between "graphic" (no whitespace) and "printing" (graphic +
space)).
I would prefer the proposal in this mail because it aligns with the
character set standards, and with the Common Lisp standard and
implementations, which IMHO are simpler and more principled in this
matter than SRFI 14 (which may or may not be replaced for R7RS-large)
and Unix/C.
OK?