
ASCII character classification Lassi Kortela (22 Nov 2019 13:33 UTC)
Re: ASCII character classification John Cowan (22 Nov 2019 14:52 UTC)
Re: ASCII character classification Lassi Kortela (22 Nov 2019 19:24 UTC)
Re: ASCII character classification John Cowan (22 Nov 2019 19:33 UTC)
Re: ASCII character classification Lassi Kortela (28 Nov 2019 13:57 UTC)
Re: ASCII character classification John Cowan (28 Nov 2019 14:41 UTC)
Should ASCII procedures accept non-ASCII characters? Lassi Kortela (28 Nov 2019 15:00 UTC)
Re: Should ASCII procedures accept non-ASCII characters? Lassi Kortela (28 Nov 2019 15:08 UTC)
Re: ASCII character classification Lassi Kortela (29 Nov 2019 01:08 UTC)

ASCII character classification Lassi Kortela 22 Nov 2019 13:33 UTC

I propose the following as final:

(ascii-control? char) -- #x00..#x1f and #x7f. Space is not a control character.

(ascii-graphic? char) -- all non-control ASCII chars (#x20..#x7e), including
space but no other whitespace characters.

(ascii-printing? char) -- no such procedure in the SRFI

(ascii-display? char) -- no such procedure in the SRFI

(ascii-space-or-tab? char) -- does what it says on the tin: #x09 (tab) and
#x20 (space). Alternative names considered: `ascii-blank?` (too ambiguous)
and `ascii-horizontal-whitespace?` (clear, but too long).

(ascii-punctuation? char) -- all punctuation and "symbol" characters, as
the distinction between those is completely arbitrary to a layperson.

(ascii-whitespace? char) -- #x09 (tab), #x0a (line feed), #x0b
(vertical tab), #x0c (form feed), #x0d (carriage return), #x20 (space)
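A minimal Python sketch of the proposed predicates, written directly against code points. The function names (`is_ascii_control` and so on) are hypothetical stand-ins for the Scheme procedures, and the sketch assumes a non-ASCII character simply yields false rather than signalling an error (whether that is the right behavior is a separate question).

```python
def is_ascii_control(c: str) -> bool:
    """#x00..#x1f and #x7f; space is not a control character."""
    n = ord(c)
    return n <= 0x1F or n == 0x7F


def is_ascii_graphic(c: str) -> bool:
    """Every non-control ASCII character, including space: #x20..#x7e."""
    return 0x20 <= ord(c) <= 0x7E


def is_ascii_space_or_tab(c: str) -> bool:
    """Horizontal whitespace only: #x09 (tab) and #x20 (space)."""
    return ord(c) in (0x09, 0x20)


def _is_ascii_alphanumeric(c: str) -> bool:
    """Helper, not part of the proposal: 0-9, A-Z, a-z."""
    return "0" <= c <= "9" or "A" <= c <= "Z" or "a" <= c <= "z"


def is_ascii_punctuation(c: str) -> bool:
    """Punctuation and "symbol" characters alike: graphic, not space, not alphanumeric."""
    return is_ascii_graphic(c) and c != " " and not _is_ascii_alphanumeric(c)


def is_ascii_whitespace(c: str) -> bool:
    """Tab, line feed, vertical tab, form feed, carriage return, space."""
    return ord(c) in (0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20)
```

With these definitions, control and graphic partition the 128 ASCII code points exactly (33 + 95), which is the property that makes "graphic includes space" simple to state.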

Also:

(ascii-control->graphic char)

Convert #x00..#x1f to @ A B C ... X Y Z [ \ ] ^ _ respectively, and #x7f
to ? (the usual caret notation, as in ^@ for NUL and ^? for DEL).

(ascii-graphic->control char)

Convert in the other direction.
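Both conversions amount to flipping bit 6 (XOR with #x40), which maps #x00..#x1f onto @ through _ and #x7f onto ?. A sketch in Python under that reading; the function names are hypothetical, and returning `None` for characters outside the mapping (including lowercase letters, which this sketch does not fold to uppercase) is an assumption, since the proposal does not say what happens to them.

```python
from typing import Optional


def ascii_control_to_graphic(c: str) -> Optional[str]:
    """Map a control char to its caret-notation partner, else None."""
    n = ord(c)
    if n <= 0x1F or n == 0x7F:
        # XOR 0x40 sends 0x00..0x1f to 0x40..0x5f ('@'..'_') and 0x7f to 0x3f ('?').
        return chr(n ^ 0x40)
    return None


def ascii_graphic_to_control(c: str) -> Optional[str]:
    """Inverse mapping: '@'..'_' and '?' back to control chars, else None."""
    n = ord(c)
    if 0x40 <= n <= 0x5F or n == 0x3F:
        return chr(n ^ 0x40)
    return None
```

For example, `ascii_control_to_graphic("\x1b")` returns `"["`, matching the familiar ^[ for ESC.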

The above would match Common Lisp, the Unicode standard and as best I
can tell, the spirit of the ASCII standard.

It would diverge from SRFI 14 (where "graphic" excludes space,
"printing" is graphic + whitespace, the bogus distinction between
"punctuation" and "symbol" characters is honored, and the ambiguous term
"blank" is used for horizontal whitespace).

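To see why the punctuation/symbol split looks arbitrary, here is a short Python sketch that carves up the 32 ASCII punctuation characters by Unicode general category (P* versus S*), using the standard `unicodedata` module. It assumes SRFI 14 derives its split from those categories; the function name is hypothetical.

```python
import unicodedata
from typing import Tuple


def split_ascii_punctuation() -> Tuple[str, str]:
    """Split ASCII punctuation into Unicode P* ("punctuation") and S* ("symbol")."""
    chars = [chr(n) for n in range(0x21, 0x7F) if not chr(n).isalnum()]
    punctuation = "".join(c for c in chars if unicodedata.category(c).startswith("P"))
    symbols = "".join(c for c in chars if unicodedata.category(c).startswith("S"))
    return punctuation, symbols


if __name__ == "__main__":
    punctuation, symbols = split_ascii_punctuation()
    print(symbols)  # $+<=>^`|~
```

So `$` and `+` count as symbols while `#`, `%` and `&` count as punctuation, which is hard to explain to anyone who has not read the Unicode category tables.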
It would also diverge from the Unix/C standard library, which distinguishes
"graphic" (`isgraph`, no whitespace) from "printing" (`isprint`, graphic +
space).
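A small Python model of the naming shift, assuming C's `isgraph` and `isprint` behave as in the default C locale restricted to ASCII (other locales may differ). The function names are hypothetical. Under the proposal, "graphic" covers exactly what C calls "printing", and it differs from C's "graphic" only at the space character.

```python
def c_isgraph(c: str) -> bool:
    """Model of C-locale isgraph: printable characters excluding space (#x21..#x7e)."""
    return 0x21 <= ord(c) <= 0x7E


def c_isprint(c: str) -> bool:
    """Model of C-locale isprint: isgraph plus space (#x20..#x7e)."""
    return 0x20 <= ord(c) <= 0x7E


def proposed_ascii_graphic(c: str) -> bool:
    """The proposed ascii-graphic?: all non-control ASCII, including space."""
    return 0x20 <= ord(c) <= 0x7E


if __name__ == "__main__":
    ascii_chars = [chr(n) for n in range(128)]
    # The proposal's "graphic" coincides with C's "printing"...
    assert all(proposed_ascii_graphic(c) == c_isprint(c) for c in ascii_chars)
    # ...and differs from C's "graphic" only at space.
    print([c for c in ascii_chars if proposed_ascii_graphic(c) != c_isgraph(c)])  # [' ']
```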

I prefer the proposal in this mail because it aligns with the character
set standards and with the Common Lisp standard and implementations,
which IMHO are simpler and more principled in this matter than both
SRFI 14 (which may or may not be replaced for R7RS-large) and Unix/C.

OK?