Removing the ASCII string constants from SRFI 175

SRFI 14 (character sets) needs replacement in R7RS-large John Cowan (19 Sep 2019 23:19 UTC)

Removing the ASCII string constants from SRFI 175 Lassi Kortela (19 Sep 2019 23:38 UTC)

Re: SRFI 14 (character sets) needs replacement in R7RS-large Per Bothner (19 Sep 2019 23:56 UTC)

Re: SRFI 14 (character sets) needs replacement in R7RS-large John Cowan (19 Sep 2019 23:57 UTC)

Removing the ASCII string constants from SRFI 175 Lassi Kortela 19 Sep 2019 23:38 UTC

> This is both wasteful and inadequate, but it was the best that could be
> done portably at the time.

Aptly summarized :)

> Fortunately, the Chibi implementation of SRFI 14 handles full Unicode and
> is built on top of the (chibi iset) library (also available on Chicken),
> which contains a minimal bitvector library that is based on bytevectors.
> It is quite portable and would be suitable for the new SRFI.
>
> Latin-1 is quickly becoming obsolete online, but ASCII is still very
> important.  The Chibi implementation uses a tree of bitvectors whose
> lengths are between 128 and 512 bits each (16 to 64 bytes), so it will be
> as efficient (modulo a small constant factor) in space and time as a
> purpose-built ASCII-only implementation.  Therefore, I recommend that
> everything set-like be removed from SRFI 175.
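
To make the bitvector-on-bytevector idea concrete, here is a rough sketch in Python. It is not the (chibi iset) code, which is written in Scheme and uses a tree of such vectors; it shows only a single 128-bit node covering ASCII. The class name `AsciiBitSet` and its methods are made up for illustration.

```python
class AsciiBitSet:
    """Set of ASCII characters stored as 128 bits (16 bytes)."""

    def __init__(self, chars=""):
        # One bit per ASCII code point: 128 bits = 16 bytes.
        self.bits = bytearray(16)
        for ch in chars:
            self.add(ch)

    def add(self, ch):
        code = ord(ch)
        if code > 127:
            raise ValueError("not an ASCII character: %r" % ch)
        # Byte index is code // 8, bit index within the byte is code % 8.
        self.bits[code >> 3] |= 1 << (code & 7)

    def __contains__(self, ch):
        code = ord(ch)
        return code < 128 and bool(self.bits[code >> 3] & (1 << (code & 7)))


digits = AsciiBitSet("0123456789")
```

A membership test such as `"5" in digits` costs one index and one mask, which is where the "small constant factor" compared to a hand-written ASCII-only predicate would come from.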

The only set-like things in SRFI 175 are the four string constants:

- ascii-digits
- ascii-lower-case
- ascii-upper-case
- ascii-punctuation

In addition to those, it has lots of predicate procedures.

The above string constants are meant mainly for quick tasks like "give me
the range of ASCII letters" when testing something, or for building new
character classes, for example one saying what constitutes an identifier
in some syntax. I guess these tasks can be done just as well with more
generic char-set objects; if so, removing the string constants from
SRFI 175 is fine by me.
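
For illustration, the identifier use case might look roughly like this in Python. This is only a sketch: the names from Python's `string` module stand in for the SRFI 175 constants (their exact contents may differ), the identifier syntax is hypothetical, and `frozenset` plays the role of a generic char-set object.

```python
import string

# Stand-ins for the SRFI 175 string constants (assumed similar in spirit).
ascii_digits = string.digits
ascii_lower_case = string.ascii_lowercase
ascii_upper_case = string.ascii_uppercase

# Hypothetical syntax: a letter or underscore first, then letters,
# digits, or underscores.
identifier_start = frozenset(ascii_lower_case + ascii_upper_case + "_")
identifier_rest = identifier_start | frozenset(ascii_digits)


def is_identifier(text):
    """Return True if text is an identifier in the hypothetical syntax."""
    return (
        len(text) > 0
        and text[0] in identifier_start
        and all(ch in identifier_rest for ch in text[1:])
    )
```

The string constants are only used once here, to seed the sets; everything after that works on set objects, which suggests that set constructors taking ranges or strings would cover the same ground.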