On Thu, Sep 19, 2019 at 7:56 PM Per Bothner <xxxxxx@bothner.com> wrote:

On 9/19/19 4:19 PM, John Cowan wrote:
> The Chibi implementation uses a tree of bitvectors whose lengths are between 128 and 512 bits each (16 to 64 bytes), so it will be as efficient (modulo a small constant factor) in space and time as a purpose-built ASCII-only implementation.

You might also want to take a look at the Kawa implementation of srfi-14.
The implementation is written by Jamison Hope and uses an interesting
data structure: inversion lists. They are very compact and extremely cache-friendly
(a membership test is a binary search in a linear integer array).
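
To make the idea concrete, here is a minimal sketch of an inversion list in
Python. It is not taken from the Kawa code; the names make_inversion_list and
contains are hypothetical, and I am assuming the usual representation: a sorted
array of code points at which membership flips, so each pair of entries
[start, end) describes one run of characters that are in the set.

```python
from bisect import bisect_right

def make_inversion_list(ranges):
    """Build a sorted inversion list from (start, end_inclusive) ranges.

    Hypothetical helper for illustration; not the Kawa API.
    """
    inv = []
    for start, end in sorted(ranges):
        if inv and start <= inv[-1]:
            # Overlaps or abuts the previous run: extend that run.
            inv[-1] = max(inv[-1], end + 1)
        else:
            # New run: record where it starts and where it stops (exclusive).
            inv.extend([start, end + 1])
    return inv

def contains(inv, code_point):
    # Count the boundaries at or below code_point with a binary search.
    # An odd count means we are inside a run, so the character is a member.
    return bisect_right(inv, code_point) % 2 == 1

# Example: ASCII letters need only four integers.
ascii_letters = make_inversion_list([(ord("a"), ord("z")), (ord("A"), ord("Z"))])
```

With this layout, contains(ascii_letters, ord("q")) is true and
contains(ascii_letters, ord("[")) is false, and the whole set is a single flat
array of four integers, which is where the compactness and cache behavior come from.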

The code is in gnu/kawa/slib/srfi14.scm in the Kawa sources
(https://gitlab.com/kashell/Kawa), while the code to generate
the Unicode tables (at build time) is in gnu/kawa/util/generate-charsets.scm.

The code is highly non-portable (because it uses Kawa classes and other features),
but it should be straightforward to convert it into something more portable.
--
--Per Bothner
xxxxxx@bothner.com http://per.bothner.com/