The easiest way to read the CharsetDefs file is at this URL:


On Mon, Dec 9, 2019 at 1:38 PM Arthur A. Gleckler wrote:
Back in April, John pointed out that SRFI 14 Charset Definitions is based on old definitions from Java 1.0, and that Unicode has changed quite a bit since then:

John has since proposed some revisions to SRFI 14 to bring it up to date with modern Unicode. We haven't been able to get Olin Shivers, the SRFI author, to weigh in, so I've added a link to John's notes in the Status section as a "post-finalization note," not an erratum.

Here's John's description of the proposed revisions:
The Unicode, Latin-1 and ASCII definitions of the standard character sets in the section below reflect Java 1.0, which in turn reflects Unicode 2.0.  Unicode's definitions of these groups of characters have been substantially revised and updated since then, although the ASCII and Latin-1 definitions are frozen and will always be correct.
This note recommends that implementers of new implementations and maintainers of existing ones update their implementations to use the current Unicode definitions, as detailed in the supplementary CharsetDefinitions file.