24 Dec 2003

On Tue, 23 Dec 2003, Shiro Kawai wrote:

>From: Michael Sperber
>Subject: Re: Strings/chars
>Date: Tue, 23 Dec 2003 11:56:07 +0100
>> What's your take on combining characters?
>I don't have a clear idea at the application level, and can only
>imagine that we will need several layers.  As Tom Lord mentioned,
>eventually we'd have such layers, and R5RS characters would fade
>away in the long, long term.
>Bear's approach (as far as I understand, each "character" consists
>of a base character plus zero or more combining characters; correct
>me if I'm wrong) looks suitable for most linguistic text processing.

That's the basic application I had in mind.  Right now there are
some weird things in the implementation that I don't really know
how to address.  For example, a combining character by itself can
be written using (write), where it comes out as #\Uxxxx, but it
can't be (display)ed.  In some sense it's a pseudocharacter,
like control characters and so on, not fully a glyph on its own.
But it's legit enough that most character-type primitives
(the char=?, char<?, and char>? predicates, etc.) need to be able
to work on it.
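As a rough illustration only (this is not Bear's Scheme implementation), here is a minimal Python sketch of the "base character plus zero or more combining characters" model and of the lone-combining-character case described above. The names Char, segment, char_eq, char_lt, write_char, and display_char are hypothetical. Treating every code point of Unicode general category M* as combining is an assumed simplification of the full grapheme cluster rules. A lone combining mark survives as its own pseudocharacter: it still works with the char=?/char<? analogues and can be written as #\Uxxxx, but display refuses it.

```python
import unicodedata
from dataclasses import dataclass
from typing import List, Tuple


def is_mark(cp: int) -> bool:
    # Assumed simplification: general categories Mn, Mc, Me count as combining.
    return unicodedata.category(chr(cp)).startswith("M")


@dataclass(frozen=True, order=True)
class Char:
    # Hypothetical "character": a base code point plus zero or more
    # combining marks. order=True compares base first, then the marks tuple.
    base: int
    marks: Tuple[int, ...] = ()

    def is_lone_mark(self) -> bool:
        # A pseudocharacter: a combining mark with no base to attach to.
        return not self.marks and is_mark(self.base)


def segment(text: str) -> List[Char]:
    """Split text into base+marks units (not full grapheme clustering)."""
    units: List[Char] = []
    for ch in text:
        cp = ord(ch)
        if is_mark(cp) and units and not is_mark(units[-1].base):
            # Attach the mark to the preceding base character.
            prev = units.pop()
            units.append(Char(prev.base, prev.marks + (cp,)))
        else:
            # A base character, or a combining mark with nothing to attach to.
            units.append(Char(cp))
    return units


def char_eq(a: Char, b: Char) -> bool:
    # Analogue of char=?: base and marks must all match.
    return a == b


def char_lt(a: Char, b: Char) -> bool:
    # Analogue of char<?: plain code-point order, not locale collation.
    return a < b


def write_char(c: Char) -> str:
    # Analogue of (write): a lone mark gets a hex escape such as #\U0301.
    if c.is_lone_mark():
        return "#\\U%04X" % c.base
    return "#\\" + chr(c.base) + "".join(chr(m) for m in c.marks)


def display_char(c: Char) -> str:
    # Analogue of (display): a lone mark has no glyph of its own.
    if c.is_lone_mark():
        raise ValueError("cannot display a lone combining character")
    return chr(c.base) + "".join(chr(m) for m in c.marks)
```

For example, segment("e\u0301a") yields two units, e with U+0301 attached and a. segment("\u0301x") starts with a lone U+0301, which write_char renders as #\U0301 and display_char rejects.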

>An application may need more data per character, such as
>how it is represented in the original data, or which language
>it belongs to; that's application-dependent, so if we ever want
>to expose it to the C FFI, a "character" wouldn't just map to an
>integer.  Instead, it would be an opaque object full of APIs
>to extract various information.

Hmmm.  That's probably true.  I'd been pushing the "codepoint"
thing hard, but the more we get into characters, the more they
turn into moderately complicated little bundles.