Re: Strings/chars bear 24 Dec 2003 00:03 UTC


On Tue, 23 Dec 2003, Shiro Kawai wrote:

>From: Michael Sperber <xxxxxx@informatik.uni-tuebingen.de>
>Subject: Re: Strings/chars
>Date: Tue, 23 Dec 2003 11:56:07 +0100
>
>> What's your take on combining characters?
>
>I don't have a clear idea at the application level, and can only
>imagine that we need several layers.   As Tom Lord mentioned,
>eventually we'd have such layers, and the R5RS character would fade
>away in the long, long term.
>
>Bear's approach (as far as I understand, each "character" consists
>of base character + zero or more combining characters; correct me
>if I'm wrong) looks suitable for most linguistic text processing.

That's the basic application I had in mind.  Right now there are
some weird things in the implementation that I don't really know
how to address.  For example, a combining character by itself can
be written using (write), where it comes out as #\Uxxxx, but it
can't be (display)ed.  In some sense it's a pseudocharacter,
like control characters and so on, not fully a glyph on its own.
But it's legit enough that most character-type primitives
(the char=?, char<?, and char>? predicates, etc.) need to be able
to work on it.
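
A minimal sketch of the base-plus-combining grouping described above,
in Python rather than Scheme.  The name split_clusters is hypothetical
and this is not the implementation discussed in this thread; it assumes
that "combining character" means a nonzero canonical combining class as
reported by unicodedata.combining.

```python
import unicodedata


def split_clusters(text):
    """Group text into units of one base character plus any combining marks.

    Assumption: a code point is "combining" when its canonical combining
    class is nonzero. A combining mark with no preceding base becomes a
    unit of its own (the "pseudocharacter" case).
    """
    clusters = []
    for ch in text:
        if unicodedata.combining(ch) and clusters:
            # Attach the mark to the unit that precedes it.
            clusters[-1] += ch
        else:
            # A base character, or a mark with nothing to attach to.
            clusters.append(ch)
    return clusters


# "e" + COMBINING ACUTE ACCENT forms one unit; "a" forms another.
units = split_clusters("e\u0301a")

# A lone combining mark still has an escaped written form and still
# compares and orders like any other character.
lone = "\u0301"
written = ascii(lone)
ordered = lone < "\u0302"
```

Here ascii() plays roughly the role of (write), giving an escaped form
that is always printable, while the comparison shows that ordering a
lone combining mark is unproblematic even when displaying it is not.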

>An application may need more data per character, such as
>how it is represented in the original data, or which language
>it belongs to---it's application dependent, so if we ever want
>to expose it to C FFI, a "character" wouldn't just map to an
>integer; instead, it would be an opaque object full of APIs
>to extract various information.

Hmmm.  That's probably true.  I'd been pushing the "codepoint"
thing hard, but the more we get into characters, the more they
turn into moderately complicated little bundles.
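
One way to picture such a "little bundle" is sketched below, again in
Python.  Everything here is hypothetical: the Char class, its fields,
and its methods are illustrative names only, not an API proposed in
this thread.  The sketch assumes a character carries a base code point,
zero or more combining marks, and an optional application-supplied
language tag that does not take part in comparisons.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True, order=True)
class Char:
    """Hypothetical opaque character: base plus combining marks."""

    base: str
    marks: Tuple[str, ...] = ()
    # Application-dependent metadata; excluded from equality and ordering.
    language: Optional[str] = field(default=None, compare=False)

    def codepoints(self):
        """Return the underlying code points as a tuple of integers."""
        return tuple(ord(c) for c in (self.base, *self.marks))

    def is_standalone_mark(self):
        """True when the "base" is itself a combining character."""
        return unicodedata.combining(self.base) != 0


e_acute = Char("e", ("\u0301",), language="fr")
points = e_acute.codepoints()
```

The point of the sketch is only that a single integer is no longer
enough: callers go through accessors such as codepoints() instead of
assuming the character is a code point.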

				Bear