constant-time access to variable-width encodings Per Bothner (13 Jul 2005 18:13 UTC)
Re: constant-time access to variable-width encodings Ray Blaak (13 Jul 2005 18:48 UTC)
Re: constant-time access to variable-width encodings Shiro Kawai (13 Jul 2005 20:16 UTC)
Re: constant-time access to variable-width encodings Per Bothner (13 Jul 2005 20:36 UTC)
Re: constant-time access to variable-width encodings Shiro Kawai (13 Jul 2005 23:07 UTC)
Re: constant-time access to variable-width encodings Per Bothner (14 Jul 2005 00:39 UTC)
Re: constant-time access to variable-width encodings bear (14 Jul 2005 01:52 UTC)
Re: constant-time access to variable-width encodings Thomas Bushnell BSG (14 Jul 2005 07:18 UTC)
Re: constant-time access to variable-width encodings Thomas Bushnell BSG (14 Jul 2005 07:16 UTC)

Re: constant-time access to variable-width encodings bear 14 Jul 2005 01:52 UTC


On Wed, 13 Jul 2005, Per Bothner wrote:

>bear wrote:

>> Aaaand, this is yet another problem that goes away if you embrace
>> glyph=character instead of codepoint=character.

> huh?  A glyph depends on a specific font.  No way can we define Scheme
> characters in terms of glyphs.
>
> Do you mean a (canonicalized) composite (combining) sequence?

Yes, that's what I mean.

> One
> problem is you can't practically map one of those to a fixed-length
> integer value, so we have to give up char->integer and integer->char.

Or allow them to accept/return bignums, or limit their ranges. Point.
I think bignums in these routines are a much smaller sacrifice to
consistency than others being discussed here.  Implementations which
do not support bignums may report a violation of an implementation
restriction, I guess.
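
As a rough illustration of the bignum option (in Python rather than Scheme, since Python integers are already arbitrary-precision), a canonicalized combining sequence can be packed into a single integer, 21 bits per code point.  The names char_to_integer and integer_to_char below only mirror the Scheme procedures; the packing scheme and the choice of NFC are assumptions made for this sketch, not anything proposed in the thread.

```python
import unicodedata

BITS = 21                 # every Unicode code point fits in 21 bits (max 0x10FFFF)
MASK = (1 << BITS) - 1

def char_to_integer(seq: str) -> int:
    """Pack a canonicalized combining sequence into one (possibly huge) integer.

    Hypothetical scheme: the base code point occupies the low 21 bits, so a
    single-code-point character maps to its ordinary code point value.
    """
    seq = unicodedata.normalize("NFC", seq)   # canonicalize first (assumption: NFC)
    n = 0
    for i, ch in enumerate(seq):
        n |= ord(ch) << (BITS * i)
    return n

def integer_to_char(n: int) -> str:
    """Inverse of char_to_integer for integers that function produced."""
    if n == 0:
        return "\x00"                          # U+0000 is the only sequence packing to 0
    cps = []
    while n:
        cps.append(chr(n & MASK))              # chr() raises ValueError beyond 0x10FFFF
        n >>= BITS
    return "".join(cps)
```

With this packing, char_to_integer("a") is still 97, while a sequence such as "q" followed by U+0301, which has no precomposed form, maps to an integer above 0x10FFFF.  An implementation without bignums would have to reject such values, which is the implementation restriction mentioned above.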

> Also, if equal characters are to be eq? they would have to be interned,
> like strings.  Both of these changes are possible, but a rather radical
> (and unneeded) departure from current practice.

But strings aren't interned, they're just boxed.  You mean symbols,
don't you?  (symbols have to guarantee eq?-ness; strings can be eqv?
without being eq?.)  OTOH, multi-codepoint characters would have to be
boxed; if you want to guarantee eq?ness you'd have to maintain a
global character table, similar to the global symbol table, which I
think is what you mean by "interned."  And this implies that if you
wanted to be able to garbage-collect characters, ever, you'd have to
support soft pointers in your garbage collector and make the global
character table refer to the boxed entities using them (just like
symbols that way).

I think it would be reasonable to have eq? no longer guaranteed on
multi-codepoint characters; use eqv? (or better yet, char=?) instead.

				Bear