constant-time access to variable-width encodings
Per Bothner
(13 Jul 2005 18:13 UTC)
|
Re: constant-time access to variable-width encodings
Ray Blaak
(13 Jul 2005 18:48 UTC)
|
Re: constant-time access to variable-width encodings
Shiro Kawai
(13 Jul 2005 20:16 UTC)
|
Re: constant-time access to variable-width encodings
Per Bothner
(13 Jul 2005 20:36 UTC)
|
Re: constant-time access to variable-width encodings
Shiro Kawai
(13 Jul 2005 23:07 UTC)
|
Re: constant-time access to variable-width encodings
bear
(14 Jul 2005 00:23 UTC)
|
Re: constant-time access to variable-width encodings
Per Bothner
(14 Jul 2005 00:39 UTC)
|
Re: constant-time access to variable-width encodings
bear
(14 Jul 2005 01:52 UTC)
|
Re: constant-time access to variable-width encodings
Thomas Bushnell BSG
(14 Jul 2005 07:18 UTC)
|
Re: constant-time access to variable-width encodings
Thomas Bushnell BSG
(14 Jul 2005 07:16 UTC)
|
On Wed, 13 Jul 2005, Per Bothner wrote:

> bear wrote:
>> Aaaand, this is yet another problem that goes away if you embrace
>> glyph=character instead of codepoint=character.
>
> Huh? A glyph depends on a specific font. No way can we define Scheme
> characters in terms of glyphs.
>
> Do you mean a (canonicalized) composite (combining) sequence?

Yes, that's what I mean.

> One problem is you can't practically map one of those to a fixed-length
> integer value, so we have to give up char->integer and integer->char.

Or allow them to accept/return bignums, or limit their ranges. Point taken. I think bignums in these routines are a much smaller sacrifice to consistency than the others being discussed here. Implementations which do not support bignums may report a violation of an implementation restriction, I guess.

> Also, if equal characters are to be eq? they would have to be interned,
> like strings. Both of these changes are possible, but a rather radical
> (and unneeded) departure from current practice.

But strings aren't interned; they're just boxed. You mean symbols, don't you? (Symbols have to guarantee eq?-ness; strings can be equal? without being eq?.)

OTOH, multi-codepoint characters would have to be boxed; if you want to guarantee eq?-ness, you'd have to maintain a global character table, similar to the global symbol table, which I think is what you mean by "interned." And this implies that if you ever wanted to be able to garbage-collect characters, you'd have to support soft (weak) pointers in your garbage collector and make the global character table refer to the boxed characters through them (just like symbols in that respect).

I think it would be reasonable to have eq? no longer guaranteed on multi-codepoint characters; use eqv? (or better yet, char=?) instead.

Bear
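The bignum option for char->integer and integer->char can be made concrete. The following is a minimal Python sketch, not something proposed in the thread. It assumes a "character" is a combining sequence canonicalized to NFC. It also assumes single-codepoint characters keep their ordinary code values, and that longer sequences are packed 21 bits per codepoint (every Unicode scalar value fits, since 0x10FFFF < 2^21) under a leading sentinel bit. The names `char_to_integer` and `integer_to_char` are hypothetical stand-ins for the Scheme procedures; Python's arbitrary-precision ints play the role of bignums.

```python
import unicodedata

MAX_CODEPOINT = 0x10FFFF
BITS = 21                 # 0x10FFFF < 2**21, so each codepoint fits in 21 bits
MASK = (1 << BITS) - 1


def char_to_integer(ch: str) -> int:
    """Map a combining sequence (hypothetical multi-codepoint char) to an int."""
    cps = [ord(c) for c in unicodedata.normalize("NFC", ch)]
    if not cps:
        raise ValueError("a character needs at least one codepoint")
    if len(cps) == 1:
        return cps[0]     # single codepoints keep their usual code value
    n = 1                 # sentinel bit, so a leading U+0000 is not lost
    for cp in cps:
        n = (n << BITS) | cp
    return n              # >= 2**42 here: a bignum in many Schemes


def integer_to_char(n: int) -> str:
    """Inverse of char_to_integer for the values it produces."""
    if 0 <= n <= MAX_CODEPOINT:
        return chr(n)
    cps = []
    while n > 1:          # peel off 21-bit groups until only the sentinel is left
        cps.append(n & MASK)
        n >>= BITS
    if n != 1 or len(cps) < 2:
        raise ValueError("not a code produced by char_to_integer")
    return "".join(chr(cp) for cp in reversed(cps))
```

For example, `char_to_integer("e\u0301")` returns 233 (0xE9), because NFC composes that sequence to U+00E9. In contrast, `"x\u0305"` has no precomposed form and maps to a value of at least 2^42, which an implementation without bignums could reject as an implementation restriction.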
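The global character table with weak references can also be sketched. This hypothetical Python `Char` class is not part of any Scheme implementation. It interns boxed characters in a table that holds its values only weakly. As a result, identity (Python's `is`, standing in for eq?) is guaranteed while a character is live, and the table entry disappears once nothing else refers to it. Here `weakref.WeakValueDictionary` stands in for the collector's soft/weak-pointer support.

```python
import unicodedata
import weakref


class Char:
    """Hypothetical boxed character made of one or more codepoints.

    Instances are interned through a class-wide table, like symbols, so two
    Char objects with the same canonical (NFC) codepoints are the same object.
    """

    __slots__ = ("codepoints", "__weakref__")  # __weakref__ permits weak refs

    # Global character table: values are held weakly, so an entry vanishes
    # once no live object refers to the character.
    _table = weakref.WeakValueDictionary()

    def __new__(cls, text: str) -> "Char":
        key = unicodedata.normalize("NFC", text)
        existing = cls._table.get(key)
        if existing is not None:
            return existing  # already interned: identity (eq?-ness) holds
        obj = super().__new__(cls)
        obj.codepoints = key
        cls._table[key] = obj
        return obj

    def __repr__(self) -> str:
        return f"Char({self.codepoints!r})"
```

Because the key is the NFC form, `Char("e\u0301")` and `Char("\u00e9")` yield the same object. The cost is exactly the one described above: an extra normalization and a table lookup on every character construction, plus weak-reference bookkeeping in the collector.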