Re: constant-time access to variable-width encodings
Per Bothner 14 Jul 2005 00:39 UTC
bear wrote:
> Aaaand, this is yet another problem that goes away if you embrace
> glyph=character instead of codepoint=character.
Huh? A glyph depends on a specific font. No way can we define Scheme
characters in terms of glyphs.
Do you mean a (canonicalized) composite (combining) sequence? One
problem is that you can't practically map one of those to a fixed-length
integer value, so we would have to give up char->integer and integer->char.
Also, if equal characters are to be eq? they would have to be interned,
like strings. Both of these changes are possible, but a rather radical
(and unneeded) departure from current practice.
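
To make the fixed-length-integer problem concrete, here is a minimal
sketch in Python (picked only because its standard unicodedata module
makes this easy to try, not because Scheme would do it this way). The
sequence q + U+0323 is an illustrative choice: as far as I know Unicode
has no precomposed form for it, so even NFC leaves it as two code
points. char_to_integer is a hypothetical stand-in for char->integer.

```python
import unicodedata

# 'q' followed by COMBINING DOT BELOW (illustrative choice of sequence).
seq = unicodedata.normalize("NFC", "q\u0323")

# Canonical composition cannot shrink this to a single code point.
code_points = [ord(c) for c in seq]


def char_to_integer(s):
    """Hypothetical char->integer that accepts exactly one code point."""
    if len(s) != 1:
        raise ValueError("no single integer for a multi-code-point sequence")
    return ord(s)
```

A "character" defined as a combining sequence would need some other,
variable-length or interned, representation before it could be handed
to anything like char_to_integer.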
> With Unicode,
> you *CANNOT* make assumptions about how strings are represented.
> Two strings which are "equal" under unicode's required
> equivalence predicates may be of different lengths and have not a
> single codepoint in common, and the differences are purely
> representation artifacts.
Nonetheless, Java defines the String equals routine in terms of code
point equality, and Java programmers manage to get useful work done.
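
The two notions of equality can be compared side by side. This is a
rough sketch in Python rather than Java, using the standard unicodedata
module; the helper names code_point_equal and canonically_equal are
made up for illustration, and NFD is just one of the normalization
forms that could be used for the comparison.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT


def code_point_equal(a, b):
    """Equality in the code-point-by-code-point sense."""
    return [ord(c) for c in a] == [ord(c) for c in b]


def canonically_equal(a, b):
    """Equality after canonical decomposition (NFD) of both strings."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)
```

Here the two strings have different lengths and share no code point, so
code_point_equal says they differ while canonically_equal says they
match. A program that wants the second answer can normalize its input
once and then use plain code point comparison everywhere else.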
--
--Per Bothner
xxxxxx@bothner.com http://per.bothner.com/