Re: constant-time access to variable-width encodings Per Bothner 14 Jul 2005 00:39 UTC

bear wrote:
> Aaaand, this is yet another problem that goes away if you embrace
> glyph=character instead of codepoint=character.

Huh?  A glyph depends on a specific font.  No way can we define Scheme
characters in terms of glyphs.

Do you mean a (canonicalized) composite (combining) sequence?  One
problem is that you can't practically map one of those to a fixed-length
integer value, so we would have to give up char->integer and integer->char.
Also, if equal characters are to be eq? they would have to be interned,
like strings.  Both of these changes are possible, but they are a rather
radical (and unneeded) departure from current practice.
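
To make the fixed-length problem concrete, here is a small Java sketch.
The class and method names are made up for illustration; only
java.text.Normalizer and String.codePointCount are standard library calls.
It counts how many code points remain after canonical composition (NFC).
Some combining sequences collapse to one code point, others never do, so
there is no single integer to hand back from char->integer.

```java
import java.text.Normalizer;

public class CombiningSequenceDemo {
    // Number of code points left after canonical composition (NFC).
    public static int nfcCodePoints(String s) {
        String nfc = Normalizer.normalize(s, Normalizer.Form.NFC);
        return nfc.codePointCount(0, nfc.length());
    }

    public static void main(String[] args) {
        // 'e' + COMBINING ACUTE ACCENT composes to the single code point U+00E9.
        System.out.println(nfcCodePoints("e\u0301"));
        // 'q' + COMBINING ACUTE ACCENT has no precomposed form,
        // so it remains a two-code-point sequence even after NFC.
        System.out.println(nfcCodePoints("q\u0301"));
    }
}
```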

> With Unicode,
> you *CANNOT* make assumptions about how strings are represented.
> Two strings which are "equal" under unicode's required
> equivalence predicates may be of different lengths and have not a
> single codepoint in common, and the differences are purely
> representation artifacts.

Nonetheless, Java defines the String equals method in terms of code
point equality, and Java programmers manage to get useful work done.
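
A minimal sketch of what that means in practice, assuming one wants
canonical equivalence on top of the built-in comparison.  The helper
canonicallyEqual is hypothetical, not part of the Java library;
String.equals and java.text.Normalizer are standard.

```java
import java.text.Normalizer;

public class EqualsDemo {
    // Hypothetical helper: compare after normalizing both sides to NFC.
    public static boolean canonicallyEqual(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9";  // e-acute as one code point
        String decomposed = "e\u0301";  // 'e' followed by COMBINING ACUTE ACCENT

        // String.equals compares the stored sequences, so these differ.
        System.out.println(precomposed.equals(decomposed));            // false
        // Normalizing first makes the canonically equivalent strings match.
        System.out.println(canonicallyEqual(precomposed, decomposed)); // true
    }
}
```

The two strings have different lengths and no code point in common, yet
a program that wants them to compare equal can normalize explicitly at
the point where it matters.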
--
	--Per Bothner
xxxxxx@bothner.com   http://per.bothner.com/