Re: character strings versus byte strings

tb@xxxxxx 23 Dec 2003 03:29 UTC

bear <xxxxxx@sonic.net> writes:

> Each character is a unicode codepoint plus a non-defective sequence of
> unicode combining codepoints.  The unicode documentation refers to these
> entities as "graphemes."

I should revise what I said; there may well be a case for Scheme
characters being graphemes instead of codepoints.  I lean toward
codepoints, but I recognize that graphemes are a good contender.

My post was intended to argue against UTF-8; but moving further up the
abstraction ladder than codepoints may well be right.
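
To make the three rungs of that ladder concrete, here is a rough sketch in
Python (not Scheme, only because its standard library exposes the needed
Unicode data).  The helper name `graphemes` is hypothetical, and it uses the
simplified definition quoted above (a base codepoint plus any following
codepoints with a nonzero combining class); it is not the full Unicode
segmentation algorithm, so treat it as an illustration only.

```python
import unicodedata

def graphemes(text):
    """Group each base codepoint with the combining codepoints after it.

    Simplified sketch: a codepoint counts as "combining" when its canonical
    combining class is nonzero. Real grapheme segmentation has more rules.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            # Attach the combining codepoint to the preceding cluster.
            clusters[-1] += ch
        else:
            # A base codepoint (or a leading combining mark) starts a cluster.
            clusters.append(ch)
    return clusters

s = "e\u0301a"  # 'e', COMBINING ACUTE ACCENT, 'a'
print(len(s.encode("utf-8")))  # length in UTF-8 bytes: 4
print(len(s))                  # length in codepoints: 3
print(len(graphemes(s)))       # length in graphemes (simplified): 2
```

The same string has three different lengths depending on which level is
taken to be the "character", which is the crux of the disagreement.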

Thomas