Re: character strings versus byte strings
tb@xxxxxx 23 Dec 2003 03:29 UTC
bear <xxxxxx@sonic.net> writes:
> Each character is a unicode codepoint plus a non-defective sequence of
> unicode combining codepoints. The unicode documentation refers to these
> entities as "graphemes."
I should revise what I said; there may well be a case for Scheme
characters being graphemes instead of codepoints. I lean toward
codepoints, but I recognize that graphemes are a good contender.
My post was intended to argue against UTF-8, but moving further up the
abstraction ladder than codepoints may well be right.
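
To make the three rungs of that ladder concrete (UTF-8 bytes, codepoints, graphemes), here is a minimal Python sketch. It is only an illustration: the helper name `clusters` is hypothetical, and grouping a base codepoint with the combining codepoints that follow it is a simplification of the quoted definition, not the full Unicode grapheme cluster algorithm.

```python
import unicodedata

def clusters(text):
    """Group each codepoint with the combining codepoints that follow it.

    Simplifying assumption: a codepoint counts as "combining" when its
    canonical combining class is nonzero. A combining codepoint with no
    preceding base (a defective sequence) simply starts its own group.
    """
    out = []
    for ch in text:
        if out and unicodedata.combining(ch) != 0:
            out[-1] += ch      # attach the mark to the preceding group
        else:
            out.append(ch)     # start a new group
    return out

# 'e' followed by U+0301 COMBINING ACUTE ACCENT, then 'a'
s = "e\u0301a"

byte_count = len(s.encode("utf-8"))   # length as a UTF-8 byte string
codepoint_count = len(s)              # length as a codepoint string
grapheme_count = len(clusters(s))     # length under the grouping above

print(byte_count, codepoint_count, grapheme_count)   # prints: 4 3 2
```

The same text has three different lengths depending on which rung is taken as the "character", which is the choice under discussion.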
Thomas