Re: Why are byte ports "ports" as such? Per Bothner 23 May 2006 20:08 UTC

Jonathan S. Shapiro wrote:
> On Tue, 2006-05-23 at 11:57 -0700, Per Bothner wrote:
>> What is the use-case for read-char, as you define it?
>> What is the use-case for a "character" data type that is
>> *not* a codepoint data type?
> We are getting to the jagged edge of what I know about UNICODE,

A little knowledge is a dangerous thing ...

> but here is the situation as I understand it.
> The underlying issue within UNICODE is the existence of the so-called
> "combining characters". There exist characters that have no single
> defining codepoint. These exist primarily in Asian languages, for
> example in the form of multiple code points that together form a single
> "glyph".

You're using the wrong terminology here, I think, but never mind.

> The use case, then, seems self evident: programs that must be aware of
> these at the code-point level.

You're contradicting yourself: I asked about a use-case for *character*
as a separate *data type*.

You have given no such use-case.

> The codepoint==char presumption is simply untrue in some non-western
> languages.

We know that.  However, there is still no need for "character" [in the
Unicode sense] as a separate data type:

Code that works on compound characters *as a unit* can and should use a
string type.  Code that needs to look *inside* a compound character
needs to work with codepoints.
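
To make the split concrete, here is a rough sketch in Python (not Scheme;
the helper names same_compound, code_points and combining_marks are made
up for illustration).  The compound character is held as a short string
when it is handled as a unit, and walked as codepoints only when code
needs to look inside it:

```python
import unicodedata

# One compound character: "e" followed by U+0301 COMBINING ACUTE ACCENT.
compound = "e\u0301"


def same_compound(a: str, b: str) -> bool:
    """Compare two compound characters as units, i.e. as strings.

    Canonical composition (NFC) is applied first so that "e" + U+0301
    compares equal to the precomposed U+00E9.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def code_points(s: str) -> list[int]:
    """Look inside a compound character: return its codepoints in order."""
    return [ord(c) for c in s]


def combining_marks(s: str) -> list[int]:
    """Return only the codepoints that are combining marks."""
    return [ord(c) for c in s if unicodedata.combining(c) != 0]
```

Neither half needs a third "character" type: same_compound takes and
compares strings, while code_points and combining_marks deal in plain
integer codepoints.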

In Java, "character" is actually a Unicode code-point.  This is how it
should be in Scheme, though we might want to replace the 16-bit size
by a 21-bit size to avoid the complexities of surrogate characters.
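
As a small illustration of the surrogate complexity, sketched in Python
rather than Java or Scheme (the helper utf16_units is a made-up name): a
codepoint outside the 16-bit range is one codepoint but two 16-bit units
in UTF-16, and the largest codepoint, U+10FFFF, needs 21 bits.

```python
def utf16_units(s: str) -> list[int]:
    """Return the 16-bit UTF-16 code units that encode the string s."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# U+1D11E MUSICAL SYMBOL G CLEF lies outside the 16-bit range.
clef = "\U0001D11E"

one_code_point = [ord(c) for c in clef]   # a single codepoint, 0x1D11E
two_units = utf16_units(clef)             # a surrogate pair: 0xD834, 0xDD1E

# Number of bits needed to hold any Unicode codepoint.
bits_needed = (0x10FFFF).bit_length()     # 21
```

With a codepoint-sized character type, code that indexes or counts
characters never has to notice the surrogate pair at all.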
	--Per Bothner