Re: Why are byte ports "ports" as such? Per Bothner 23 May 2006 23:22 UTC

Thomas Bushnell BSG wrote:
> Right.  A text editor needs an input mechanism and a mapping from that
> mechanism to characters.

Not really.  It needs a mapping from input events to actions.  Some
of those actions may be to insert *strings* into the buffer, or to
append a string to a search string.
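
A minimal sketch of the "input events to actions" idea, assuming a toy
editor. The names `Editor`, `KEYMAP`, and `dispatch` and the event labels
are hypothetical, not taken from any real editor. Each action inserts or
appends a *string*; no character type appears anywhere.

```python
from dataclasses import dataclass


@dataclass
class Editor:
    """Toy editor state: a text buffer, an insertion point, a search string."""
    buffer: str = ""
    point: int = 0
    search: str = ""

    def insert(self, s: str) -> None:
        # Insert a string (of any length) at point and advance past it.
        self.buffer = self.buffer[:self.point] + s + self.buffer[self.point:]
        self.point += len(s)

    def append_to_search(self, s: str) -> None:
        # Extend the current search string.
        self.search += s


# Hypothetical keymap: input events map to actions, not to characters.
KEYMAP = {
    "a": lambda ed: ed.insert("a"),
    "C-y": lambda ed: ed.insert("yanked text"),
    "search:x": lambda ed: ed.append_to_search("x"),
}


def dispatch(ed: Editor, event: str) -> None:
    """Look up the action bound to an event and run it; ignore unbound events."""
    action = KEYMAP.get(event)
    if action is not None:
        action(ed)
```

Note that a single event such as "C-y" inserts a whole string, which is
why the action, not the character, is the natural unit here.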

> Except that text is an assemblage of characters, not of code points.
> The editor needs functions like "display this character",

No, it needs functions like "display this string".

> "move to next character",

Yes, but this is no different from "move to next word".  It doesn't
need to work on the character except as *part of the buffer*.

> "tell the user what character this is",

Why?  Not many editors provide this, and in any case it's only
for advanced users.

> and even "convert this character to some standard interchange format".

No, it needs "convert this string/buffer to some standard interchange
format".

> What I want is a *character* type for a text editor.

What you want and what you need are not the same thing.
Somebody who uses a text editor does not need characters;
they need strings.  When you implement a text editor, characters
can be useful, but having them as a separate data type is just
pointless overhead.  You could implement a character using an
interned type, like symbols, and then implement a string as
a vector of (pointers to) symbols.  But I'm hoping you're not
actually proposing this as a good implementation strategy for
a text editor - or programming language.  However, I totally fail
to guess at what you are proposing.

> What is *certainly* useless is a "code point" type.

They're useless - except for implementing strings and buffers.

> A text editor need not deal with encodings *at all*.  Think of it: the
> keyboard driver provides characters to the text editor.  Real,
> full-fledged, characters.

No it doesn't - it provides keystroke events.  An "input method"
provides strings in general, not individual characters, since
it may need to do word lookup.

> And the text editor asks the display widget
> to display a character in a particular font and context (since some
> characters have different glyphs).

As I said: The display widget works with *strings*, not characters.

> But never does it really care about encodings.

Of course it does.  Fonts are indexed by code-point.

> Think of it this way: an editor should not even *care* what the
> underlying encodings are for characters; it should be entirely
> irrelevant.

Well, at some point you're going to have to ask "is this a digit"
or "is this a space".  To do that correctly and portably, you need
to index the Unicode tables, which are indexed by code-points.  Of
course that is rather low-level: instead, I'm arguing for an API
like "is the character at this position in this string/buffer
white-space".  This is a special case of the more general: "does
the substring after this position match this regular expression".
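
A minimal sketch of such a position-based API, assuming Python's standard
`re` module as the regular-expression engine. The function names
`matches_at` and `is_whitespace_at` are hypothetical. The white-space test
is written as a special case of the general match, and the caller never
handles a separate character value.

```python
import re


def matches_at(pattern: str, text: str, pos: int) -> bool:
    """Does the substring of `text` starting at `pos` match `pattern`?"""
    # Pattern.match(string, pos) anchors the match attempt at index `pos`.
    return re.compile(pattern).match(text, pos) is not None


def is_whitespace_at(text: str, pos: int) -> bool:
    """Is the character at `pos` white-space? A special case of matches_at."""
    # For str patterns, \s uses the Unicode definition of white-space.
    return matches_at(r"\s", text, pos)


def is_digit_at(text: str, pos: int) -> bool:
    """Is the character at `pos` a digit? Same approach, different pattern."""
    return matches_at(r"\d", text, pos)
```

For example, `is_whitespace_at("ab cd", 2)` is true and
`is_whitespace_at("ab cd", 1)` is false. The Unicode table lookup happens
inside the regular-expression engine rather than in the caller.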

Anyway, this is all irrelevant.  Until you specify an actual
"character API" and propose a practical implementation strategy,
then I think the discussion is pointless.
--
	--Per Bothner
xxxxxx@bothner.com   http://per.bothner.com/