Re: Why are byte ports "ports" as such? Thomas Bushnell BSG 23 May 2006 22:07 UTC

Per Bothner <xxxxxx@bothner.com> writes:

> You need keyboard *events*, for input, but they're obviously quite
> different from characters.  (They have modifier bits, plus maybe
> separate up/down events.)

Right.  A text editor needs an input mechanism and a mapping from that
mechanism to characters.  It does not care about code points, except
where the input method happens to resemble code points.

> You need to store and modify text data in *buffers*, but there is no
> need for characters as a separate data type.  You do need functions to
> "move to the next character/word/line/paragraph", but again these work
> in characters in a buffer, not individual characters.

Except that text is an assemblage of characters, not of code points.
The editor needs functions like "display this character", "move to
next character", "tell the user what character this is", and even
"convert this character to some standard interchange format".
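To make the character/code point distinction concrete, here is a minimal sketch in Python (chosen only for illustration).  Python's `str` is a sequence of code points, so one user-perceived character can have two spellings that differ in length.  A "move to next character" command therefore has to look past single code points.  The helper `next_char` is hypothetical, and it only approximates character boundaries: it treats a base code point plus any following combining marks as one character, whereas full grapheme segmentation (Unicode UAX #29) handles many more cases.

```python
import unicodedata

# Two spellings of what a reader sees as one character, "é".
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

# As code point sequences the two differ in length and do not compare
# equal until they are normalized.
print(len(precomposed), len(decomposed))                         # 1 2
print(precomposed == decomposed)                                 # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True

def next_char(s: str, i: int) -> int:
    """Index just past the user-perceived character starting at index i.

    Rough approximation (hypothetical helper): one base code point plus
    any following combining marks.  Real grapheme segmentation (UAX #29)
    covers far more cases.
    """
    j = i + 1
    while j < len(s) and unicodedata.combining(s[j]):
        j += 1
    return j

text = "cafe\u0301!"
print(next_char(text, 3))   # 5: the "e" together with its accent
```

Stepping one code point at a time from index 3 would stop between the "e" and its accent, which is exactly the kind of operation an editor should never expose as "next character".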

> You need to be able to display the text, which involves searching for
> glyphs in fonts, possibly performing kerning, line-breaking, etc.
> Again, a character data type will probably be either too high-level
> or too low-level. You certainly don't want to have it built-in to
> your programming language, but into your display software.

What I want is a *character* type for a text editor.

What is *certainly* useless is a "code point" type.

What is perfectly perverse is taking a code point type and *labelling*
it character.

If you don't want a character data type, then you don't have to have
one as far as I'm concerned.  But can those of us who do want one
please call it "character"?  And you, with your preferred "code point"
type (which *is* useful for some applications, namely those that are
Unicode-specific), can have that, and please call it "code-point".
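A minimal sketch of keeping the two types apart, again in Python for illustration only.  The class names `CodePoint` and `Character` are hypothetical: `CodePoint` holds one Unicode scalar value, while `Character` holds a user-perceived character that may need several code points.  How a `Character` is built from raw text (segmentation) is left out and assumed to happen elsewhere.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class CodePoint:
    """A single Unicode scalar value (hypothetical "code-point" type)."""
    value: int

    def __post_init__(self) -> None:
        # Scalar values are 0..0x10FFFF, excluding the surrogate range.
        if not 0 <= self.value <= 0x10FFFF or 0xD800 <= self.value <= 0xDFFF:
            raise ValueError(f"not a Unicode scalar value: {self.value:#x}")

@dataclass(frozen=True)
class Character:
    """A user-perceived character, possibly several code points long."""
    code_points: Tuple[CodePoint, ...]

    def __str__(self) -> str:
        # Rendering back to a Python str is the only place the
        # code-point spelling becomes visible.
        return "".join(chr(cp.value) for cp in self.code_points)

# "é" spelled as "e" followed by COMBINING ACUTE ACCENT.
e_acute = Character((CodePoint(0x65), CodePoint(0x301)))
print(len(e_acute.code_points))      # 2
print(str(e_acute) == "e\u0301")     # True
```

The design point is only that the two names stay distinct: code that genuinely works with Unicode scalar values asks for `CodePoint`, and code that works with text asks for `Character`.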

A text editor need not deal with encodings *at all*.  Think of it: the
keyboard driver provides characters to the text editor.  Real,
full-fledged, characters.  And the text editor asks the display widget
to display a character in a particular font and context (since some
characters take different glyphs depending on context).  But never does it really care
about encodings.
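One way to read this in terms of ports: the encoding is a property of the text port that wraps a byte port, and the editor only ever sees the decoded text.  The sketch below uses Python's `io.TextIOWrapper` over `io.BytesIO` as a stand-in for that layering; `load_buffer` and `upcase_buffer` are hypothetical names, and the upcase command merely stands in for any editing operation.

```python
import io

def load_buffer(raw: bytes, encoding: str) -> str:
    """Decode once, at the port: a text port layered over a byte port."""
    with io.TextIOWrapper(io.BytesIO(raw), encoding=encoding) as port:
        return port.read()

def upcase_buffer(text: str) -> str:
    """Stand-in for an editing command; it never sees bytes or encodings."""
    return text.upper()

utf8_bytes = "café".encode("utf-8")       # 5 bytes
latin1_bytes = "café".encode("latin-1")   # 4 bytes

a = load_buffer(utf8_bytes, "utf-8")
b = load_buffer(latin1_bytes, "latin-1")
print(a == b)              # True
print(upcase_buffer(a))    # CAFÉ
```

The two files differ byte for byte, yet once the port has decoded them the editing command cannot tell them apart, which is the sense in which the encoding is irrelevant to the editor.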

It is only an editor like Emacs that must spend all this energy on
encodings, because it is really a byte editor being torqued into use
as a text editor.

Think of it this way: an editor should not even *care* what the
underlying encodings are for characters; it should be entirely
irrelevant.

Thomas