Thomas Bushnell BSG wrote:
> Right. A text editor needs an input mechanism and a mapping from that
> mechanism to characters.
Not really. It needs a mapping from input events to actions. Some
of those actions may be to insert *strings* into the buffer, or to
append a string to a search string.
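To make that concrete, here is a minimal sketch in Python of an
event-to-action table. The Editor class, the event names, and the
KEYMAP table are all made up for illustration; no real editor's
API is implied. The point is only that every action takes or
produces a string, never a separate character type.

```python
from typing import Callable, Dict


class Editor:
    """Hypothetical toy editor: a buffer, an insertion point, a search string."""

    def __init__(self) -> None:
        self.buffer = ""
        self.point = 0
        self.search = ""

    def insert(self, s: str) -> None:
        # Insert a whole string at point; a single keystroke is just
        # the special case of a string of length one.
        self.buffer = self.buffer[:self.point] + s + self.buffer[self.point:]
        self.point += len(s)

    def extend_search(self, s: str) -> None:
        # Append a string to the current search string.
        self.search += s


Action = Callable[[Editor], None]

# Hypothetical event names mapped to actions.
KEYMAP: Dict[str, Action] = {
    "key:a": lambda ed: ed.insert("a"),
    "key:tab": lambda ed: ed.insert("    "),
    # An input method commits a whole string after word lookup.
    "input-method:commit": lambda ed: ed.insert("日本語"),
    "search:a": lambda ed: ed.extend_search("a"),
}


def dispatch(ed: Editor, event: str) -> None:
    # Look up the action for an event; unbound events are ignored.
    action = KEYMAP.get(event)
    if action is not None:
        action(ed)
```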
> Except that text is an assemblage of characters, not of code points.
> The editor needs functions like "display this character",
No, it needs functions like "display this string".
> "move to next character",
Yes, but this is no different from "move to next word". It doesn't
need to work on the character except as *part of the buffer*.
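As a rough sketch of what I mean (Python, with a hypothetical
function name), "move to next character" can be written as an
operation from a buffer position to a buffer position. This
simplified version only skips combining marks that follow the
base code point; it is an assumption-laden stand-in, not full
Unicode grapheme-cluster segmentation.

```python
import unicodedata


def next_position(buffer: str, pos: int) -> int:
    """Return the position just past the character starting at pos.

    Simplification: a "character" here is one code point plus any
    combining marks that follow it.
    """
    if pos >= len(buffer):
        return len(buffer)
    pos += 1
    # Skip code points with a non-zero canonical combining class.
    while pos < len(buffer) and unicodedata.combining(buffer[pos]) != 0:
        pos += 1
    return pos
```

The caller never holds a character value; it only holds the buffer
and a position in it, exactly as it would for "move to next word".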
> "tell the user what character this is",
Why? Not many editors provide this, and in any case it's only
for advanced users.
> and even "convert this character to some standard interchange format".
No, it needs "convert this string/buffer to some standard interchange
format".
> What I want is a *character* type for a text editor.
What you want and what you need are not the same thing.
Somebody who uses a text editor does not need characters;
they need strings. When you implement a text editor, characters
can be useful, but having them as a separate data type is just
pointless overhead. You could implement a character using an
interned type, like symbols, and then implement a string as
a vector of (pointers to) symbols. But I'm hoping you're not
actually proposing this as a good implementation strategy for
a text editor - or programming language. However, I totally fail
to guess at what you are proposing.
> What is *certainly* useless is a "code point" type.
They're useless - except for implementing strings and buffers.
> A text editor need not deal with encodings *at all*. Think of it: the
> keyboard driver provides characters to the text editor. Real,
> full-fledged, characters.
No, it doesn't - it provides keystroke events. An "input method"
provides strings in general, not individual characters, since
it may need to do word lookup.
> And the text editor asks the display widget
> to display a character in a particular font and context (since some
> characters have different glyphs).
As I said: The display widget works with *strings*, not characters.
> But never does it really care about encodings.
Of course it does. Fonts are indexed by code-point.
> Think of it this way: an editor should not even *care* what the
> underlying encodings are for characters; it should be entirely
> irrelevant.
Well, at some point you're going to have to ask "is this a digit"
or "is this a space". To do that correctly and portably, you need
to index the Unicode tables, which are indexed by code-points. Of
course that is rather low-level: instead, I'm arguing for an API
like "is the character at this position in this string/buffer
white-space". This is a special case of the more general: "does
the substring after this position match this regular expression".
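A small Python sketch of that kind of API follows. The function
names are hypothetical; the library calls (re and unicodedata from
the standard library) are real. It shows the position-based
predicates as special cases of "match a regular expression at this
position", plus the lower-level table lookup by code point for
comparison.

```python
import re
import unicodedata


def matches_at(buffer: str, pos: int, pattern: str) -> bool:
    # General form: does the text starting at pos match the pattern?
    return re.compile(pattern).match(buffer, pos) is not None


def is_whitespace_at(buffer: str, pos: int) -> bool:
    # Special case of matches_at; for str patterns, \s follows the
    # Unicode definition of white-space.
    return matches_at(buffer, pos, r"\s")


def is_digit_at(buffer: str, pos: int) -> bool:
    # Low-level variant: look the code point up in the Unicode tables
    # and test for general category Nd (decimal digit).
    return pos < len(buffer) and unicodedata.category(buffer[pos]) == "Nd"
```

Note that neither caller ever extracts a character value: both ask
a question about a position in a string.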
Anyway, this is all irrelevant. Until you specify an actual
"character API" and propose a practical implementation strategy,
then I think the discussion is pointless.
--
--Per Bothner
xxxxxx@bothner.com http://per.bothner.com/