On Tue, 10 Feb 2004, Tom Lord wrote:
>Programmers building global computing environments have need for
>certain categories of characters which, historically, are of little or
>no interest to linguists. One of these categories is comprised of
>many of the characters used to write words, whether those characters
>are alphabetic, syllabic, or ideographic. Linguistics hasn't given
>us a term for that category.
Well, actually it has. Linguists call such characters "glyphs" or
"graphemes", though the precision and exact meaning vary with the
context and the speaker. Generally,
"grapheme" is considered less specific than "glyph", in the sense that
many different arrangements of ink on paper (or notches in bark, or
carved grooves in stone or whatever) can represent the same grapheme
while each minor variation creates a different glyph.
For example, "A" is a grapheme; a particular instance of a ten-point
Lucida sans-serif A printed on a particular piece of paper is a
glyph.
But the line between them is blurry. Is the bit-pattern a glyph before
it's printed? Is a font containing the patterns for the glyphs that
will be printed when it's used composed of glyphs, or graphemes?
And so on. There's a big fuzzy area between the concrete realization of
a particular instance of a grapheme (unambiguously a glyph) and the
abstract idea of a minimal written unit of language or printed
communication (unambiguously a grapheme), and we tend to redraw the
line between them depending on exactly which levels of abstraction we
need to distinguish for a particular application.
>In their wisdom (or absence of wisdom) the Unicode consortium chose a
>name for this category: they call these characters "letters".
They were starved for names. They were already using "grapheme" and
"glyph" in ways that didn't allow their reuse. Besides, they wanted
to exclude some categories which are graphemes, such as punctuation.
> That
> _is_ an overloading of the term "letter" -- but it is an overloading
> that pervades the Unicode specifications and data tables. For
> example, every assigned Unicode codepoint has a property called "the
> major class of its General Category". The class of alphabetic,
> syllabic, and ideographic characters has the major class "L" (short
> for "letter"). The glossary of the Unicode 3.0 specification says:
>
> Letter. (1) An element of an alphabet. In a broad sense,
> includes elements of syllabaries and ideographs. (2)
> Informative property of characters that are used to write
> words.
>
> I believe that this "broad sense" meaning of "Letter" is well
> engrained in computing and that it _is_ the right term for the
> concept that SRFI-52 is attempting to convey.
Probably; language changes, and "letter" is rapidly coming to have the
broader meaning. It will probably be correct usage no later than
"alot of", and in the meantime will probably cause fewer people to
grind their teeth. Besides, it's not the first time computer
programmers have taken a word in common use and given it a restricted,
technical definition that's not quite what the people using it
understand by it.
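For anyone who wants to poke at the "L" major class Tom describes, here
is a minimal sketch in Python, assuming its standard unicodedata module
is an acceptable stand-in for the Unicode data tables. The helper name
is_letter is my own invention for illustration; it is not from SRFI-52
or from the Unicode specification.

```python
import unicodedata


def is_letter(ch: str) -> bool:
    """Return True if ch's General Category falls in major class "L".

    Hypothetical helper: the two-letter General Category (e.g. "Lu",
    "Lo", "Po") starts with its major class, so we test the first letter.
    """
    return unicodedata.category(ch).startswith("L")


if __name__ == "__main__":
    # An alphabetic, a syllabic, and an ideographic character,
    # followed by punctuation and a digit for contrast.
    for ch in ["A", "\u3042", "\u6f22", "!", "5"]:
        print(f"U+{ord(ch):04X}", unicodedata.category(ch), is_letter(ch))
```

The alphabetic, syllabic, and ideographic samples all land in "L",
while the punctuation mark and the digit do not, which matches the
exclusion of punctuation mentioned above.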
Bear