Re: Why are byte ports "ports" as such? John Cowan 24 May 2006 15:50 UTC

Thomas Bushnell BSG scripsit:

> I mean [by "character"] what the Unicode specification refers to as an "abstract
> character".  Encoding a character with Unicode may take one or more
> code points.  It is a very close concept to a grapheme, with the
> caveat that "grapheme" is relative to a specific writing system, and
> "abstract character" is supposed to be an interlingual thing.

The Unicode Standard 4.0, section 3.4, definition D3 says:

# Abstract character: A unit of information used for the organization,
# control, or representation of textual data.

Forgive me if I find that vague.

TUS further adds:

#	When representing data, the nature of that data is generally
#	symbolic as opposed to some other kind of data (for example,
#	aural or visual). Examples of such symbolic data include letters,
#	ideographs, digits, punctuation, technical symbols, and dingbats.
#	An abstract character has no concrete form and should not be
#	confused with a glyph.
#	An abstract character does not necessarily correspond to what
#	a user thinks of as a ``character'' and should not be confused
#	with a grapheme.
#	The abstract characters encoded by the Unicode Standard are
#	known as Unicode abstract characters.
#	Abstract characters not directly encoded by the Unicode
#	Standard can often be represented by the use of combining
#	character sequences.

All of which makes abstract characters sound rather ... abstract.
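
To make the last quoted paragraph a little less abstract, here is a
minimal Python sketch (my illustration, not anything from TUS).  It
assumes only the standard unicodedata module; the sample strings are
chosen for the example.  One abstract character has a precomposed code
point, the other can only be written as a combining character sequence.

```python
import unicodedata

# "e with acute" is directly encoded: the combining sequence
# U+0065 U+0301 composes to the single code point U+00E9 under NFC.
e_acute_sequence = "e\u0301"
e_acute_composed = unicodedata.normalize("NFC", e_acute_sequence)

# "q with dot below" is not directly encoded: the sequence
# U+0071 U+0323 has no precomposed form, so NFC leaves it as two
# code points.
q_dot_sequence = "q\u0323"
q_dot_composed = unicodedata.normalize("NFC", q_dot_sequence)

for label, text in [("e + acute", e_acute_composed),
                    ("q + dot below", q_dot_composed)]:
    names = [unicodedata.name(ch) for ch in text]
    print(label, len(text), names)
```

Either way a reader would see one "character" on the page, which is
exactly why the definition is hard to pin down.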

> Who said it was different?  What is perfectly clear however, is that
> the one operation that is utterly useless is iterating by code point.

Obviously not.  Lots of people iterate by code point or even by code
unit.  (I meet your dogmatism with my counter-dogmatism.)
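
For anyone unsure what the two kinds of iteration look like, a small
Python sketch follows.  It is an illustration under my own assumptions
(the sample string and variable names are invented for the example),
not a claim about how any particular Scheme port should behave.

```python
# One combining sequence (e + COMBINING ACUTE ACCENT) followed by
# U+1D11E MUSICAL SYMBOL G CLEF, which lies outside the BMP.
s = "e\u0301\U0001D11E"

# Iterating a Python str yields code points.
code_points = [ord(ch) for ch in s]

# UTF-16 code units: the clef needs a surrogate pair.
utf16_bytes = s.encode("utf-16-be")
utf16_units = [int.from_bytes(utf16_bytes[i:i + 2], "big")
               for i in range(0, len(utf16_bytes), 2)]

# UTF-8 code units are simply the encoded bytes.
utf8_units = list(s.encode("utf-8"))

print([hex(cp) for cp in code_points])
print([hex(u) for u in utf16_units])
print([hex(u) for u in utf8_units])
```

The same text is three code points, four UTF-16 code units, or seven
UTF-8 code units, and what a reader would call two characters.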

John Cowan                    
Humpty Dump Dublin squeaks through his norse
                Humpty Dump Dublin hath a horrible vorse
But for all his kinks English / And his irismanx brogues
                Humpty Dump Dublin's grandada of all rogues.  --Cousin James