On Sat, Jun 29, 2019 at 8:21 AM Marc Nieper-Wißkirchen <xxxxxx@nieper-wisskirchen.de> wrote:

I think two concepts are mixed up: ASCII/Unicode code values on the one hand and logical characters on the other.

The trouble is that Unicode makes it clear that there is no such thing as a "logical character".  Here's a writeup I did for LtU:

Not all scripts have as well-defined a notion of characters as Latin or Han. For example, there are three human-oriented ways to view Devanagari text: (1) the phonemic level, where letters (consonants and initial vowels) and vowel signs are separate units; (2) the default grapheme cluster level, where a letter with or without a vowel sign is the unit; (3) the akshara level, where a (possibly zero-length) sequence of graphically reduced letters followed by a fully written letter followed by an optional vowel sign or virama (vowel suppressor) is the unit. None of these is inherently superior to any other, and what people see as "a character" depends on their purpose at the time. Furthermore, none of these three levels is equivalent to the codepoint level, but all of them can be constructed on top of it.

Consider Devanagari क्न्य knya. At the akshara level, it's a unitary character. At the default grapheme level it's three characters: क् k + न् n + य ya. At the letter level it's क ka + ् vowel killer + न na + ् again + य ya. Visually, the क gets reduced to just its left half and the न to a squiggle. You type at the letter level, so when you hit backspace, the rightmost letter disappears. But when you navigate through the text using the arrow keys or the mouse, you can't select in the middle of an akshara: when the character appears as क्न्य, it is just one letter; when it appears in the equivalent (but less legible) way as क्‌न्‌य, it is three.
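
To make the gap between codepoints and the higher levels concrete, here is a minimal Python sketch that lists the five codepoints of knya and then groups them into three units. The helper names codepoints and approx_clusters are made up for illustration, and the grouping rule (attach every mark to the codepoint before it) is only a rough stand-in for default grapheme clusters, not the full Unicode segmentation algorithm. It happens to agree with the three-way split described above for this particular string.

```python
import unicodedata


def codepoints(text):
    """Return (hex code, Unicode name) for each codepoint in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


def approx_clusters(text):
    """Group each mark (general category M*) with the codepoint before it.

    Assumption: this is a simplification for illustration only; real
    grapheme cluster segmentation has many more rules.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # virama and other marks join the previous unit
        else:
            clusters.append(ch)  # a letter starts a new unit
    return clusters


# ka + virama + na + virama + ya, written with escapes to avoid ambiguity
knya = "\u0915\u094D\u0928\u094D\u092F"

for code, name in codepoints(knya):
    print(code, name)

print(len(knya))                   # number of codepoints
print(len(approx_clusters(knya)))  # number of approximate clusters
```

Nothing here reaches the akshara level: treating the whole of knya as one unit needs script-specific knowledge that the general category alone does not provide.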

Even the familiar Latin script is not so simple as it seems. In English æ is a mere typographical ligature, and it is all one whether you write Caesar or Cæsar, but in Norwegian æ is a separate letter, not interchangeable with ae. By the same token, sœur is the normative French spelling of the word for 'sister', but it is commonplace to write soeur nowadays because œ got squeezed out of the Latin-1 character set; however, moelle 'marrow' cannot be spelled mœlle (it is pronounced as if written moile). In German and French respectively, ä and é require their accents, but don't constitute separate letters of the alphabet, whereas in Swedish and Icelandic they are as separate as i and j are in English (but not in Italian); English ö in coöperate is a typographical nicety used by The New Yorker but hardly anyone else nowadays.