On Fri, Mar 8, 2019 at 4:43 PM Marc Nieper-Wißkirchen <xxxxxx@nieper-wisskirchen.de> wrote:

Regardless, I suggest that the values of that state variable are encodings that (in the sense of the TR) determine the width of the ambiguous characters. That way, interfacing with libraries like GNU libunicode is easier. Also, a fallback is provided if all that is known is the encoding of the terminal used.

Maybe I didn't make the summary clear enough?

The encoding is entirely orthogonal. It was a hack.

It may have had a probabilistic advantage (with false
positives) in the past, but even with a Japanese encoding
now, no modern terminal (OSX terminal, xterm,
gnome-terminal) ever uses double-width for Cyrillic.

Both the TR and libunicode are wrong.

It may be reasonable to leave the default value
unspecified, but suggesting in the SRFI that
encoding has anything to do with character width
is misleading.

Alex

-- Marc

On Fri, Mar 8, 2019 at 08:48, Alex Shinn <xxxxxx@gmail.com> wrote:
On Thu, Feb 28, 2019 at 7:39 PM Marc Nieper-Wißkirchen <xxxxxx@nieper-wisskirchen.de> wrote:
On Thu, Feb 28, 2019 at 12:04, Alex Shinn <xxxxxx@gmail.com> wrote:
>
> That doesn't even make sense - Cyrillic is Cyrillic, whatever the encoding is.
> It's a property of the terminal whether or not Cyrillic characters are displayed single-width or double.

Unicode Annex #11 refers to the character encoding when it comes to
deciding the width of ambiguous characters; see definition ED6 here:
https://unicode.org/reports/tr11/.

I looked into this; it appears to be a historical accident.

ASCII characters got special treatment in JIS, but as additional scripts
were added (Cyrillic and Greek), they were just treated as additional
CJK chars with full-width glyphs. In Japan both of these scripts are
primarily used for emoji; however, among the few users who actually
want to read and write Cyrillic on Japanese terminals and devices
(myself included), it seems universally considered a misfeature.

The TR may have been trying to reflect the common use of kterm
(which I verified still has that abominable behavior) but I doubt this
is a majority anymore. Considering the encoding of the terminal
seems a total hack. It might be a reasonable proxy given the
predominance of Unicode-aware terminals and the fact that anyone
explicitly using an euc-jp environment is more likely to be using
kterm or similar, but I'd rather have a generic state variable.
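
To make the "generic state variable" idea concrete, here is a minimal
sketch in Python (not SRFI code). The function name display_width and
the parameter ambiguous_width are hypothetical, chosen only for
illustration. It assumes the East Asian Width property from Python's
unicodedata module and simply skips combining marks, which is a
simplification. The point is that the width of ambiguous characters is
passed in explicitly rather than derived from an encoding.

```python
import unicodedata

def display_width(text, ambiguous_width=1):
    """Estimate terminal columns for text.

    ambiguous_width is a hypothetical stand-in for the state variable
    under discussion: 1 for terminals that render ambiguous characters
    (Cyrillic, Greek, ...) narrow, 2 for kterm-like behavior.
    """
    if ambiguous_width not in (1, 2):
        raise ValueError("ambiguous_width must be 1 or 2")
    total = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # simplification: combining marks take no column
        eaw = unicodedata.east_asian_width(ch)
        if eaw in ("W", "F"):
            total += 2                # wide and fullwidth characters
        elif eaw == "A":
            total += ambiguous_width  # decided by the caller, not the encoding
        else:
            total += 1                # narrow, halfwidth, neutral
    return total
```

With this shape, a caller that only knows the terminal's encoding could
still choose a value for ambiguous_width as a fallback, but the width
computation itself never looks at the encoding.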

--
Alex

--
Prof. Dr. Marc Nieper-Wißkirchen