Re: unicode-terminal-width and dependence on port encodings
Marc Nieper-Wißkirchen 28 Feb 2019 11:39 UTC
On Thu, 28 Feb 2019 at 12:04, Alex Shinn <xxxxxx@gmail.com> wrote:
>
> That doesn't even make sense - Cyrillic is Cyrillic, whatever the encoding is.
> It's a property of the terminal whether or not Cyrillic characters are displayed single-width or double.
Unicode Standard Annex #11 (East Asian Width) refers to the character
encoding when deciding the width of ambiguous characters; see
definition ED6 here:
https://unicode.org/reports/tr11/.
GNU libunistring seems to implement exactly that.
> Xterm and gnome-terminal will display Cyrillic as single-width even if set to EUC-JP.
> If I recall correctly, though, kterm would display Cyrillic in double-width (which was really ugly).
> It might be worth looking into a cyrillic-is-double-width state variable, but I'm not sure how popular these terminals still are.
Maybe a single state variable for the East Asian Ambiguous category would be enough.
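A minimal C sketch of the single-flag idea, assuming the width tables are cut down to a few illustrative excerpts of EastAsianWidth.txt. `terminal_width` and `ambiguous_is_wide` are hypothetical names, not SRFI 159 or libunistring API, and zero-width and control characters are deliberately ignored. It shows the point of ED6: the same code point gets 1 or 2 columns depending on information outside the character itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint32_t lo, hi; };

/* Tiny illustrative excerpts of EastAsianWidth.txt; a real
   implementation would use the complete, generated tables. */
static const struct range wide_ranges[] = {
    {0x3041, 0x3096},  /* Hiragana (W) */
    {0x4E00, 0x9FFF},  /* CJK Unified Ideographs (W) */
};
static const struct range ambiguous_ranges[] = {
    {0x00A1, 0x00A1},  /* INVERTED EXCLAMATION MARK (A) */
    {0x00B0, 0x00B0},  /* DEGREE SIGN (A) */
    {0x0391, 0x03A1},  /* Greek capitals Alpha..Rho (A) */
    {0x03A3, 0x03A9},  /* Greek capitals Sigma..Omega (A) */
    {0x0410, 0x044F},  /* Basic Cyrillic letters (A) */
};

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

/* Linear search; fine for a handful of ranges. */
static bool in_ranges(uint32_t uc, const struct range *r, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (uc >= r[i].lo && uc <= r[i].hi)
            return true;
    return false;
}

/* ambiguous_is_wide plays the role of the single (hypothetical)
   state variable: it only affects East Asian Ambiguous characters. */
int terminal_width(uint32_t uc, bool ambiguous_is_wide) {
    if (in_ranges(uc, wide_ranges, COUNT(wide_ranges)))
        return 2;
    if (in_ranges(uc, ambiguous_ranges, COUNT(ambiguous_ranges)))
        return ambiguous_is_wide ? 2 : 1;
    return 1;
}
```

With the flag off, U+0416 (Cyrillic Zhe) occupies one column and with it on, two, while CJK ideographs are two columns either way.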
>
> On the other hand, kanji as single width is still a common option in many terminals.
>
> --
> Alex
>
> On Thu, Feb 28, 2019 at 3:39 PM Marc Nieper-Wißkirchen <xxxxxx@nieper-wisskirchen.de> wrote:
>>
>> The function uc_width of GNU libunistring determines the terminal
>> width of characters, much like what unicode-terminal-width of
>> SRFI 159 is supposed to do.
>>
>> Unlike unicode-terminal-width, uc_width takes a second argument,
>> namely the encoding used by the terminal (e.g. "UTF-8" or "EUC-JP").
>> And, indeed, the source code of uc_width shows that the terminal
>> character width may depend on the encoding:
>>
>>   /* In ancient CJK encodings, Cyrillic and most other characters are
>>      double-width as well.  */
>>   if (uc >= 0x00A1 && uc < 0xFF61 && uc != 0x20A9
>>       && is_cjk_encoding (encoding))
>>     return 2;
>>
>> http://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/uniwidth/width.c#n462
>>
>> To make unicode-terminal-width work with terminals using these legacy
>> encodings, it has to know the current encoding, which could, for
>> example, be associated with the port. In any case, the current
>> encoding should somehow be part of the environment (i.e. the state
>> variables) in which the formatters are executed.
>>
>> Unfortunately, unicode-terminal-width as currently specified is not a
>> monadic procedure and thus has no access to the state variables.
>>
>> Marc
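The encoding-dependent rule quoted from gnulib can be sketched in C with the encoding carried in a state record instead of being known to the width function itself, roughly what having it in the formatters' environment would amount to. `struct format_state`, `stateful_terminal_width` and the encoding list are hypothetical: the list is illustrative and not claimed to match gnulib's `is_cjk_encoding`, the base width is a one-range placeholder, and zero-width characters are not handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the formatters' state variables: the
   terminal encoding travels with the environment (or the port). */
struct format_state {
    const char *encoding;  /* e.g. "UTF-8" or "EUC-JP" */
};

/* Illustrative list of legacy CJK encodings; not claimed to be
   identical to gnulib's is_cjk_encoding. */
static int is_legacy_cjk_encoding(const char *encoding) {
    static const char *const names[] = {
        "EUC-JP", "GB2312", "GBK", "EUC-TW", "BIG5", "EUC-KR", "CP949",
    };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strcmp(encoding, names[i]) == 0)
            return 1;
    return 0;
}

/* Placeholder width for modern (e.g. UTF-8) terminals: only the
   CJK Unified Ideographs block is treated as wide. */
static int base_width(uint32_t uc) {
    return (uc >= 0x4E00 && uc <= 0x9FFF) ? 2 : 1;
}

/* The result depends on the state, not only on the character.  The
   CJK branch reproduces the condition quoted from uc_width. */
int stateful_terminal_width(uint32_t uc, const struct format_state *state) {
    if (uc >= 0x00A1 && uc < 0xFF61 && uc != 0x20A9
        && is_legacy_cjk_encoding(state->encoding))
        return 2;
    return base_width(uc);
}
```

Switching the state from "UTF-8" to "EUC-JP" changes the width of U+0416 from 1 to 2 without touching the call site, whereas a width procedure without access to the state would need the encoding passed in explicitly everywhere.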