Re: unicode-terminal-width and dependence on port encodings

unicode-terminal-width and dependence on port encodings Marc Nieper-Wißkirchen (28 Feb 2019 07:39 UTC)

Re: unicode-terminal-width and dependence on port encodings Alex Shinn (28 Feb 2019 11:04 UTC)

Re: unicode-terminal-width and dependence on port encodings Marc Nieper-Wißkirchen (28 Feb 2019 11:39 UTC)

Re: unicode-terminal-width and dependence on port encodings Alex Shinn (08 Mar 2019 07:48 UTC)

Re: unicode-terminal-width and dependence on port encodings Marc Nieper-Wißkirchen (08 Mar 2019 08:43 UTC)

Re: unicode-terminal-width and dependence on port encodings Alex Shinn (08 Mar 2019 12:21 UTC)

Re: unicode-terminal-width and dependence on port encodings Marc Nieper-Wißkirchen 28 Feb 2019 11:39 UTC

On Thu, 28 Feb 2019 at 12:04, Alex Shinn <xxxxxx@gmail.com> wrote:
>
> That doesn't even make sense - Cyrillic is Cyrillic, whatever the encoding is.
> It's a property of the terminal whether or not Cyrillic characters are displayed single-width or double.

Unicode Standard Annex #11 refers to the character encoding when it
comes to deciding the width of ambiguous characters; see definition
ED6 at https://unicode.org/reports/tr11/

GNU libunistring's uc_width appears to implement exactly that.
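
To make the dependence on the encoding concrete, here is a minimal
sketch in C of how an encoding name could decide the width of the
ambiguous category. The function names and the short list of encoding
names are assumptions made for illustration; this is not the list or
the logic that libunistring actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative only: a few legacy CJK encoding names.  A real
   implementation would need a complete, normalized list.  */
static const char *const legacy_cjk_encodings[] = {
    "EUC-JP", "EUC-KR", "EUC-TW", "GB2312", "GBK", "BIG5"
};

/* Hypothetical helper: true if ENCODING names one of the legacy CJK
   encodings listed above.  The comparison is exact and case-sensitive.  */
bool is_legacy_cjk_encoding(const char *encoding)
{
    size_t n = sizeof legacy_cjk_encodings / sizeof legacy_cjk_encodings[0];

    assert(encoding != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(encoding, legacy_cjk_encodings[i]) == 0)
            return true;
    }
    return false;
}

/* Hypothetical helper: the number of columns to assume for a character
   of the East Asian Ambiguous category under ENCODING.  */
int ambiguous_width_for_encoding(const char *encoding)
{
    return is_legacy_cjk_encoding(encoding) ? 2 : 1;
}
```

With this sketch, ambiguous_width_for_encoding ("EUC-JP") yields 2 and
ambiguous_width_for_encoding ("UTF-8") yields 1.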

> Xterm and gnome-terminal will display Cyrillic as single-width even if set to EUC-JP.
> If I recall correctly, though, kterm would display Cyrillic in double-width (which was really ugly).
> It might be worth looking into a cyrillic-is-double-width state variable, but I'm not sure how popular these terminals still are.

Maybe one state variable for the ambiguous category is enough.

>
> On the other hand, kanji as single width is still a common option in many terminals.
>
> --
> Alex
>
> On Thu, Feb 28, 2019 at 3:39 PM Marc Nieper-Wißkirchen <xxxxxx@nieper-wisskirchen.de> wrote:
>>
>> The function uc_width of GNU libunistring is a function determining
>> the terminal width of characters, much like what
>> unicode-terminal-width of SRFI 159 is supposed to do.
>>
>> Contrary to unicode-terminal-width, uc_width takes a second argument,
>> namely the encoding used by the terminal (e.g. "UTF-8" or "EUC-JP").
>> And, indeed, after looking into the source code of uc_width, one sees
>> that the terminal character width may depend on the encoding:
>>
>>   /* In ancient CJK encodings, Cyrillic and most other characters are
>>      double-width as well.  */
>>   if (uc >= 0x00A1 && uc < 0xFF61 && uc != 0x20A9
>>       && is_cjk_encoding (encoding))
>>     return 2;
>>
>> http://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/uniwidth/width.c#n462
>>
>> To make unicode-terminal-width work with terminals using these legacy
>> encodings, it has to know the current encoding, which can be
>> associated with the port, for example. In any case, the current
>> encoding should be somehow part of the environment (i.e. the state
>> variables), in which the formatters are executed.
>>
>> Unfortunately, unicode-terminal-width as currently specified is not a
>> monadic procedure and thus has no access to the state variables.
>>
>> Marc