unicode-terminal-width and dependence on port encodings

unicode-terminal-width and dependence on port encodings Marc Nieper-Wißkirchen (28 Feb 2019 07:39 UTC)

Re: unicode-terminal-width and dependence on port encodings Alex Shinn (28 Feb 2019 11:04 UTC)

Re: unicode-terminal-width and dependence on port encodings Marc Nieper-Wißkirchen (28 Feb 2019 11:39 UTC)

Re: unicode-terminal-width and dependence on port encodings Alex Shinn (08 Mar 2019 07:48 UTC)

Re: unicode-terminal-width and dependence on port encodings Marc Nieper-Wißkirchen (08 Mar 2019 08:43 UTC)

Re: unicode-terminal-width and dependence on port encodings Alex Shinn (08 Mar 2019 12:21 UTC)

unicode-terminal-width and dependence on port encodings Marc Nieper-Wißkirchen 28 Feb 2019 07:39 UTC

The function uc_width of GNU libunistring is a function determining
the terminal width of characters, much like what
unicode-terminal-width of SRFI 159 is supposed to do.

Unlike unicode-terminal-width, uc_width takes a second argument,
namely the encoding used by the terminal (e.g. "UTF-8" or "EUC-JP").
Indeed, the source code of uc_width shows that the terminal width of
a character may depend on that encoding:

  /* In ancient CJK encodings, Cyrillic and most other characters are
     double-width as well.  */
  if (uc >= 0x00A1 && uc < 0xFF61 && uc != 0x20A9
      && is_cjk_encoding (encoding))
    return 2;

http://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/uniwidth/width.c#n462
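
To make the effect of the excerpt concrete, here is a minimal,
self-contained sketch of an encoding-dependent width function. It is
not the libunistring implementation: the names sketch_is_cjk_encoding
and sketch_width are hypothetical, the list of legacy encoding names is
an assumption made for illustration, and every character not covered by
the quoted rule is simply treated as single-width (the real uc_width
also handles control, combining and East Asian wide characters).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for gnulib's is_cjk_encoding.  The list of
   encoding names below is an assumption chosen for illustration.  */
static bool
sketch_is_cjk_encoding (const char *encoding)
{
  static const char *const legacy[] =
    { "EUC-JP", "EUC-KR", "EUC-TW", "GB2312", "BIG5" };
  for (size_t i = 0; i < sizeof legacy / sizeof legacy[0]; i++)
    if (strcmp (encoding, legacy[i]) == 0)
      return true;
  return false;
}

/* Simplified width function: applies only the encoding-dependent rule
   from the quoted excerpt and returns 1 for everything else.  */
static int
sketch_width (uint32_t uc, const char *encoding)
{
  if (uc >= 0x00A1 && uc < 0xFF61 && uc != 0x20A9
      && sketch_is_cjk_encoding (encoding))
    return 2;
  return 1;
}
```

Under these assumptions, sketch_width (0x0416, "UTF-8") yields 1 for
CYRILLIC CAPITAL LETTER ZHE, whereas sketch_width (0x0416, "EUC-JP")
yields 2, so the same character occupies a different number of columns
depending on the encoding.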

For unicode-terminal-width to work with terminals that use these
legacy encodings, it has to know the current encoding, which could be
associated with the port, for example. In any case, the current
encoding should somehow be part of the environment (i.e. the state
variables) in which the formatters are executed.

Unfortunately, unicode-terminal-width as currently specified is not a
monadic procedure and thus has no access to the state variables.

Marc