unicode-terminal-width and dependence on port encodings
Marc Nieper-Wißkirchen 28 Feb 2019 07:39 UTC
The function uc_width of GNU libunistring determines the terminal
width of a character, much like what unicode-terminal-width of
SRFI 159 is supposed to do.
Unlike unicode-terminal-width, uc_width takes a second argument,
namely the encoding used by the terminal (e.g. "UTF-8" or "EUC-JP").
And, indeed, after looking into the source code of uc_width, one sees
that the terminal character width may depend on the encoding:
  /* In ancient CJK encodings, Cyrillic and most other characters are
     double-width as well.  */
  if (uc >= 0x00A1 && uc < 0xFF61 && uc != 0x20A9
      && is_cjk_encoding (encoding))
    return 2;
http://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/uniwidth/width.c#n462
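To make the effect concrete, here is a minimal, self-contained sketch of
that branch. It does not call libunistring: sketch_is_cjk_encoding and
sketch_width are hypothetical names, the list of encoding names is an
assumption made for illustration, and everything outside the quoted range
check (combining characters, East Asian wide characters, control
characters) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for gnulib's is_cjk_encoding(); the set of
   encoding names below is an assumption, not the library's list.  */
int
sketch_is_cjk_encoding (const char *encoding)
{
  static const char *const legacy[] = { "EUC-JP", "EUC-KR", "EUC-TW",
                                        "GB2312", "GBK", "BIG5" };
  for (size_t i = 0; i < sizeof legacy / sizeof legacy[0]; i++)
    if (strcmp (encoding, legacy[i]) == 0)
      return 1;
  return 0;
}

/* Simplified width: every character counts as one column, except in
   the encoding-dependent branch quoted above.  */
int
sketch_width (unsigned int uc, const char *encoding)
{
  assert (encoding != NULL);
  if (uc >= 0x00A1 && uc < 0xFF61 && uc != 0x20A9
      && sketch_is_cjk_encoding (encoding))
    return 2;
  return 1;
}
```

Under these assumptions, sketch_width (0x0416, "UTF-8") yields 1 while
sketch_width (0x0416, "EUC-JP") yields 2 for the same character
(U+0416, CYRILLIC CAPITAL LETTER ZHE).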
To make unicode-terminal-width work with terminals using these legacy
encodings, it has to know the current encoding, which could, for
example, be associated with the port. In any case, the current
encoding should somehow be part of the environment (i.e. the state
variables) in which the formatters are executed.
Unfortunately, unicode-terminal-width as currently specified is not a
monadic procedure and thus has no access to the state variables.
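The kind of interface this would require can be sketched in C, in the
style of the gnulib excerpt. This is only an illustration under
assumptions: struct env_sketch, env_char_width and env_string_width are
hypothetical names that belong neither to SRFI 159 nor to libunistring,
and the struct merely stands in for the formatter state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the formatter state; only the encoding
   of the output port is modelled.  */
struct env_sketch
{
  const char *encoding;   /* e.g. "UTF-8" or "EUC-JP" */
};

/* Width of one character as seen through the environment.  Only
   "EUC-JP" is treated as a legacy CJK encoding here, an assumption
   that keeps the sketch short.  */
int
env_char_width (const struct env_sketch *env, unsigned int uc)
{
  assert (env != NULL && env->encoding != NULL);
  if (uc >= 0x00A1 && uc < 0xFF61 && uc != 0x20A9
      && strcmp (env->encoding, "EUC-JP") == 0)
    return 2;
  return 1;
}

/* Sum of the widths of N code points.  The state is passed along
   explicitly, which a procedure without access to it cannot do.  */
int
env_string_width (const struct env_sketch *env,
                  const unsigned int *ucs, size_t n)
{
  int total = 0;
  for (size_t i = 0; i < n; i++)
    total += env_char_width (env, ucs[i]);
  return total;
}
```

With this sketch, the same three Cyrillic code points measure 3 columns
in an environment whose encoding is "UTF-8" and 6 columns in one whose
encoding is "EUC-JP".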
Marc