On Tue, Apr 23, 2019 at 17:02, Alex Shinn <xxxxxx@gmail.com> wrote:

I apologize for the very long delay.

And I have to apologize for the very long delay in responding.

(4) "unicode-terminal-width/jp": I am an expert neither on East Asian
fonts nor on Unicode, but I am wondering whether characters of
ambiguous width exist only in a Japanese context. Wouldn't
"unicode-terminal-width/cjk" be a better name?

I'm familiar with the Japanese devices and fonts, and you've mentioned libraries which approximate this by checking for Japanese (and not CJK) encodings. I can't find evidence of the problem in Chinese or Korean fonts, but will look around.

Have you found some evidence? The standards I have found only refer to "East Asian" and "CJK".

(7) How are the various "trimmed" procedures supposed to be
implemented in case ANSI escape sequences or Unicode combining
characters are present? (For example, when trimming happens to the
left, the suggested algorithm of taking the longest output does not
yield the correct result.)

In general we can imagine arbitrary compositions that don't work well
together. Trimming in particular is problematic. How should downcase
folding work on a non-trailing Σ (sigma) which is trimmed to become
trailing? This combines two "power" features of the library - suppressing
and rewriting output. We could remove support for such advanced
features altogether, yet they are quite useful.

As the subsequent discussion seems to imply, this problem has a solution: downcasing (rewriting output) has to happen before trimming (suppressing output). This is something the user can achieve.
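
To make the ordering issue concrete, here is a minimal sketch. It is written in Python rather than Scheme purely for illustration and says nothing about how the library itself folds case; it only relies on the fact that Python's built-in `str.lower()` applies the Unicode final-sigma rule. The sample word and the variable names are made up for this example.

```python
# Greek capitals OMICRON DELTA OMICRON SIGMA ALPHA, spelled with escapes
# to avoid look-alike Latin letters. The sigma is NOT word-final here.
word = "\u039f\u0394\u039f\u03a3\u0391"

# Trimming first makes the sigma trailing, so it is folded to the
# final form (U+03C2) even though it was medial in the full word.
trim_then_downcase = word[:4].lower()

# Downcasing first sees the whole word, so the sigma keeps its
# medial form (U+03C3) after the trim.
downcase_then_trim = word.lower()[:4]

print(trim_then_downcase == downcase_then_trim)  # the two orders disagree
```

Only the second order produces output that is a prefix of the correctly downcased full word.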

Should we remove these? I'm considering removing everything but
the base library, on which all else can be built portably anyway.

First of all, I agree with John that the parts that are not included in the base library should not be removed.

Secondly, it is not quite true that everything can be built portably on top of the base library (yet). The trimming procedures serve as a good example. At the moment, color procedures working flawlessly together with trimming cannot be implemented.

Assuming we keep them, we should try our best to have sensible rules
so they compose well in most cases, without requiring implementations
to go to unrealistically complex (and slow) efforts.

If things like trimming and colorization (both related to rewriting output) do not go well together, this may be a hint that the underlying model has a flaw.

I think two concepts are mixed up: ASCII/Unicode code values on the one hand and logical characters on the other. The trimming procedures should work on logical characters (interspersed with (ANSI) control sequences). At the moment, however, they work on character codes.

Preserving trailing composing characters is required lest we split
graphemes, hence the rule in the current draft. This should be clarified
for left trimming - we trim the *longest* possible substring from the left.

The result is that given

(trimmed* <width> (as-<color> <stuff> ...))

the result may or may not truncate the left or right color escapes,
changing the intended color but not generating any invalid escapes.
Instead one should always write

(as-<color> (trimmed* <width> <stuff> ...))

This would exclude some important use cases. Consider color-formatted Scheme code that is to be displayed in a fixed width field.

The alternative is for trimmed* to track ANSI color escapes
independently of the color state variables, which is definitely
too complex.

I would propose to add a new state variable that holds a primitive trimming procedure that knows how to remove a fixed number of logical characters from the left and right. Initially, this state variable could be bound to a simple version that just removes code points. The Unicode or the color module can then export a more sophisticated trimming procedure that knows about combining characters and leaves control sequences as they are (e.g., "Hello, <red>World!</red>" could be trimmed to 5 characters to give "Hello<red></red>").

This would be similar to the procedure "unicode-terminal-width", which is slower than the simple "string-width" procedure, but can be enabled by the user on a case-by-case basis.
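
As a rough sketch of what such a more sophisticated trimming primitive might look like (in Python, for illustration only; the names `tokenize`, `trim_logical` and `keep_left` are hypothetical and not part of any draft), the following treats a base character plus its combining marks as one logical character and passes control sequences through untouched. It assumes that only CSI sequences of the form ESC [ ... letter occur, and it uses the canonical combining class as a stand-in for full grapheme cluster segmentation.

```python
import re
import unicodedata

# Assumption: only CSI sequences such as "\x1b[31m" need to be recognized.
ANSI_CSI = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")

def tokenize(text):
    """Split text into ("esc", s) and ("char", s) tokens.

    A combining mark is attached to the logical character before it.
    """
    tokens = []
    i = 0
    while i < len(text):
        m = ANSI_CSI.match(text, i)
        if m:
            tokens.append(("esc", m.group()))
            i = m.end()
            continue
        ch = text[i]
        if unicodedata.combining(ch) and tokens and tokens[-1][0] == "char":
            tokens[-1] = ("char", tokens[-1][1] + ch)
        else:
            tokens.append(("char", ch))
        i += 1
    return tokens

def trim_logical(text, drop_left=0, drop_right=0):
    """Remove logical characters from either end, keeping all escapes."""
    tokens = tokenize(text)
    chars = [k for k, (kind, _) in enumerate(tokens) if kind == "char"]
    dropped = set(chars[:drop_left])
    if drop_right:
        dropped |= set(chars[-drop_right:])
    return "".join(s for k, (_, s) in enumerate(tokens) if k not in dropped)

def keep_left(text, width):
    """Keep at most `width` logical characters, counted from the left."""
    n = sum(1 for kind, _ in tokenize(text) if kind == "char")
    return trim_logical(text, drop_right=max(0, n - width))

# "Hello, <red>World!</red>" with real escape sequences, trimmed to 5.
print(repr(keep_left("Hello, \x1b[31mWorld!\x1b[0m", 5)))
```

The color escapes survive even though all the text they enclosed has been removed, so no escape is ever truncated, and an accent is never separated from its base letter.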

Marc