Re: test-apply and on-test-begin callback Tomas Volf 25 Jul 2024 13:31 UTC
Thank you very much for your response.

To keep it succinct, in the rest of this message `skip list' refers to the
"currently active skip specifiers" and `run list' refers to the specifiers
passed to test-apply (naming taken from the reference implementation).
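
So, for concreteness (run-suite here being just some thunk that runs the
test suite, not an actual SRFI-64 binding):

    (test-skip (test-match-name "flaky"))             ; adds to the skip list
    (test-apply (test-match-name "fast") run-suite)   ; "fast" forms the run list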

On 2024-07-24 15:10:18 -0700, Per Bothner wrote:
> I'm not sure what I intended for test-apply.
> I think it is meant as a top-level tool for running a test-suite,
> and thus "currently active skip specifiers" would normally not be a factor.
> However, I suppose it could be used in more complex situations; I just
> don't remember if I had anything in mind.

I acknowledge that re-using stateful specifiers across both the skip list and
the run list is quite an edge case, and the specification simply saying that
the order is unspecified would be fine with me.

I have no immediate use case for sharing the specifiers, but as I am writing
an implementation of SRFI-64, I sadly need to think about what a user might
want to do (and is allowed to do by the specification), not just what I would
consider sane.

>
> In this case I think "the reference implementation is the specification,
> if not explicitly contradicted". Assuming you can figure out what the
> reference implementation does. (Doesn't seem that complicated,
> but I haven't dug into it in a long time.)

Per your advice I took the reference implementation and tried to verify its
behavior (without looking at the source too closely, as I would like to avoid
copyright taint).

It seems that the SRFI-64 distributed with GNU Guile is based on the reference
implementation (I am not sure whether any changes were made), so I know that it
does not adhere to the specification all that well.  (That is the reason I
started writing my own implementation: as a GNU Guile user, I wanted to have a
SRFI-64-compliant library available.)  Here I will focus just on test-apply
(and only on the part relevant to my original question; there is no need to dig
into the cases of non-compliance).

I used the following test program (`pk' just prints its arguments):

    (let ((r (test-runner-null)))
      ;; Log the preliminary result kind when each test begins...
      (test-runner-on-test-begin! r (λ (r)
                                      (pk 'on-test-begin
                                          'name (test-runner-test-name r)
                                          'kind (test-result-kind r))))
      ;; ...and the final result kind when it ends.
      (test-runner-on-test-end!   r (λ (r)
                                      (pk 'on-test-end
                                          'name (test-runner-test-name r)
                                          'kind (test-result-kind r))))
      ;; The run list contains only "test-a"; the skip list is empty.
      (test-apply r (test-match-name "test-a") (λ ()
                                                 (test-begin "xx")
                                                 (test-assert "test-a" #t)
                                                 (pk (test-result-kind))
                                                 (test-assert "test-b" #t)
                                                 (pk (test-result-kind))
                                                 (test-end))))

I got this output:

    ;;; WARNING: compilation of /home/wolf/Downloads/testing.scm failed:
    ;;; Unbound variable: %test-source-line2

    ;;; (on-test-begin name "test-a" kind #f)

    ;;; (on-test-end name "test-a" kind pass)

    ;;; (pass)

    ;;; (on-test-begin name "test-b" kind skip)

    ;;; (on-test-end name "test-b" kind skip)

    ;;; (skip)

Ignoring the failed compilation (as I said, I do not want to dig into the
sources too much), we can see that the run list behaves just like a negated
skip list: for any test *not* matching the run list, it sets both the
preliminary and the final result kind to 'skip.
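
For illustration, this is (as far as the reported result kinds go) the
behavior I would expect from explicitly installing a negated skip specifier
instead of using a run list.  A sketch only, not verified against the
reference implementation:

    (let ((r (test-runner-null)))
      (test-runner-on-test-end! r (λ (r)
                                    (pk 'on-test-end
                                        'name (test-runner-test-name r)
                                        'kind (test-result-kind r))))
      (test-with-runner r
        (test-begin "xx")
        ;; Skip every test whose name is not "test-a"; a specifier may be an
        ;; arbitrary one-argument predicate on the runner.
        (test-skip (λ (runner)
                     (not (equal? (test-runner-test-name runner) "test-a"))))
        (test-assert "test-a" #t)
        (test-assert "test-b" #t)
        (test-end)))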

I am not sure that complies with the specification.  I see a few relevant
parts.  For the preliminary result we have:

> If we've started on a new test, but don't have a result yet, then the result
> kind is 'xfail if the test is expected to fail, 'skip if the test is supposed
> to be skipped, or #f otherwise.

Skipping is defined in the description of `test-skip':

> Before each test (or test-group) the set of active skip-specifiers are applied
> to the active test-runner. If any specifier matches, then the test is skipped.

And for test-apply:

> A test is executed if it matches any of the specifiers in the test-apply and
> does not match any active test-skip specifiers.

Since test-apply explicitly treats the run list and the skip list as two
separate concepts (instead of saying something like, for example, "If one or
more specifiers are listed, they are (in negated form) treated as part of the
currently active skip specifiers."), I think the preliminary result is mandated
to be #f.
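
Put differently, under that reading I would expect an implementation to
compute the preliminary kind roughly along these lines (a sketch with made-up
names: preliminary-kind, skip-specifiers and expected-fail? are not part of
SRFI-64, and skip specifiers are represented as one-argument predicates on
the runner):

    (use-modules (srfi srfi-1))     ; for `any'

    (define (preliminary-kind runner skip-specifiers expected-fail?)
      (cond (expected-fail? 'xfail)
            ;; Only an active test-skip specifier makes the test "supposed
            ;; to be skipped"...
            ((any (λ (spec) (spec runner)) skip-specifiers) 'skip)
            ;; ...whereas merely not matching the test-apply run list is a
            ;; separate concept, so the preliminary kind stays #f.
            (else #f)))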

It also says "test is executed", not "test is not skipped".  So it could be
argued that even the final result of 'skip is wrong (and should stay #f).  Now
that I think about it, that is probably the most reasonable reading of the
standard.  The test was not even considered for execution, so the result is not
available, hence #f.

What is your opinion here?

>
> On the other hand, if you think some other semantics would be
> more useful or clear, that would probably be OK too.

I think the semantics as I understand the standard are fine.  It is still a
bit unclear how (or whether) run lists compose in the case of nested test-apply
calls.  I *think* that per the specification they do not compose but replace
each other, so only the inner-most run list is taken into account.  Is that a
correct reading?
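
For example, under that "replace" reading I would expect something like the
following (a sketch of the question only, not verified behavior of any
implementation):

    (test-apply (test-match-name "outer")
                (λ ()
                  (test-begin "g")
                  (test-assert "outer" #t)   ; executed, it is on the outer run list
                  (test-apply (test-match-name "inner")
                              (λ ()
                                ;; Only the inner run list counts here, so
                                ;; "inner" is executed and "outer" is not.
                                (test-assert "inner" #t)
                                (test-assert "outer" #t)))
                  (test-end "g")))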

As far as the current reference implementation goes, I think it is a bit
sub-optimal that there is no way to tell whether a test was skipped due to the
run list or the skip list.  I can imagine a test runner preferring not to
report tests that are not on the run list at all.  If I, for example, have a
test file with hundreds of tests and run (test-apply (test-match-name "t-foo")
...), I would expect only t-foo in the output, not the hundreds of other tests
with a 'skip result.  Or rather, I would expect to at least be able to write
such a test runner, but in the reference implementation that is not possible as
far as I can tell.  But in the end I can probably just make do with `grep', I
guess?
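
Purely to illustrate what I mean, here is a sketch of the kind of runner I
have in mind.  Since run-list and skip-list skips are indistinguishable in
the reference implementation, the best it can do today is suppress all 'skip
results, which is only correct when no test-skip is active:

    ;; make-quiet-runner is a made-up name, not part of SRFI-64.
    (define (make-quiet-runner)
      (let ((r (test-runner-null)))
        (test-runner-on-test-end! r
          (λ (r)
            ;; Report only tests that were actually executed; every 'skip
            ;; is silently dropped (run-list or otherwise).
            (unless (eq? (test-result-kind r) 'skip)
              (pk 'result
                  (test-runner-test-name r)
                  (test-result-kind r)))))
        r))

    (test-apply (make-quiet-runner)
                (test-match-name "t-foo")
                (λ ()
                  (test-begin "all")
                  (test-assert "t-foo" #t)   ; reported
                  (test-assert "t-bar" #t)   ; not on the run list, stays silent
                  (test-end "all")))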

Thank you and have a nice day,
Tomas Volf

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.