The correct answer is that a test runner should not report success or failure in a binary manner but should report the probability (or a conservative approximation of it) that the test succeeded.

Such a feature should be added to test runners like the one specified in SRFI 64.  Repeated runs of a probabilistic test should be independent, and the test runner should be able to rerun a probabilistic test until a minimum probability threshold is reached (the threshold could be specified by the environment).
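
To make this concrete, here is a minimal sketch in Python of one way such a runner could work. It is not SRFI 64 and not an existing API; the names run_until_confident, Report, per_run_failure_bound and threshold are hypothetical. It assumes that runs are independent and that correct code fails a single run with probability at most per_run_failure_bound, so that n consecutive failures of correct code have probability at most per_run_failure_bound ** n. The runner reruns the test until it passes or until that bound drops to the threshold, and reports the bound instead of a bare pass/fail.

```python
import random
from typing import Callable, NamedTuple


class Report(NamedTuple):
    passed: bool              # True if some run succeeded
    runs: int                 # number of runs actually performed
    false_alarm_bound: float  # bound on P(observed failures | code is correct)


def run_until_confident(test: Callable[[], bool],
                        per_run_failure_bound: float,
                        threshold: float) -> Report:
    """Rerun `test` until it passes or chance alone is an implausible excuse.

    Assumptions (not from any spec): runs are independent, and correct code
    fails a single run with probability at most `per_run_failure_bound`.
    """
    if not 0.0 < per_run_failure_bound < 1.0:
        raise ValueError("per_run_failure_bound must be strictly between 0 and 1")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    runs = 0
    bound = 1.0  # probability bound for the failures seen so far (none yet)
    while bound > threshold:
        runs += 1
        if test():
            return Report(True, runs, bound)
        bound *= per_run_failure_bound  # one more independent failure
    return Report(False, runs, bound)


if __name__ == "__main__":
    rng = random.Random()  # hypothetical flaky test: fails about 1 run in 10
    report = run_until_confident(lambda: rng.random() >= 0.1,
                                 per_run_failure_bound=0.1,
                                 threshold=1e-6)
    print(report)
```

A failing report then says how unlikely it is that correct code would have produced that many failures in a row, which is the conservative figure the environment can compare against its threshold.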

Anything else is not true to fact.
Marc

On Wed, Feb 7, 2024 at 06:51, Arthur A. Gleckler <xxxxxx@speechcode.com> wrote:
On Tue, Feb 6, 2024 at 9:46 PM Linas Vepstas <xxxxxx@gmail.com> wrote:
Given that these are random number driven tests, they will fail 1 out of N times, and the question is what the tolerated failure rate should be. Apparently, 1 out of 10 is not a happy place. But perhaps 1 out of 1000? 1 out of a million?  Failure rates of 1 in a billion push the tests into a corner where they aren't really testing anything meaningful any more. 

Perhaps the test should consist of several independent subtest runs, each with a low probability of failing.  If truly random numbers are used, could we reduce the probability that every one of the subtests fails to a vanishingly small number?
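
The arithmetic behind this suggestion can be sketched as follows, assuming the subtests are independent, each fails with the same probability p, and the overall test counts as failed only when every subtest fails. Under those assumptions the overall failure probability is p ** k for k subtests; with p = 1/10, nine subtests give (1/10)^9, which is 1 in a billion. The helper name subtests_needed is hypothetical.

```python
def subtests_needed(per_subtest_failure: float, target: float) -> int:
    """Smallest k such that per_subtest_failure ** k <= target.

    Assumes independent subtests that each fail with the same probability,
    and an overall test that fails only if all of them fail.
    """
    if not 0.0 < per_subtest_failure < 1.0:
        raise ValueError("per_subtest_failure must be strictly between 0 and 1")
    if target <= 0.0:
        raise ValueError("target must be positive")
    count = 0
    probability_all_fail = 1.0
    # Multiply instead of using logarithms to keep the sketch simple;
    # results exactly at a power of p are subject to floating-point rounding.
    while probability_all_fail > target:
        probability_all_fail *= per_subtest_failure
        count += 1
    return count


if __name__ == "__main__":
    # Subtests that each fail half the time, overall failure rate at most 1%.
    print(subtests_needed(0.5, 0.01))
```

The count grows only logarithmically in the target, so even a fairly flaky subtest reaches a very small overall failure rate after a handful of repetitions.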