The correct answer is that a test runner should not report a binary success or failure, but rather the probability (or a conservative approximation of it) that the test succeeded.
Such a feature should be added to test runners like the one specified in SRFI 64. Repeated runs of a probabilistic test should be independent of one another, and the test runner should be able to rerun a probabilistic test until it reaches a minimum probability threshold (which could be specified by the environment).
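
To make the idea concrete, here is a minimal sketch in Python (not SRFI 64 itself) of what such a runner could do. It rests on assumptions that are not in the text above: the test has one-sided error, meaning an incorrect implementation passes a single run with probability at most per_run_false_pass; runs are independent, so after k passes the reported confidence is 1 - per_run_false_pass**k; and a single failing run counts as a definite failure. The function name run_probabilistic_test and the environment variable PROB_TEST_THRESHOLD are hypothetical, chosen only for illustration.

```python
import os


def run_probabilistic_test(test, per_run_false_pass, threshold=None, max_runs=1000):
    """Rerun `test` until the confidence bound reaches `threshold`.

    Returns (confidence, runs). Assumes independent runs and one-sided
    error: a wrong implementation passes one run with probability at
    most `per_run_false_pass`.
    """
    if not 0.0 <= per_run_false_pass < 1.0:
        raise ValueError("per_run_false_pass must be in [0, 1)")
    if threshold is None:
        # Hypothetical environment variable; the default is arbitrary.
        threshold = float(os.environ.get("PROB_TEST_THRESHOLD", "0.999"))

    false_pass = 1.0  # bound on the chance that all runs so far passed wrongly
    runs = 0
    while 1.0 - false_pass < threshold and runs < max_runs:
        runs += 1
        if not test():
            # Under the one-sided-error assumption, one failure is conclusive.
            return 0.0, runs
        false_pass *= per_run_false_pass
    return 1.0 - false_pass, runs


if __name__ == "__main__":
    # A stand-in test that always passes; each run halves the doubt.
    confidence, runs = run_probabilistic_test(lambda: True, 0.5, threshold=0.875)
    print(confidence, runs)
```

The max_runs cap keeps the loop finite when the threshold cannot be reached, in which case the returned confidence is simply whatever bound was achieved. A test with two-sided error would need a different aggregation rule than the product used here.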