Each random variable generator should consume its own independent stream of random numbers Bradley Lucier 06 Apr 2023 21:28 UTC

I'm not recommending a change to this SRFI, these comments are directed
at authors of any followup SRFI (if any).  Unfortunately, there's a lot
of background at the beginning, and the point (stated in the subject
line) isn't explained until the end.

I'm going to distinguish between a stream of random numbers, that comes
from the primitive uniform generator, and streams of random variables
defined in this SRFI, which can have different distributions.

Some random number generators are based on either single or multiple
linear congruential generators (and in particular, so is the one in the
sample implementation of SRFI 27).  The parameters of these generators
are chosen to pass the so-called spectral test:

https://en.wikipedia.org/wiki/Spectral_test

L'Ecuyer claims that the SRFI 27 generator passes the spectral test in
over 40 dimensions, i.e., if you look at the distribution of

x_i in R^1,
(x_i, x_{i+1}) in R^2,
...
(x_i, x_{i+1},...,x_{i+39}) in R^{40}

then these points will be more or less uniformly distributed through the
multi-dimensional unit cube in all of these dimensions *simultaneously*.

This means that if you have a random variable generator that consumes a
single random number (e.g., a Bernoulli random generator), two random
numbers at a time (the Box-Mueller generator for Gaussian random
variables), or a variable but not terribly large sequence of random
numbers at a time (various rejection methods), then the generated random
variables should be independent and follow the given distribution fairly
well.

But this analysis holds only if you have a single random variable
generator consuming consecutive random numbers from a single stream.

If you have several random variable generators consuming random numbers
from a single stream, then you don't have guarantees on whether the
results will pass "randomness" tests.  (E.g., if you consume 2 random
numbers to generate a pair of Gaussian variables, and then 200 places
later in the random number stream you consume two more random numbers to
generate another pair of Gaussian variables, and you continue to do
this, the analysis can't "ensure" (for some value of "ensure") that the
Gaussian variables you generate will be independent and follow the
correct distribution.)

L'Ecuyer shows that the generator in SRFI 27 allows one to choose a lot
of different *substreams* of consecutive random numbers, and that each
substream is reasonably independent of other streams.

The routine make-random-source-generator on this SRFI allows one to step
through consecutive, independent streams of substreams of random
numbers.  The parameter i chooses a sequence of substreams in a
deterministic, reproducible, way.  For example, if one wanted to do a
simulation a thousand times, then one would set up

(define current-random-source-generator
   (make-random-source-generator i))

for i=0,...,999 to set up the experiment, then each time
current-random-source-generator was called, it would return the next
substream associated with the i'th sequence of substreams.

And now the point.  Once this is set up, each generator in this SRFI
should consume its own substream of random numbers.  For example,
ignoring error checking, make-random-integer-generator should be

(define (make-random-integer-generator low-bound up-bound)
   (let ((rand-int-proc
          (random-source-make-integers
(current-random-source-generator)))   ;;; << use and consume the next
substream of random numbers
         (range (- up-bound low-bound)))
     (lambda ()
       (+ low-bound (rand-int-proc range)))))

In other words, to get better assurance about independence and adherence
to a given distribution, different instances of random variable
generators should not be sampling from the same stream of random numbers.

I can simulate this behavior by saying, e.g.,

(define my-u1-generator
   (parametrize
    ((current-random-source (current-random-source-generator)))
    (make-random-u1-generator)))

every time I want to define a generator, but I claim that this behavior
should be the default and built in.

I don't know what to do with an implementation different from the SRFI
27 sample implementation, based on L'Ecuyer's papers.

Brad