On 5/5/20 11:09 AM, John Cowan wrote:
> On Tue, May 5, 2020 at 9:39 AM Lassi Kortela <xxxxxx@lassi.io> wrote:
>
> > Should `make-random-boolean-generator` take an optional argument giving
> > the probability with which #t is returned?
>
>
> I suppose we could, but gweighted-sampling is meant to solve that
> problem in a general way. For example, an unfair coin can be modeled
> like this:
>
>   (gweighted-sampling 0.55 (circular-generator #t)
>                       0.45 (circular-generator #f))
>
> And of course it could equally well generate the symbols `heads` and
> `tails` instead.

A few points:

1. I don't understand either the description or the implementation of
   the gweighted* routines.

2. Although I hate suggesting that other people do more work, I'd
   recommend having both:

   (a) a Bernoulli distribution, returning 0 and 1:
       https://en.wikipedia.org/wiki/Bernoulli_distribution
       The returned values can then be used to index a vector of any
       other values (booleans included) if you want to return
       something else.

   (b) a categorical distribution:
       https://en.wikipedia.org/wiki/Categorical_distribution
       As Lassi mentioned, this could be implemented using Huffman
       coding:
       https://en.wikipedia.org/wiki/Huffman_coding
       or one of the methods described at
       https://en.wikipedia.org/wiki/Categorical_distribution#Sampling
       I don't recommend the method using binomial sampling; I know of
       no really "good" way to sample a binomial distribution.
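
To make (a) and (b) concrete, here is a minimal sketch of what such generators might look like. It is not a proposed implementation. It assumes SRFI 27's `random-real` as the uniform source, and the names `make-bernoulli-generator` and `make-categorical-generator` are made up for illustration; they do not come from any draft. The categorical sampler uses the simple cumulative-sum method rather than Huffman coding, and it assumes the weights are non-negative and sum to 1.

```scheme
(import (scheme base) (srfi 27))

;; Hypothetical name. Returns a generator (a thunk) that yields 1 with
;; probability p and 0 otherwise. (random-real) is SRFI 27's uniform
;; source on the open interval (0,1).
(define (make-bernoulli-generator p)
  (lambda ()
    (if (< (random-real) p) 1 0)))

;; Hypothetical name. Returns a generator of indices 0 .. n-1, where
;; index i is yielded with probability (vector-ref weights i).
;; Assumes the weights are non-negative and sum to 1.
;; Method: walk the weights, subtracting each from a uniform draw until
;; the draw falls inside the current weight. The last index is the
;; fallback, which also absorbs floating-point rounding error.
(define (make-categorical-generator weights)
  (let ((last-index (- (vector-length weights) 1)))
    (lambda ()
      (let loop ((i 0) (u (random-real)))
        (let ((w (vector-ref weights i)))
          (if (or (= i last-index) (< u w))
              i
              (loop (+ i 1) (- u w))))))))

;; Mapping the numeric results onto other values by vector indexing:
;; an unfair coin that comes up heads with probability 0.55.
(define coin (make-bernoulli-generator 0.55))
(define (flip) (vector-ref #(tails heads) (coin)))

;; A three-way choice with probabilities 0.2, 0.5 and 0.3.
(define pick (make-categorical-generator #(0.2 0.5 0.3)))
(define (colour) (vector-ref #(red green blue) (pick)))
```

The linear scan costs O(n) per sample. For a large number of categories, a precomputed cumulative table with binary search, or the Huffman-style tree Lassi mentioned, would reduce the expected cost per sample.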