mercredi 8 avril 2020

How does bias manifest in bounded random number generation

I am trying to digest the following post https://www.pcg-random.org/posts/bounded-rands.html on non biased, efficient random number generation.

Here is an excerpt describing the classical, modulo approach.

uint32_t bounded_rand(rng_t& rng, uint32_t range) {
    return rng() % range;
}

But in addition to being slow, it is also biased. To understand why rand() % 52 produces biased numbers, if we assume that rand() produces numbers in the range [0..2^32), observe that 52 does not perfectly divide 2^32, it divides it 82,595,524 times with remainder 48. Meaning that if we use rand() % 52, there will be 82,595,525 ways to select the first 48 cards from our 52-card deck and only 82,595,524 ways to select the final four cards . In other words, there is a 0.00000121% bias against these last four cards...

The post goes on to show another technique that uses floating-point arithmetic to essentially generate a random fraction of the desired range and truncate it to an integer.

static uint32_t bounded_rand(rng_t& rng, uint32_t range) {
    double zeroone = 0x1.0p-32 * rng();
    return range * zeroone;
}

This approach is just as biased as the classic modulo approach, but the bias manifests itself differently. For example, if we were choosing numbers in the range [0..52), the numbers 0, 13, 26 and 39 would appear once less often than the others.

The last paragraph is what has me confused. I am not well versed in floating-point arithmetic, so I am struggling to make the connection between the bias in the modulo method and the bias in the floating-point method. All I see is that in both techniques, 4 numbers are biased against.




Aucun commentaire:

Enregistrer un commentaire