Suppose that X takes value i with probability p(i), where i = 0, ..., k - 1, for some k. The probabilities are rational numbers a(i)/b where it is safe to assume b < 1 000 000 (but maybe not b < 100 000) and k <= 10.
What would be the most efficient algorithm for sampling? The number of samples is around b, i.e., high.
I can come up with two not completely satisfying approaches:
- Construct the list c = [p(0), p(0) + p(1), ... , p(0) + ... + p(k - 1), 1.5]
- Generate uniformly distributed random number p and return i, such that c[i] <= p < c[i + 1].
The second step is logarithmic in k (bisection). Can we do better? Sure:
- Construct the list c = [0, ... , 0, 1, ... , 1, ...... , k - 1, ... , k - 1], where the number of occurrences of i is proportional to p(i).
- Generate uniformly distributed random number p and return c[floor(length(c) * p)].
The second step now takes constant time, but computation may take too much space.
(This will be implemented in Python3, if this information matters.)
Aucun commentaire:
Enregistrer un commentaire