samedi 1 septembre 2018

Efficient sampling of descrete random variable

Suppose that X takes value i with probability p(i), where i = 0, ..., k - 1, for some k. The probabilities are rational numbers a(i)/b where it is safe to assume b < 1 000 000 (but maybe not b < 100 000) and k <= 10.

What would be the most efficient algorithm for sampling? The number of samples is around b, i.e., high.

I can come up with two not completely satisfying approaches:

  1. Construct the list c = [p(0), p(0) + p(1), ... , p(0) + ... + p(k - 1), 1.5]
  2. Generate uniformly distributed random number p and return i, such that c[i] <= p < c[i + 1].

The second step is logarithmic in k (bisection). Can we do better? Sure:

  1. Construct the list c = [0, ... , 0, 1, ... , 1, ...... , k - 1, ... , k - 1], where the number of occurrences of i is proportional to p(i).
  2. Generate uniformly distributed random number p and return c[floor(length(c) * p)].

The second step now takes constant time, but computation may take too much space.

(This will be implemented in Python3, if this information matters.)




Aucun commentaire:

Enregistrer un commentaire