Monday, January 22, 2018

R: generating a random discrete probability distribution by perturbing an existing one

If I wanted to efficiently generate a random discrete probability distribution of N probabilities which sum up to 1, I could go with Hadley's comment here:

prop.table(runif(N))

If I repeat this many times, the average probability for each of the N elements should be ~1/N.
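A quick simulation can check this claim. This is only a sketch: N = 4, the seed and the number of repetitions (1e5) are arbitrary choices, not taken from the question. By symmetry the expected share of every element is exactly 1/N, so the averages should come out near 0.25.

```r
N <- 4
set.seed(123)  # arbitrary seed, for reproducibility only

# Each call returns one random distribution of length N;
# replicate() stacks them as the columns of an N x 1e5 matrix.
draws <- replicate(1e5, prop.table(runif(N)))

rowMeans(draws)  # each entry should be close to 1/N = 0.25
```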

What if I want the average probability of each of the N elements to be not 1/N but a value specified a priori?

E.g., with N = 4 elements, I have the a priori distribution:

apriori <- c(0.2, 0.3, 0.1, 0.4)

And I would like to generate random distributions based on it, e.g.:

c(0.21, 0.29, 0.12, 0.38)
c(0.19, 0.29, 0.08, 0.44)
c(0.19, 0.33, 0.1, 0.38)

Etc.

The generated distributions should follow these rules:

1) On average, each element's probability should be (approximately) its probability in the a priori distribution.

2) There is a "perturbation" parameter, say perturbation = 0.05, which means either:
(a) each probability i is allowed to lie in the range apriori[i] ± perturbation, or
(b) each probability i is allowed to lie in the range apriori[i] ± perturbation * apriori[i] (i.e. plus/minus 5% of that a priori probability, not an absolute 5%).
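To make the two readings of rule 2 concrete, here is a minimal sketch of a range check. The helper names `allowed_range()` and `within_range()` are hypothetical, written only for illustration; `relative = FALSE` corresponds to variant (a) and `relative = TRUE` to variant (b).

```r
# Hypothetical helper: allowed [lower, upper] range for each element under rule 2.
# relative = FALSE -> variant (a): apriori[i] +- perturbation
# relative = TRUE  -> variant (b): apriori[i] +- perturbation * apriori[i]
allowed_range <- function(apriori, perturbation = 0.05, relative = FALSE) {
  half <- if (relative) perturbation * apriori else rep(perturbation, length(apriori))
  cbind(lower = apriori - half, upper = apriori + half)
}

# Hypothetical helper: TRUE if every element of p lies inside its allowed range
within_range <- function(p, apriori, perturbation = 0.05, relative = FALSE) {
  r <- allowed_range(apriori, perturbation, relative)
  all(p >= r[, "lower"] & p <= r[, "upper"])
}

apriori <- c(0.2, 0.3, 0.1, 0.4)
examples <- rbind(c(0.21, 0.29, 0.12, 0.38),
                  c(0.19, 0.29, 0.08, 0.44),
                  c(0.19, 0.33, 0.1, 0.38))
apply(examples, 1, within_range, apriori = apriori)                   # TRUE TRUE TRUE
apply(examples, 1, within_range, apriori = apriori, relative = TRUE)  # FALSE FALSE FALSE
```

With perturbation = 0.05, the three example vectors above fit the variant (a) ranges but not the much narrower variant (b) ranges (for instance, 0.12 exceeds 0.1 + 0.05 * 0.1 = 0.105).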

I have no idea how to do this while keeping rule 1.
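One standard way to get rule 1 exactly (a swapped-in technique, not something from the question) is to sample from a Dirichlet distribution with parameters concentration * apriori. Its mean is exactly apriori, and the variance of element i is apriori[i] * (1 - apriori[i]) / (concentration + 1), so a larger concentration keeps draws closer to the a priori values. The sketch below builds Dirichlet draws from normalised Gamma variates in base R; the function name `rdirichlet_mean()` and the default `concentration = 1000` are assumptions for illustration. Note that this only controls the spread softly; it does not enforce the hard ± bounds of rule 2.

```r
# Hypothetical helper: n random distributions whose expected value is `apriori`.
# Normalising independent Gamma(shape = alpha_i) draws gives a Dirichlet(alpha)
# draw; with alpha = concentration * apriori, the mean is exactly apriori.
rdirichlet_mean <- function(n, apriori, concentration = 1000) {
  k <- length(apriori)
  # Column j holds n draws from Gamma(shape = concentration * apriori[j])
  g <- matrix(rgamma(n * k, shape = rep(concentration * apriori, each = n)),
              nrow = n, ncol = k)
  g / rowSums(g)  # divide each row by its sum, so every row sums to 1
}

apriori <- c(0.2, 0.3, 0.1, 0.4)
set.seed(1)  # arbitrary seed
x <- rdirichlet_mean(1e5, apriori)
colMeans(x)      # close to apriori
apply(x, 2, sd)  # roughly sqrt(apriori * (1 - apriori) / 1001)
```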

Regarding rule 2, my initial (inefficient) idea is to perturb each of the first N - 1 elements by a random allowed amount, set the last element to 1 minus the sum of the first N - 1 probabilities, and wrap this in a while loop that repeats until the last element is also within its allowed range.
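The rejection idea does not have to be slow if candidates are generated in batches as matrix rows instead of one at a time. Below is a minimal sketch of such a vectorised version; `rperturb()` and its arguments are hypothetical names, and drawing the perturbations uniformly is an assumption. The first N - 1 elements get a uniform perturbation within their allowed ranges, the last element absorbs the difference so each row sums to 1, and rows whose last element falls outside its range are discarded. Because both the perturbations and the acceptance condition are symmetric around zero, each element's conditional mean stays at apriori[i], so rule 1 holds in expectation too. It assumes apriori sums to 1 and each apriori[i] is at least its half-width, so no probability can go negative.

```r
# Hypothetical helper: vectorised "perturb N - 1, fix the last, reject" scheme.
rperturb <- function(n, apriori, perturbation = 0.05, relative = FALSE) {
  k <- length(apriori)
  # Half-width of each allowed range: rule 2(a) absolute, 2(b) relative
  half <- if (relative) perturbation * apriori else rep(perturbation, k)
  out <- matrix(numeric(0), nrow = 0, ncol = k)
  while (nrow(out) < n) {
    m <- 2 * (n - nrow(out))  # batch size: oversample to cover rejections
    # Uniform perturbations for the first k - 1 elements, one row per candidate
    d <- matrix(runif(m * (k - 1), -1, 1) * rep(half[-k], each = m),
                nrow = m, ncol = k - 1)
    d_last <- -rowSums(d)            # keeps every row summing to sum(apriori)
    keep <- abs(d_last) <= half[k]   # accept only if the last element is in range
    cand <- sweep(cbind(d, d_last), 2, apriori, "+")
    out <- rbind(out, cand[keep, , drop = FALSE])
  }
  unname(out[seq_len(n), , drop = FALSE])
}

apriori <- c(0.2, 0.3, 0.1, 0.4)
set.seed(42)  # arbitrary seed
xa <- rperturb(1e5, apriori)                   # variant (a)
xb <- rperturb(1e5, apriori, relative = TRUE)  # variant (b)
colMeans(xa)  # close to apriori
colMeans(xb)  # close to apriori
```

For variant (a) with these values, about two thirds of the candidates are accepted, so 100K distributions typically need a single batch.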

I haven't implemented it yet because it seems very inefficient (say I want 100K such distributions...). Ideas?