If I wanted to efficiently generate a random discrete probability distribution of N probabilities which sum up to 1, I could go with Hadley's comment here:
prop.table(runif(N))
If I repeat this many times, the average probability for each of the N elements should be ~1/N.
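A quick way to check that claim empirically is sketched below. The seed, `n_draws`, and the matrix layout are illustrative assumptions rather than anything from the original comment; `prop.table(u, margin = 1)` simply applies the same normalisation to every row at once.

```r
set.seed(1)        # arbitrary seed, only for reproducibility
N <- 4
n_draws <- 10000   # hypothetical number of repetitions

# Each row holds N uniform draws; prop.table(..., margin = 1) divides
# every row by its own sum, so each row is one random distribution.
u <- matrix(runif(n_draws * N), nrow = n_draws)
draws <- prop.table(u, margin = 1)

colMeans(draws)    # each entry should be close to 1 / N
```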
What if I want the average probability for each of the N elements to be not 1/N but a value specified a priori?
E.g. for N = 4 elements, I have the a priori distribution:
apriori <- c(0.2, 0.3, 0.1, 0.4)
And I would like random distributions based on this apriori, e.g.:
c(0.21, 0.29, 0.12, 0.38)
c(0.19, 0.29, 0.08, 0.44)
c(0.19, 0.33, 0.1, 0.38)
Etc.
Where we go by either of these rules:
1) On average, each element's probability would be (approximately) its probability in the apriori distribution.
2) There's a "perturbation" parameter, say perturbation = 0.05, which means either: (a) each probability i is allowed to lie in the range apriori[i] +- perturbation, or (b) each probability i is allowed to lie in the range apriori[i] +- perturbation * apriori[i] (i.e. plus/minus 5% of that apriori probability, not an absolute 5%).
I have no idea how to do this while keeping rule 1.
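One possible direction for rule 1, offered as a sketch and not as part of the original post: a Dirichlet distribution with parameter vector `concentration * apriori` has mean exactly `apriori`, and it can be sampled in base R by normalising independent gamma draws. The function name `rdirichlet_apriori` and the `concentration` knob are hypothetical; larger values of `concentration` keep the draws closer to `apriori`. Note that this does not enforce the hard bounds of rule 2.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
apriori <- c(0.2, 0.3, 0.1, 0.4)
concentration <- 200              # hypothetical tuning knob: larger = tighter
n_draws <- 10000                  # hypothetical number of distributions

# Hypothetical helper: n Dirichlet(concentration * apriori) draws, one per row.
rdirichlet_apriori <- function(n, apriori, concentration) {
  k <- length(apriori)
  shapes <- rep(concentration * apriori, times = n)
  # Filling by row lines up shape j with column j in every row.
  g <- matrix(rgamma(n * k, shape = shapes), nrow = n, byrow = TRUE)
  g / rowSums(g)                  # normalised gammas are Dirichlet distributed
}

draws <- rdirichlet_apriori(n_draws, apriori, concentration)
colMeans(draws)                   # should be close to apriori
```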
Regarding rule 2, my initial (inefficient) thought would be to perturb each of the first N - 1 elements by a random allowed amount, set the last element to 1 minus the sum of those N - 1 probabilities, and wrap this in a while loop that repeats until the last element is also legitimate.
I didn't even implement it yet because that's very inefficient (say I want 100K of such distributions...). Ideas?
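For concreteness, the while-loop idea for variant (a) could be sketched roughly as follows, processing candidates in batches instead of one at a time. This is an illustration only: the function name `perturb_reject`, the batch size, and `n_wanted` are hypothetical, and the sketch assumes `perturbation < min(apriori)` so that no probability can go negative.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
apriori <- c(0.2, 0.3, 0.1, 0.4)
perturbation <- 0.05
n_wanted <- 1000                  # hypothetical number of distributions

# Hypothetical helper: rejection sampling for the absolute-range variant (a).
perturb_reject <- function(n, apriori, perturbation) {
  stopifnot(perturbation < min(apriori))   # assumption: keeps all values > 0
  k <- length(apriori)
  out <- matrix(numeric(0), nrow = 0, ncol = k)
  while (nrow(out) < n) {
    m <- 2 * n                             # candidates per batch
    # Uniform shifts for the first k - 1 elements
    delta <- matrix(runif(m * (k - 1), -perturbation, perturbation), nrow = m)
    head_probs <- sweep(delta, 2, apriori[-k], "+")
    last <- 1 - rowSums(head_probs)        # last element forced by the sum
    ok <- abs(last - apriori[k]) <= perturbation
    out <- rbind(out, cbind(head_probs, last)[ok, , drop = FALSE])
  }
  unname(out[seq_len(n), , drop = FALSE])
}

draws <- perturb_reject(n_wanted, apriori, perturbation)
head(draws, 3)
```

Because whole batches are generated and filtered with vectorised operations, the loop typically runs only a handful of times even for large `n`, although the last element is still treated differently from the first N - 1.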