Based on this reply: Random numbers that add to 100: Matlab I tried to apply the suggested method to randomly divide an integer into a fixed number of integers whose sum is equal to the integer. Although that method seems to result in a uniformly distributed set of points when the values are not integers, in the case of integers, the resulting tuples are not obtained with equal probability. This is shown by the following implementation in R, where a simple case is tested with 3 divisors and with the integer to be divided equal to 5:
# Randomly divide an integer into a defined number of integers
# Goal: obtain with equal probability any combination of variable values, with the condition that sum(variables) = dividend.
# install.packages(rgl) # Install rgl package if not yet installed. This allows to use the plot3d function to create a 3D scatterplot.
library(rgl)
n_draws = 10000
n_variables = 3 # Number of divisors. These need to be randomly calculated. Their value must be in the interval [0:dividend] and their sum must be equal to the dividend. Two variable can therefore have the same value.
dividend = 5 # Number that needs to be divided.
rand_variables = matrix(nrow = n_draws, ncol = n_variables) # This matrix contains the final values for each variable (one column per variable).
rand_samples = matrix(nrow = n_draws, ncol = n_variables-1) # This matrix contains the intermediate values that are used to randomly divide the dividend.
for (k in 1:n_draws){
rand_samples[k,] = sample(x = c(0:dividend), size = n_variables-1, replace = TRUE) # Randomly select (n_variables - 1) values within the range 0:dividend. The values in rand_samples are uniformly distributed.
midpoints = sort(rand_samples[k,])
rand_variables[k,] = sample(diff(c(0, midpoints, dividend)), n_variables) # Calculate the values of each variable such that their sum is equal to the dividend.
}
plot3d(rand_variables) # Create 3D scatterplot showing the values of rand_variables. This plot does not show how frequently each combination of values of the n_variables is obtained, only which combinations of values are possible.
table(data.frame(rand_variables)) # This prints out the count of each combination of values of n_variables. It shows that the combinations of values in the corners (e.g. (5,0,0)) are obtained less frequently than other combinations (e.g. (1,2,2)).
The last line gives the following output, which shows how many times were obtained each combination of values of (X1, X2, X3) that respect the condition X1 + X2 + X3 = 5:
, , X3 = 0
X2
X1 0 1 2 3 4 5
0 0 0 0 0 0 397
1 0 0 0 0 471 0
2 0 0 0 469 0 0
3 0 0 446 0 0 0
4 0 456 0 0 0 0
5 358 0 0 0 0 0
, , X3 = 1
X2
X1 0 1 2 3 4 5
0 0 0 0 0 450 0
1 0 0 0 539 0 0
2 0 0 560 0 0 0
3 0 588 0 0 0 0
4 426 0 0 0 0 0
5 0 0 0 0 0 0
, , X3 = 2
X2
X1 0 1 2 3 4 5
0 0 0 0 428 0 0
1 0 0 603 0 0 0
2 0 549 0 0 0 0
3 461 0 0 0 0 0
4 0 0 0 0 0 0
5 0 0 0 0 0 0
, , X3 = 3
X2
X1 0 1 2 3 4 5
0 0 0 500 0 0 0
1 0 549 0 0 0 0
2 455 0 0 0 0 0
3 0 0 0 0 0 0
4 0 0 0 0 0 0
5 0 0 0 0 0 0
, , X3 = 4
X2
X1 0 1 2 3 4 5
0 0 465 0 0 0 0
1 458 0 0 0 0 0
2 0 0 0 0 0 0
3 0 0 0 0 0 0
4 0 0 0 0 0 0
5 0 0 0 0 0 0
, , X3 = 5
X2
X1 0 1 2 3 4 5
0 372 0 0 0 0 0
1 0 0 0 0 0 0
2 0 0 0 0 0 0
3 0 0 0 0 0 0
4 0 0 0 0 0 0
5 0 0 0 0 0 0
As the output shows, the combinations of values in the corners of the plane (e.g. (5,0,0)) are obtained less frequently than other tuples.
How can I obtain any integer tuple with the same probability?
I'm looking for a solution that is applicable for any positive integer and for any number of divisors.
Aucun commentaire:
Enregistrer un commentaire