I want to use R to write a model that will answer a general question about probability. The general question is below, followed by my specific questions about how to answer it using R code. If you know the answer to the general question (separate from the R code), and can explain the underlying statistical principles in plain English, I'm interested in that too!
Question: If I split a group of n objects, first through a 4-way splitter, then through a 7-way splitter (resulting in a total of 28 distinct groups), and each splitter distributes the objects at random (i.e. the objects are split approximately equally), does the order of the splits affect the variance of the final 28 groups? If I split into 4 and then into 7, is that different from splitting into 7 and then into 4? Does the answer change if one splitter has greater variance than the other?
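One way to probe the general question empirically is a small simulation. The sketch below assumes both splitters are ideal: every object independently picks an output with equal probability. Under that assumption each object ends up in one of the 28 final groups with probability 1/28 whichever splitter comes first, so the two orders should behave the same and the average sample variance of the 28 counts should land near n/28. The helper name two_stage_ideal is made up for illustration.

```r
set.seed(123)  # arbitrary seed so the run is repeatable

# Hypothetical helper: split n objects k1 ways, then split each resulting
# group k2 ways. Both splitters are assumed ideal (equal probabilities).
two_stage_ideal <- function(n, k1, k2) {
  first <- as.vector(rmultinom(1, size = n, prob = rep(1 / k1, k1)))
  # One k2-way split per first-stage group, flattened to k1 * k2 counts.
  as.vector(sapply(first, function(m) {
    as.vector(rmultinom(1, size = m, prob = rep(1 / k2, k2)))
  }))
}

n <- 10000
reps <- 2000

# Sample variance of the 28 final counts, repeated many times for each order.
var_4_then_7 <- replicate(reps, var(two_stage_ideal(n, 4, 7)))
var_7_then_4 <- replicate(reps, var(two_stage_ideal(n, 7, 4)))

# Compare the two averages with each other and with n / 28.
c(mean(var_4_then_7), mean(var_7_then_4), n / 28)
```

This only covers ideal splitters; it says nothing yet about splitters whose output shares are themselves noisy.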
Specific R question: how can I write a model to answer this question? So far, I've tried using sample and rnorm to generate sample data. Simulating a 4-way splitter would look something like this:
sample(1:4, size=100000, replace=TRUE)
This is basically like rolling a 4-sided die 100,000 times and recording the number of instances of each number. I can use the table function to count the instances, which gives me an output like this:
> table(sample(1:4, size=100000, replace=TRUE))
1 2 3 4
25222 24790 25047 24941
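A possibly more direct route to the same kind of counts is rmultinom from base R, which draws the group totals in one step instead of simulating every object. This is a minimal sketch, assuming an ideal splitter with equal probabilities; the variable names are only illustrative.

```r
set.seed(1)  # arbitrary seed so the run is repeatable
n <- 100000

# One draw of the four group counts from an equal-probability multinomial.
# This has the same distribution as table(sample(1:4, n, replace = TRUE)).
counts4 <- as.vector(rmultinom(1, size = n, prob = rep(1 / 4, 4)))

counts4
sum(counts4)  # always equals n
```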
Now, I want to take each of those outputs and use them as the input for a 7-way split. I tried saving the 4-way split as a variable and then plugging that vector into the size argument like this:
Split4way <- as.vector(table(sample(1:4, size=100000, replace=TRUE)))
as.vector(table(sample(1:7, size=Split4way, replace=TRUE)))
But when I do that, instead of a matrix with 4 rows and 7 columns, I just get a single vector of 7 counts. It appears that the size argument for the 7-way split only uses the first of the 4 outputs from the 4-way split instead of using each of them.
> as.vector(table(sample(1:7, size = Split4way, replace=TRUE)))
[1] 3527 3570 3527 3511 3550 3480 3588
So, how can I generate a table or list that shows all the outputs of the 4-way split followed by the 7-way split, for a total of 28 splits?
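A minimal sketch of one way to get the full 4 x 7 table: run a separate 7-way split for each of the four first-stage counts, since sample only takes a single number for size. The names split4 and split28 and the row and column labels are just illustrative. tabulate is used instead of table so that a group with zero objects would still be reported.

```r
set.seed(42)  # arbitrary seed so the run is repeatable
n <- 100000

# First stage: counts in each of the 4 groups.
split4 <- tabulate(sample(1:4, size = n, replace = TRUE), nbins = 4)

# Second stage: one 7-way split per first-stage group.
# sapply() returns a 7 x 4 matrix (one column per first-stage group),
# so t() turns it into 4 rows by 7 columns.
split28 <- t(sapply(split4, function(k) {
  tabulate(sample(1:7, size = k, replace = TRUE), nbins = 7)
}))
dimnames(split28) <- list(paste0("A", 1:4), paste0("B", 1:7))

split28
rowSums(split28)          # reproduces the first-stage counts
var(as.vector(split28))   # sample variance across the 28 final groups
```

Swapping the 4 and the 7 (and the nbins values) gives the 7-then-4 version for comparison.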
AND
Is there a function that will allow me to customize the standard deviation of each splitting device? For example, can I dictate that the outputs of the 4-way splitter have a standard deviation of x%, and the outputs of the 7-way splitter have a standard deviation of y%?
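I'm not aware of a base R function with a built-in "standard deviation" setting for a splitter, so the sketch below is one possible workaround rather than an established method: jitter each splitter's ideal shares with rnorm, then hand the jittered shares to rmultinom. The helpers noisy_split and noisy_two_stage and the rel_sd parameters are hypothetical, and rel_sd is only an approximate control, because the shares are rescaled to sum to 1 and the multinomial draw adds its own sampling noise on top.

```r
set.seed(2024)  # arbitrary seed so the run is repeatable

# Hypothetical helper (not part of base R): split n objects k ways after
# jittering the ideal share 1/k with Gaussian noise.
# rel_sd is the noise SD as a fraction of 1/k (0.05 means roughly 5%).
noisy_split <- function(n, k, rel_sd = 0) {
  shares <- pmax(rnorm(k, mean = 1 / k, sd = rel_sd / k), 1e-12)  # keep shares positive
  as.vector(rmultinom(1, size = n, prob = shares))  # rmultinom rescales prob to sum to 1
}

# Hypothetical helper: a k1-way noisy split followed by a k2-way noisy split
# of every first-stage group; returns a k1 x k2 matrix of counts.
noisy_two_stage <- function(n, k1, k2, rel_sd1, rel_sd2) {
  first <- noisy_split(n, k1, rel_sd1)
  t(sapply(first, noisy_split, k = k2, rel_sd = rel_sd2))
}

# Assumed example values: the 4-way device gets 5% noise and the 7-way
# device gets 1% noise, in both orders.
counts_4_7 <- noisy_two_stage(100000, 4, 7, rel_sd1 = 0.05, rel_sd2 = 0.01)
counts_7_4 <- noisy_two_stage(100000, 7, 4, rel_sd1 = 0.01, rel_sd2 = 0.05)

counts_4_7
c(var(as.vector(counts_4_7)), var(as.vector(counts_7_4)))
```

A single run like this is noisy; wrapping the last line in replicate would give a fairer comparison of the two orders.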