I am trying to do stratified random sampling from a vector by splitting it into strata of roughly equal size and taking one sample from each stratum. This works fine when the sampling pool is at least twice as large as the number of samples to select, but something odd happens when it is not. Code (small pool):
library(dplyr)
a <- 1:10
n <- 10
div <- length(a) / n
strata <- split(a, ceiling(seq_along(a)/div))
set.seed(52)
set <- sapply(strata, sample, 1)
set
Outcome:
1 2 3 4 5 6 7 8 9 10
1 2 3 3 5 5 6 7 3 1
The outcome should be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. Instead, 1, 3 and 5 have each been selected more than once, which should not happen. Also, the values returned for strata '9' and '10' (3 and 1) do not belong to those strata, and the same is true for several others.
If I change the pool to
a <- 1:100
then the outcome is:
1 2 3 4 5 6 7 8 9 10
2 13 23 38 45 57 63 71 85 99
This is what I expect to get: one randomly selected sample from each stratum.
What is going on when the pool is small relative to the number of desired samples? Why does it not simply return the one number that each stratum contains?
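A possible explanation, going by the documented behaviour of base R's `sample()`: when its first argument is a single number x >= 1, it samples from 1:x rather than from the vector itself. With a <- 1:10 and n <- 10, every stratum holds exactly one number, so `sample(strata[[k]], 1)` would draw from 1:k. That matches the output above, where each value is no larger than its stratum number. The sketch below is one way around it, assuming this is the cause; `sample_one` is a hypothetical helper name, not something from the original code.

```r
# Hypothetical helper: sample a position rather than the values themselves,
# so a stratum of length 1 always returns its only element.
sample_one <- function(x) x[sample.int(length(x), 1)]

a <- 1:10
n <- 10
div <- length(a) / n
strata <- split(a, ceiling(seq_along(a) / div))

set.seed(52)
picked <- sapply(strata, sample_one)
picked
```

With the small pool this should return 1 to 10, one value per stratum, and with a <- 1:100 it should still pick one random element from each block of ten.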