I am trying to randomly sample rows from a grouped variable n times, where n varies by group. For example:
library(dplyr)
iris <- iris %>% mutate(len_bin = cut(Sepal.Length, seq(0, 8, by = 1)))
I have these factors, which are my grouped variable:
table(iris$len_bin)
(4,5] (5,6] (6,7] (7,8]
   32    57    49    12
Is there a way to randomly sample from only these groups n times, where n is the number of times each group appears in this vector:
x <- c("(4,5]","(5,6]","(5,6]","(5,6]","(6,7]")
The result should look like:
# Groups:   len_bin [4]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species    len_bin
         <dbl>       <dbl>        <dbl>       <dbl> <fct>      <fct>
1          5           2            3.5         1   versicolor (4,5]
2          5.3         3.7          1.5         0.2 setosa     (5,6]
2          5.3         3.7          1.5         0.2 setosa     (5,6]
2          5.3         3.7          1.5         0.2 setosa     (5,6]
3          6.5         3            5.8         2.2 virginica  (6,7]
I managed to do this with a for loop that calls sample_n() for each element of the vector. I assume there must be a faster way. Can I define n per group within sample_n(), for example?
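For reference, here is a minimal sketch of one loop-free way this could be done. It is not a built-in feature of sample_n(). It counts each bin in x with table(), splits the data by bin, and draws the requested number of rows from each piece with slice_sample(), which supersedes sample_n() in dplyr 1.0.0 and later. The names iris_binned, n_per_bin, pieces and draw_rows are made up for this illustration, and the sketch assumes sampling without replacement is wanted.

```r
library(dplyr)

# Bin Sepal.Length as in the question (stored under a new, hypothetical name).
iris_binned <- iris %>%
  mutate(len_bin = cut(Sepal.Length, seq(0, 8, by = 1)))

x <- c("(4,5]", "(5,6]", "(5,6]", "(5,6]", "(6,7]")

# Number of rows to draw from each requested bin: (4,5] = 1, (5,6] = 3, (6,7] = 1.
n_per_bin <- table(x)

# Split the data by bin and keep only the bins named in x.
pieces <- split(iris_binned, iris_binned$len_bin)[names(n_per_bin)]

# Hypothetical helper: draw `size` rows at random from one piece.
draw_rows <- function(df, size) slice_sample(df, n = size)

set.seed(1)  # only to make the illustration reproducible
sampled <- bind_rows(Map(draw_rows, pieces, as.integer(n_per_bin)))
```

If a bin holds fewer rows than requested, slice_sample() returns only the rows that are available. Passing replace = TRUE to slice_sample() would allow the same row to be drawn more than once.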