Thursday, 12 May 2022

Sample from groups, but n varies per group in R

I am trying to randomly sample rows from each level of a grouping variable, but the number of rows to sample, n, varies from group to group. For example:

library(dplyr)
iris <- iris %>% mutate(len_bin = cut(Sepal.Length, seq(0, 8, by = 1)))

This gives the following levels of my grouping variable, with the number of rows in each:


(4,5] (5,6] (6,7] (7,8] 
   32    57    49    12 
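For reference, here is a minimal sketch that reproduces the counts above with dplyr's `count()`. The name `iris2` is a hypothetical choice made here so that the built-in `iris` data set is not overwritten.

```r
library(dplyr)

# Bin Sepal.Length into unit-width intervals such as "(4,5]"
iris2 <- iris %>%
  mutate(len_bin = cut(Sepal.Length, seq(0, 8, by = 1)))

# Rows per bin; empty levels such as "(0,1]" are dropped by default
counts <- count(iris2, len_bin)
counts
```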

Is there a way to randomly sample rows from only the groups listed below, taking n rows from each group, where n is the number of times that group appears in this vector:

x <- c("(4,5]","(5,6]","(5,6]","(5,6]","(6,7]")
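The per-group sample sizes can be read straight off the vector with base R's `table()`. This is only a sketch; `sizes` and `k` are hypothetical names for the counts and for the resulting named vector of sample sizes.

```r
x <- c("(4,5]", "(5,6]", "(5,6]", "(5,6]", "(6,7]")

# Count how often each group label occurs in x
sizes <- table(x)

# Named integer vector of sample sizes: group label -> n
k <- setNames(as.integer(sizes), names(sizes))
k
```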

The result should look like:

# Groups:   len_bin [4]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species    len_bin
         <dbl>       <dbl>        <dbl>       <dbl> <fct>      <fct>  
1          5           2            3.5         1   versicolor (4,5]  
2          5.3         3.7          1.5         0.2 setosa     (5,6]  
2          5.3         3.7          1.5         0.2 setosa     (5,6]  
2          5.3         3.7          1.5         0.2 setosa     (5,6]  
3          6.5         3            5.8         2.2 virginica  (6,7]  

I managed to do this with a for loop that calls sample_n() once for each element of the vector, but I assume there must be a faster way. Can I, for example, define a per-group n within sample_n()?
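One loop-free sketch, assuming dplyr 1.0 or later: rather than relying on `sample_n()` (superseded by `slice_sample()` since dplyr 1.0.0) to accept a different `n` per group, it joins each group's sample size `k` onto the data and then draws that many row positions per group inside `slice()`. The names `iris2`, `sizes`, `k` and `result` are hypothetical.

```r
library(dplyr)

set.seed(42)  # only to make the draw reproducible

iris2 <- iris %>%
  mutate(len_bin = cut(Sepal.Length, seq(0, 8, by = 1)))

x <- c("(4,5]", "(5,6]", "(5,6]", "(5,6]", "(6,7]")

# One row per group named in x, with k = how often it occurs in x.
# Reusing iris2's factor levels keeps the join key types identical.
sizes <- tibble(len_bin = factor(x, levels = levels(iris2$len_bin))) %>%
  count(len_bin, name = "k")

result <- iris2 %>%
  inner_join(sizes, by = "len_bin") %>%  # keep only the groups listed in x
  group_by(len_bin) %>%
  slice(sample.int(n(), first(k))) %>%   # k random row positions per group
  ungroup() %>%
  select(-k)

result
```

As written, `sample.int()` samples without replacement, so `k` must not exceed a group's size. If the same row may be drawn more than once within a group (as the repeated (5,6] row in the example output suggests), add `replace = TRUE` to the `sample.int()` call.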

