random: How can I draw a random sample from a dataset, proportionate to size, based on different proportions for each value of a factor variable, in R

Saturday, 12 December 2020

I want to draw a random sample from my dataset, using a different sampling proportion for each level of a factor variable and sampling weights stored in another column. A dplyr solution using pipes is preferred, since it can be dropped easily into a longer pipeline.

Let's take the iris dataset as an example. The Species column has three levels with 50 rows each. Let's also assume the sampling weights are stored in the Sepal.Length column. If I need to sample the same proportion (or the same number of rows) from each species, the problem is easy to solve:

library(tidyverse)

iris %>% group_by(Species) %>% slice_sample(prop = 0.1, weight_by = Sepal.Length)

# A tibble: 15 x 5
# Groups:   Species [3]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species   
          <dbl>       <dbl>        <dbl>       <dbl> <fct>     
 1          5.4         3.7          1.5         0.2 setosa    
 2          5.3         3.7          1.5         0.2 setosa    
 3          5.7         4.4          1.5         0.4 setosa    
 4          5           3.5          1.6         0.6 setosa    
 5          4.8         3.1          1.6         0.2 setosa    
 6          6.1         2.9          4.7         1.4 versicolor
 7          6.7         3.1          4.7         1.5 versicolor
 8          5           2            3.5         1   versicolor
 9          7           3.2          4.7         1.4 versicolor
10          5.7         2.9          4.2         1.3 versicolor
11          7.2         3.2          6           1.8 virginica 
12          6.7         2.5          5.8         1.8 virginica 
13          6.4         2.8          5.6         2.1 virginica 
14          6.3         3.3          6           2.5 virginica 
15          7.2         3            5.8         1.6 virginica

But I get stuck when I need to sample a different proportion for each species, say 10%, 20%, and 25% respectively. Neither of the following attempts works:

iris %>% group_by(Species) %>% slice_sample(prop = c(0.1, 0.2, 0.25), weight_by = Sepal.Length)

#Error: `prop` must be a single number

iris %>% group_split(Species) %>% map_df(c(0.1, 0.2, 0.25), ~ slice_sample(prop = ., weight_by = Sepal.Length))
# A tibble: 0 x 0

Please help.
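One possible way to get per-species proportions is sketched below. It is not necessarily the only or the best approach. In the second attempt above, the numeric vector appears to land in the `.f` slot of `map_df()`, so `slice_sample()` is never actually applied to the groups. The sketch instead pairs each group with its own proportion using `purrr::map2()`. The lookup vector `props` and the object name `sampled` are assumptions introduced for illustration, and `set.seed()` is there only to make the draw reproducible.

```r
library(dplyr)
library(purrr)

# Hypothetical lookup: one sampling proportion per level of Species
props <- c(setosa = 0.10, versicolor = 0.20, virginica = 0.25)

set.seed(2020)  # only so the example is reproducible

sampled <- iris %>%
  group_split(Species) %>%              # list of data frames, in factor-level order
  map2(props[levels(iris$Species)],     # reorder props to match that order
       ~ slice_sample(.x, prop = .y, weight_by = Sepal.Length)) %>%
  bind_rows()                           # stack the per-species samples

# Rows drawn per species; prop * group size is rounded towards zero,
# so 25% of 50 rows gives 12 rows
count(sampled, Species)
```

Indexing `props` by `levels(iris$Species)` guards against the proportions being listed in a different order from the groups. The result is an ungrouped tibble, so it can be piped straight into further dplyr steps.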
