Sunday, 26 February 2023

Take random sample of rows from dataframe with grouping variables

I have a dataframe with the following structure:

library(tidyverse)  # tibble(), unite(), and the dplyr verbs used below

dat <- tibble(
  item_type  = rep(1:36, each = 6),
  condition1 = rep(c("a", "b", "c"), times = 72),
  condition2 = rep(c("y", "z"), each = 3, times = 36)
) %>%
  unite(unique, item_type, condition1, condition2, sep = "-", remove = FALSE)

which looks like this:

# A tibble: 216 × 4
   unique item_type condition1 condition2
   <chr>      <int> <chr>      <chr>     
 1 1-a-y          1 a          y         
 2 1-b-y          1 b          y         
 3 1-c-y          1 c          y         
 4 1-a-z          1 a          z         
 5 1-b-z          1 b          z         
 6 1-c-z          1 c          z         
 7 2-a-y          2 a          y         
 8 2-b-y          2 b          y         
 9 2-c-y          2 c          y         
10 2-a-z          2 a          z         
# … with 206 more rows

I would like to take a random sample of 36 rows. The sample should contain each of the 6 condition1 × condition2 combinations exactly 6 times, and each of the 36 item_type values exactly once.

Using slice_sample() with the by argument, it seems I can get the subset I want:

set.seed(1)
dat %>% 
  slice_sample(n = 6, by = c("condition1", "condition2")) %>% 
  count(condition1, condition2)

# A tibble: 6 × 3
  condition1 condition2     n
  <chr>      <chr>      <int>
1 a          y              6
2 a          z              6
3 b          y              6
4 b          z              6
5 c          y              6
6 c          z              6

But on closer inspection, some item_type values are drawn more than once, so only 22 of the 36 item types end up in the sample:

set.seed(1)
dat %>% 
  slice_sample(n = 6, by = c("condition1", "condition2")) %>% 
  count(item_type) %>% 
  arrange(desc(n))
# A tibble: 22 × 2
   item_type     n
       <int> <int>
 1        10     3
 2        34     3
 3         1     2
 4         6     2
 5         7     2
 6        15     2
 7        20     2
 8        21     2
 9        23     2
10        25     2
# … with 12 more rows

In other words, each item_type should be drawn at most once across the whole sample, not just within a group. Is it possible to get slice_sample() to do this?
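For reference, here is a minimal sketch of one way the constraint could be met without slice_sample(). It is not a slice_sample() feature: as far as I can tell, slice_sample() draws within each group independently, which is why item_type can repeat across groups. The idea is to shuffle the 36 item types, hand 6 of them to each condition1 × condition2 combination, and then keep only the matching rows. The names assignment and picked are made up for illustration.

```r
library(dplyr)
library(tidyr)

set.seed(1)

# Same data as in the question
dat <- tibble(
  item_type  = rep(1:36, each = 6),
  condition1 = rep(c("a", "b", "c"), times = 72),
  condition2 = rep(c("y", "z"), each = 3, times = 36)
) %>%
  unite(unique, item_type, condition1, condition2, sep = "-", remove = FALSE)

# Build 36 "slots": each of the 6 condition combinations repeated 6 times,
# then fill the slots with a random permutation of the 36 item types.
# (assignment is a hypothetical helper table, not part of the original post.)
assignment <- dat %>%
  distinct(condition1, condition2) %>%
  uncount(6) %>%
  mutate(item_type = sample(unique(dat$item_type)))

# Keep only the rows of dat that match an assigned slot.
# Each slot matches exactly one row, because every
# item_type/condition1/condition2 triple occurs once in dat.
picked <- dat %>%
  semi_join(assignment, by = c("item_type", "condition1", "condition2"))

# Sanity checks: 6 rows per combination, no item_type repeated
picked %>% count(condition1, condition2)
picked %>% count(item_type) %>% filter(n > 1)
```

This works here only because the design is balanced: 6 combinations × 6 repetitions equals the 36 available item types, and every item type occurs in every combination. With an unbalanced design the assignment step would need more care.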



