Tuesday, October 26, 2021

Setting a condition for `sample()` in `group_by()`

Below, I'm trying to randomly select the rows that belong to one `group` value within each `study` in my data.

First, we `group_by(study)`, then randomly pick one of the `group` values in each study and keep its rows (see below).

But there is a small issue: the `group` values have an order. So, after `sample()`ing them, we can't end up with any study that only has group = 2, or group = 3, or group = 2,3. We should only end up with group = 1, or group = 1,2, or group = 1,2,3.
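To make the constraint above concrete, here is a minimal sketch of a check for it in base R. The helper name `is_valid_groups` is hypothetical (it is not part of my code), and it assumes group values are whole numbers that should run 1, 2, ..., k without gaps.

```r
# Hypothetical helper: TRUE when the distinct group values are exactly 1, 2, ..., k.
is_valid_groups <- function(g) {
  u <- sort(unique(g))
  identical(as.integer(u), seq_along(u))
}

is_valid_groups(c(1, 1, 2, 2))  # groups 1,2 are allowed
is_valid_groups(c(2, 3))        # groups 2,3 are not allowed
```

A check like this could be run per study on the final result to confirm that no study is left with only group = 2 or group = 3.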

I think the solution is that whenever `sample()` picks a single group value larger than 1 (e.g., 2), that value should be renamed to 1.

How can such a condition/modification be added to my current code below?

library(tidyverse)

(data <- expand_grid(study=1:3,group=1:3,outcome=c("A","B"), time=0:1) %>%
    as.data.frame())

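return_rows <- function(x) {
  u <- unique(x)
  # First candidate (min(u) - 1) is a sentinel meaning "select all group values"
  candidates <- c(min(u) - 1, u)
  n <- sample(candidates, 1)
  # Compare against the sentinel, not n[1] (n has length 1, so n == n[1] is always TRUE)
  if (n == candidates[1]) TRUE else x == n  # Else, select rows for the corresponding group
}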

set.seed(0) ### For reproducibility 
data %>%
  group_by(study) %>%
  filter(return_rows(group)) %>%
  ungroup() %>% as.data.frame()
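One possible way to add the renaming step, sketched under some assumptions: the draw is stored per study in a temporary column so that a later `mutate()` knows whether a single group was picked. The names `pick_group`, `picked`, `keep_all`, and `result` are hypothetical, and the sentinel value `min(u) - 1` is taken to mean "keep all groups", as in the code above. This is a sketch, not necessarily the only or best approach.

```r
library(dplyr)
library(tidyr)

data <- expand_grid(study = 1:3, group = 1:3, outcome = c("A", "B"), time = 0:1)

# Hypothetical helper: returns the sampled value itself rather than a logical mask.
# A result below min(x) is the sentinel for "keep every group in this study".
pick_group <- function(x) {
  u <- unique(x)
  sample(c(min(u) - 1, u), 1)
}

set.seed(0)  # For reproducibility
result <- data %>%
  group_by(study) %>%
  mutate(picked   = pick_group(group),       # one draw per study, recycled to all rows
         keep_all = picked < min(group)) %>% # TRUE when the sentinel was drawn
  filter(keep_all | group == picked) %>%     # keep all rows, or only the picked group
  mutate(group = if_else(keep_all, group, 1L)) %>%  # a single picked group becomes 1
  ungroup() %>%
  select(-picked, -keep_all) %>%
  as.data.frame()

result
```

With this version each study ends up with either group = 1 alone or the full set group = 1,2,3, so a study can never be left with only group = 2 or group = 3.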


