Wednesday, November 20, 2019

Randomly select observations within a group using dplyr or base R functions

I have data such as this:

student.id <- c("142", "142", "567", "567", "347", "347", "567", "945")
flag.double <- c("1", "1", "1", "1", "1", "1", "0", "0")
data <- data.frame(student.id, flag.double)

I want to group by student.id, keep only the rows where flag.double == 1, and randomly select one observation per student (from the two available occurrences of each student). Then I want to merge the result back into data.

This gives me the random observations I want:

flag <- data %>% group_by(student.id) %>% filter(flag.double == 1) %>% sample_n(1)

But now I need to merge this back into the original data frame so that I have a column flagging the observations that were chosen randomly.

So next I just left-joined this back into the data set:

data <- left_join(data, flag)

Everything works, but I hate how inefficient this all looks. I also don't like creating a new data frame just to join it back into the original one. Is there a more elegant, pipe-friendly way of doing this with dplyr, tidyr, or plyr?
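For illustration, here is a minimal sketch of what a single-pipeline version could look like, assuming the goal is a logical column marking one randomly chosen flagged row per student. The helper `pick_one` and the column name `chosen` are hypothetical names introduced here, not part of the original code, and `set.seed()` is only there to make the random pick reproducible. The last statement shows a possible base R equivalent using `ave()`.

```r
library(dplyr)

student.id  <- c("142", "142", "567", "567", "347", "347", "567", "945")
flag.double <- c("1", "1", "1", "1", "1", "1", "0", "0")
data <- data.frame(student.id, flag.double, stringsAsFactors = FALSE)

# Hypothetical helper: given a logical vector of candidate rows within one
# group, return a logical vector with exactly one candidate set to TRUE
# (or all FALSE when the group has no candidates).
pick_one <- function(is_candidate) {
  chosen <- rep(FALSE, length(is_candidate))
  idx <- which(is_candidate)
  if (length(idx) > 0) {
    # sample.int() avoids the sample(x, 1) surprise when x has length 1
    chosen[idx[sample.int(length(idx), 1)]] <- TRUE
  }
  chosen
}

set.seed(1)  # assumption: fixed seed only for reproducibility

# dplyr version: no intermediate data frame and no join
result <- data %>%
  group_by(student.id) %>%
  mutate(chosen = pick_one(flag.double == "1")) %>%
  ungroup()

# Base R version of the same idea, applying the helper per student with ave()
chosen_base <- as.logical(
  ave(data$flag.double == "1", data$student.id, FUN = pick_one)
)
```

Because the flag is computed inside `mutate()` on the grouped data, every original row is kept in its original order, unflagged rows such as student 945 simply get `chosen = FALSE`, and there is nothing to join back.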



