Wednesday, April 1, 2020

How to randomly sample based on two variables?

I have a dataset in which each person (df$FullName) is associated with one or more Study IDs (df$StudyID). Each study may have 3 to 5 images, so a Study ID repeats across several rows for the same person.

Example:

StudyID   FullName
1029      Person A
1029      Person A
1029      Person A
1039      Person B
1039      Person B
1039      Person B
1039      Person B
1062      Person A
1062      Person A
1062      Person A

How can I randomly select 20 studies per person while keeping all of the images for each selected study? I do not want to sample a single image from each study, which is what my code below does.

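library(dplyr)

set.seed(1)
# Groups by study and keeps one random row (image) per study, which is not what I want
randomselectCC <- StudySample_cc %>%
  group_by(StudyID) %>%
  sample_n(1, replace = FALSE)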

The outcome I am looking for is for each person to have the same number of studies (20), with a potentially varying number of images. Thanks so much!
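
One possible way to get that outcome, sketched here under assumptions rather than offered as a definitive answer: reduce the data to one row per (person, study) pair, sample studies within each person, then join back to the full table so every image row of the chosen studies is kept. The toy data frame and the name n_studies are made up for illustration (n_studies is 1 so the small example works; it would be 20 on the real data), and slice_sample() assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data shaped like the example: one row per image.
df <- data.frame(
  StudyID  = c(1029, 1029, 1029, 1039, 1039, 1039, 1039, 1062, 1062, 1062),
  FullName = c(rep("Person A", 3), rep("Person B", 4), rep("Person A", 3))
)

n_studies <- 1  # assumption for the toy data; would be 20 on the real data

set.seed(1)
sampled_studies <- df %>%
  distinct(FullName, StudyID) %>%   # one row per (person, study) pair
  group_by(FullName) %>%
  slice_sample(n = n_studies) %>%   # draw studies, not images
  ungroup()

# Keep every image row belonging to a sampled study.
result <- df %>%
  semi_join(sampled_studies, by = c("FullName", "StudyID"))
```

As far as I know, slice_sample() returns all of a person's studies when they have fewer than n_studies, whereas sample_n() without replacement raises an error in that case, so people with fewer than 20 studies may need to be filtered out first if equal counts are required.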



