Thursday, June 11, 2020

Random selection by two-layer strata in R

I have a large dataset that I want to subsample so that its proportions look 'similar' to those of another dataset.

The target dataset has these proportions for variable X:

'A' = 0.5,
'B' = 0.2,
'C' = 0.1,
'D' = 0.2

I also want the group variable GRP to have a 2:1 ratio, so that for every trt row there are 2 ctrl rows.
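Put together, the two requirements fix the share of every X-by-GRP cell: the X proportion times the GRP share (2/3 for ctrl, 1/3 for trt). This assumes X and GRP should be independent in the sample, so that the 2:1 ratio also holds within each X level. A minimal base R sketch of that arithmetic follows; the names `p_x`, `p_grp`, `p_cell` and the total of 3000 rows are hypothetical.

```r
# Target proportions of X (from the target dataset) and of GRP (2 ctrl : 1 trt)
p_x   <- c(A = 0.5, B = 0.2, C = 0.1, D = 0.2)
p_grp <- c(ctrl = 2/3, trt = 1/3)

stopifnot(isTRUE(all.equal(sum(p_x), 1)))

# Joint share of each X-by-GRP cell, assuming X and GRP are independent
p_cell <- outer(p_x, p_grp)

# Rows to draw per cell for a hypothetical total sample size
n_total <- 3000
n_cell  <- round(n_total * p_cell)
n_cell
```

With these numbers the cells come out as A: 1000 ctrl / 500 trt, B: 400 / 200, C: 200 / 100, D: 400 / 200, which sums to 3000 and keeps two ctrl rows per trt row inside every X level.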

My data looks like this:

 ID          GRP         X         Y
 1           ctrl         A        2
 2           ctrl         A        2
 3           ctrl         B        1
 4           trt          A        4

etc.

I can split it into equal-sized groups for each combination of X and GRP with this code:

library(dplyr)
DF %>% group_by(X, GRP) %>% sample_n(2500)  # 2500 random rows from every X/GRP cell

However, I would like a 2:1 ratio for GRP while preserving the target proportions of X given above. Is there a way to specify, for each stratum, what share of the total sample it should make up when sampling randomly?
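One way to get there is to replace the fixed n per group with a separate sample size for each X-by-GRP cell, computed from the target shares, and then draw that many rows from each cell without replacement. The sketch below is plain base R so it does not depend on a particular dplyr version. The simulated `DF`, the helper `stratified_sample()` and its `n_total` argument are hypothetical, and it assumes X and GRP should be independent in the sample. When `n_total` is omitted it picks the largest total that every cell can still supply.

```r
set.seed(42)  # reproducible sampling

# Hypothetical stand-in for the question's DF (columns ID, GRP, X, Y)
DF <- data.frame(
  ID  = 1:20000,
  GRP = sample(c("ctrl", "trt"), 20000, replace = TRUE),
  X   = sample(c("A", "B", "C", "D"), 20000, replace = TRUE),
  Y   = sample(1:5, 20000, replace = TRUE)
)

p_x   <- c(A = 0.5, B = 0.2, C = 0.1, D = 0.2)  # target shares of X
p_grp <- c(ctrl = 2/3, trt = 1/3)               # 2 ctrl : 1 trt

# Draw a stratified sample whose X-by-GRP cell sizes follow p_x and p_grp
stratified_sample <- function(df, p_x, p_grp, n_total = NULL) {
  p_cell <- outer(p_x, p_grp)  # target share of each cell
  # Rows available in each cell, in the same order as p_cell
  avail <- table(factor(df$X, levels = names(p_x)),
                 factor(df$GRP, levels = names(p_grp)))
  if (is.null(n_total)) {
    # Largest total for which no cell needs more rows than it has
    n_total <- floor(min(avail / p_cell))
  }
  n_cell <- round(n_total * p_cell)
  if (any(n_cell > avail)) stop("n_total is too large for at least one cell")

  idx <- integer(0)
  for (x in names(p_x)) {
    for (g in names(p_grp)) {
      rows <- which(df$X == x & df$GRP == g)
      # sample.int() avoids sample()'s special case for a length-1 vector
      idx <- c(idx, rows[sample.int(length(rows), n_cell[x, g])])
    }
  }
  df[idx, ]
}

s <- stratified_sample(DF, p_x, p_grp, n_total = 3000)
table(s$X, s$GRP)      # rows per X-by-GRP cell
prop.table(table(s$X)) # should match p_x
```

Because cell sizes are rounded, the result can be a row or two off from `n_total` unless `n_total` times every cell share is a whole number (3000 works for these shares). The same idea should also fit the dplyr pipeline from the question: join a table of per-cell sizes onto DF, then sample that many rows within `group_by(X, GRP)`.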



