Monday, 25 November 2019

How to block randomize data according to more than one parameter in R

I want to block randomize my data into 3 arms, balancing both gender and smoking status across the arms as well as possible.

Here is some simulated data similar to my actual data. Note that males & females and smokers & non-smokers are unevenly sampled.

set.seed(33)
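# gender, measurement and outcome have length 50 and are recycled to match
# the 100 smoker values, so mydata has 100 rows.
mydata <- data.frame("gender" = rep(c("female", "male"), times = c(40, 10)),
                     "smoker" = rep(c("yes", "no"), each = 50),
                     "measurement" = rnorm(n = 50, mean = 15, sd = 3),
                     "outcome of interest" = rep(c("positive", "negative"), times = c(20, 30)))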
head(mydata)
#     gender smoker measurement outcome.of.interest
# 1   female    yes   12.309256            positive
# 2   female    yes   15.554548            positive
# 3   female    yes   19.763536            positive
# 4   female    yes   11.608873            positive
# 5   female    yes   14.759245            positive
# 6   female    yes    15.39726            positive

I found the randomizr package useful for randomizing on a single variable, but blocking on one variable leaves the other unevenly distributed across the arms:

set.seed(2)
library(randomizr)
Z <- block_ra(blocks = mydata[,"gender"], num_arms = 3)
table(Z, mydata$gender)
# Z    female male
#   T1     26    7
#   T2     27    6
#   T3     27    7
table(Z, mydata$smoker)
# Z    no yes
#   T1 17  16
#   T2 13  20
#   T3 20  14

Z <- block_ra(blocks = mydata[,"smoker"], num_arms = 3)
table(Z, mydata$smoker)
# Z    no yes
#   T1 17  17
#   T2 17  16
#   T3 16  17
table(Z, mydata$gender)
# Z    female male
#   T1     29    5
#   T2     24    9
#   T3     27    6

How can I block randomize according to 2 or more parameters?
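
One common way to approach this is to treat every combination of the blocking variables as its own block, then randomize within each combination. The sketch below does that in base R for the simulated data; `assign_within_strata` is a hypothetical helper written for illustration, not part of randomizr. With randomizr itself, passing a combined label such as `paste(mydata$gender, mydata$smoker)` as the `blocks` argument of `block_ra()` should give the same kind of blocking, though that is an assumption rather than something tested here.

```r
# Rebuild the two blocking columns of the simulated data (100 rows).
mydata <- data.frame(
  gender = rep(rep(c("female", "male"), times = c(40, 10)), times = 2),
  smoker = rep(c("yes", "no"), each = 50)
)

# One block per gender x smoker combination (4 strata in this data).
strata <- interaction(mydata$gender, mydata$smoker, drop = TRUE)

# Hypothetical helper: assign arms within each stratum so that arm sizes
# inside a stratum differ by at most one unit.
assign_within_strata <- function(strata, arms = c("T1", "T2", "T3")) {
  Z <- character(length(strata))
  for (s in levels(strata)) {
    idx <- which(strata == s)
    # Randomize the arm order so leftover units do not always land in T1.
    labels <- rep(sample(arms), length.out = length(idx))
    # Shuffle the labels across the units of this stratum.
    Z[idx] <- labels[sample.int(length(idx))]
  }
  factor(Z, levels = arms)
}

set.seed(2)
Z <- assign_within_strata(strata)

# Check balance on each variable and on the combined strata.
table(Z, mydata$gender)
table(Z, mydata$smoker)
table(Z, strata)
```

Because balance is enforced inside each gender-by-smoker cell, the marginal tables for gender and smoker can each be off by at most one unit per stratum they combine. The trade-off is that the number of strata grows quickly as variables are added, and very small strata cannot be split evenly across 3 arms.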



