Tuesday, November 15, 2022

How to sample data in R within a specific number of cases

I'm trying to resample data, but I don't know how to tell the "sample" function to do so under a condition or within a specific number of cases.

The objective is to resample a statistic from a Wald test while balancing the number of cases in two conditions, control vs. treatment, because the original data are unbalanced.

For this I have 48 "control_IDs" (corresponding to 228 observations across time) and 326 "treatment_IDs" (corresponding to 4531 observations). Each ID identifies an individual who either received the treatment or did not.

Overall, one part of the structure is pretty straightforward: with the code below I can sample 228 of the 4531 "treatment" observations, then run the model and the Wald test.

The problem is that when I run the code, the 48 "control" IDs are fixed, but the "treatment" IDs are not. So I end up with 228 observations from 48 "control_IDs", but 228 observations from a varying number of "treatment_IDs" (from ~60 to ~180), depending on the run.

Any idea how to code this properly, so that the 228 random treatment observations come from a random sample of 48 "treatment_IDs"?

Thanks!

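library(lme4)  # glmer, glmerControl
library(car)   # Anova

# Control observations (48 IDs, 228 rows); these stay fixed in every replicate.
df2 <- subset(original_df, treatment == "control")
# NOTE: df is not defined in this snippet; it is assumed to hold the 4531 treatment observations.

resampling <- replicate(5000, {
  # Draw ~228 treatment rows (0.0504 * 4531) without replacement, ignoring ID.
  resampled.data <- sample(1:nrow(df), 0.0504 * nrow(df), replace = FALSE)
  x1 <- df[resampled.data, ]
  x2 <- rbind(x1, df2)
  Model <- glmer(response ~ treatment * var1 * var2 + var3 + (1 | ID),
                 data = x2, family = "binomial",
                 control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5)))
  # Keep the first term's row of the Wald table, dropping its first two columns.
  Anova(Model)[1, -1:-2]
})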
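
For reference, here is a minimal sketch of the two-stage draw I am after: first sample 48 treatment IDs, then sample 228 observations among the rows that belong to those IDs. The helper name sample_within_ids, its arguments, and the toy data frame toy_treatment are all hypothetical and only stand in for my real data, so this is an illustration under those assumptions and not a tested solution.

```r
# Hypothetical helper: draw n_obs rows that all belong to n_ids randomly chosen IDs.
sample_within_ids <- function(data, id_col = "ID", n_ids = 48, n_obs = 228) {
  ids <- unique(data[[id_col]])
  # Stage 1: pick n_ids subjects without replacement.
  chosen_ids <- sample(ids, n_ids, replace = FALSE)
  # Stage 2: pick n_obs rows among the observations of those subjects.
  candidate_rows <- which(data[[id_col]] %in% chosen_ids)
  if (length(candidate_rows) < n_obs) {
    stop("the chosen IDs have fewer than n_obs observations")
  }
  # Sample positions, then map back to row indices.
  picked <- candidate_rows[sample.int(length(candidate_rows), n_obs)]
  data[picked, , drop = FALSE]
}

# Toy stand-in for the treatment data (assumption): 326 IDs, 4531 observations.
set.seed(1)
toy_ids <- c(seq_len(326), sample(seq_len(326), 4531 - 326, replace = TRUE))
toy_treatment <- data.frame(ID = toy_ids, response = rbinom(4531, 1, 0.5))

x1 <- sample_within_ids(toy_treatment)
nrow(x1)               # number of sampled observations
length(unique(x1$ID))  # number of distinct IDs actually present
```

Inside replicate(), the x1 line would presumably become x1 <- sample_within_ids(df). Note that this guarantees at most 48 distinct treatment IDs, not exactly 48: an ID chosen in stage 1 can end up with none of its rows picked in stage 2.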


