Saturday, February 23, 2019

Easy way to create a matched control set

I would like to draw samples from a "control" dataset in which each control is matched to a record in a "case" dataset on a variable called "length" (within +/- 100). I want to create many such samples and then test whether a variable "var" is TRUE in the "case" dataset more often than expected from the matched controls.

I have two questions:
1. Is there an easier way to find the matched control set than going through two for loops?
2. Is there an R package that helps with setting up a matched control set, and ideally with getting a p-value for this type of analysis?

Right now this is what I am doing:

cases = data.frame(id = 1:10, length = sample(1:1000,10), var = sample(c(TRUE,FALSE), 10, TRUE)) 
control = data.frame(id = 1:100, length = sample(1:1000,100), var = sample(c(TRUE,FALSE), 100, TRUE)) 

res = data.frame()
nperm = 10
for (perm in 1:nperm) {
    # shuffle the controls into a random order
    control_random = control[sample(nrow(control)), ]
    for (i in seq_len(nrow(cases))) { # loop through cases to find a match for each
        hit = NA
        for (j in seq_len(nrow(control_random))) { # scan only the controls that are still unused
            if (abs(control_random$length[j] - cases$length[i]) < 100) { # match if length is within 100
                hit = j
                break
            }
        }
        if (is.na(hit)) next # no unused control within range, so skip this case
        res = rbind(res, data.frame(perm = perm, case_id = cases$id[i],
                                    control_id = control_random$id[hit],
                                    control_var = control_random$var[hit]))
        # and remove it so we don't use it again
        control_random = control_random[-hit, ]
    }
}

# pvalue:
# count how many times var is TRUE in control set compared to cases
# ncases = length(cases$id[cases$var])
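
A minimal sketch of one way to drop the inner loop and finish the p-value step. It assumes a one-sided test (is var TRUE in the cases at least as often as in the matched controls?). The names match_controls, tol, observed, null_counts and p_value are hypothetical, chosen only for illustration. Picking a random unused control within range is meant to be equivalent to taking the first match in a shuffled control set.

```r
set.seed(1)  # only so the illustration is reproducible
cases   = data.frame(id = 1:10,  length = sample(1:1000, 10),
                     var = sample(c(TRUE, FALSE), 10, TRUE))
control = data.frame(id = 1:100, length = sample(1:1000, 100),
                     var = sample(c(TRUE, FALSE), 100, TRUE))

# Hypothetical helper: pick one control per case, without reuse.
# Returns row indices into `control`; NA where no unused control is in range.
match_controls = function(cases, control, tol = 100) {
    used   = rep(FALSE, nrow(control))
    picked = rep(NA_integer_, nrow(cases))
    for (i in seq_len(nrow(cases))) {
        # vectorised replacement for the inner loop
        cand = which(!used & abs(control$length - cases$length[i]) < tol)
        if (length(cand) > 0) {
            j = cand[sample.int(length(cand), 1)]  # random candidate
            picked[i] = j
            used[j] = TRUE
        }
    }
    picked
}

nperm = 1000
observed = sum(cases$var)  # number of cases with var == TRUE

# Number of TRUE values of var in each matched control sample
null_counts = replicate(nperm, {
    idx = match_controls(cases, control)
    sum(control$var[idx], na.rm = TRUE)
})

# One-sided empirical p-value; the +1 keeps it from being exactly zero
p_value = (sum(null_counts >= observed) + 1) / (nperm + 1)
```

Cases with no unused control in range are left unmatched (NA) and add nothing to the count, so if that happens often the comparison with observed is no longer like for like and the tolerance or the control set would need revisiting.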

Thank you for your help!



