Thursday, May 5, 2022

Random subsample with specified levels for each group in R

I have two datasets that I'm trying to match: a population, and samples drawn from that population. However, the samples were collected opportunistically and do not represent the population. So I'm trying to subsample my samples so that the proportion in each length bin matches the population. I'm struggling to find a way to do this in R. It is complicated by the fact that some length bins may not contain enough samples to fill the number I listed for that bin, and R throws an error rather than skipping it. I'm unsure how to fix this. Here is the code I have been playing with:

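library(dplyr)  # provides sample_n()
# Attempt: draw NumberinSizeBin[x] rows per group without replacement (errors when a group is too small)
Subsample <- lapply(seq_along(NumberinSizeBin), function(x) sample_n(Samples[[x]], NumberinSizeBin[x], replace = FALSE))
Subsample <- do.call("rbind", Subsample)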
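Besides the size error, the loop above probably does not iterate over what was intended. A quick base-R check, assuming the `NumberinSizeBin` data frame given below: a data frame is a list of columns, so `seq_along()` walks its 2 columns rather than its 6 bins, `NumberinSizeBin[x]` returns a one-column data frame rather than a count, and likewise `Samples[[x]]` pulls a whole column of `Samples` (e.g. the ID vector) rather than the rows of one length bin.

```r
NumberinSizeBin <- data.frame(Length = c("220", "240", "260", "280", "300", "320"),
                              Count  = c(2, 10, 4, 3, 2, 1))

# A data frame is a list of columns, so seq_along() walks the 2 columns
print(seq_along(NumberinSizeBin))
# Single-bracket indexing returns a one-column data frame, not a number
print(class(NumberinSizeBin[2]))

# Walking the bins (rows) and pulling a plain count looks like this instead
print(seq_len(nrow(NumberinSizeBin)))
print(NumberinSizeBin$Count[2])
```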

Here are some simple data frames that mimic my sets (note that the 240 and 300 bins contain fewer samples than the counts requested):

NumberinSizeBin <- data.frame(Length = c("220", "240", "260", "280", "300", "320"), Count = c(2, 10, 4, 3, 2, 1))
Samples <- data.frame(ID = c(18:48), Length = c("220","220","220","220","220","220","220","220","220","240","240","240","240","240","240","240","260","260","260","260","260","260","280","280","280","280","280","300","320","320","320"), Sex = c("M","F","F","M","M","F","F","M","M","F","M","F","F","M","M","F","F","M","M","F","M","F","F","M","M","F","F","M","M","F","M"))
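One possible approach, sketched in base R only (no dplyr) and assuming that a short bin should contribute all of its rows rather than fail: split `Samples` by `Length`, then for each bin draw `min(Count, rows available)` rows with `sample.int()`. The data below is the same as above, with the `Length` vector written compactly via `rep()`; the names `by_bin` and `pool` are illustrative only.

```r
NumberinSizeBin <- data.frame(Length = c("220", "240", "260", "280", "300", "320"),
                              Count  = c(2, 10, 4, 3, 2, 1))
Samples <- data.frame(
  ID     = 18:48,
  # Same 31 lengths as the original vector: 9, 7, 6, 5, 1 and 3 per bin
  Length = rep(c("220", "240", "260", "280", "300", "320"), times = c(9, 7, 6, 5, 1, 3)),
  Sex    = c("M","F","F","M","M","F","F","M","M","F","M","F","F","M","M","F",
             "F","M","M","F","M","F","F","M","M","F","F","M","M","F","M")
)

# One data frame per length bin, named by the bin ("220", "240", ...)
by_bin <- split(Samples, Samples$Length)

set.seed(42)  # make the random draw reproducible
Subsample <- do.call(rbind, lapply(seq_len(nrow(NumberinSizeBin)), function(i) {
  pool <- by_bin[[NumberinSizeBin$Length[i]]]
  # A bin with no samples at all contributes zero rows instead of an error
  if (is.null(pool)) return(Samples[0, , drop = FALSE])
  # Cap the request at the number of rows available in this bin
  k <- min(NumberinSizeBin$Count[i], nrow(pool))
  pool[sample.int(nrow(pool), k), , drop = FALSE]
}))
rownames(Subsample) <- NULL

print(table(Subsample$Length))
```

With this data the 240 bin yields 7 rows instead of 10 and the 300 bin yields 1 instead of 2, so the subsample no longer matches the target proportions exactly. If exact proportions matter more than sample size, an alternative is to scale every count down by a common factor until the scarcest bin can be filled.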
