Thursday, July 18, 2019

How do I take samples from two different data frames based on the count of a common column?

I have two data frames with the same structure:

1. df1, with response 'Yes' (US states in a column)
2. df2, with response 'No' (US states in a column)

I want to draw samples from both and combine them into a single sample data frame of a specified size. The combined sample should be balanced by state. For example, if the sample from df1 contains 50 observations from NY, I want 50 random NY observations from df2 as well.
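The balancing rule depends on first knowing how many sampled df1 rows fall in each state. Here is a minimal sketch of that counting step. The toy data and the column name `State` are hypothetical, because the question does not name the state column.

```r
# Hypothetical toy data; `State` is an assumed column name
set.seed(1)
df1 <- data.frame(State = c("NY", "NY", "CA", "TX", "NY", "CA"),
                  Response = "Yes")

# Draw 4 rows from df1 with replacement, as in the question's function
a <- df1[sample(nrow(df1), 4, replace = TRUE), ]

# Count how many sampled rows fall in each state;
# these are the per-state counts df2 would need to match
state_counts <- table(a$State)
print(state_counts)
```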

I have written a function that samples from each data frame and shuffles the result, but I have not been able to add the second part (matching the per-state counts):

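library(dplyr)  # provides bind_rows()

sample12 <- function(df1, df2, size) {
  # Draw size/2 rows (with replacement) from each data frame
  a <- df1[sample(nrow(df1), size / 2, replace = TRUE), ]
  b <- df2[sample(nrow(df2), size / 2, replace = TRUE), ]
  s1 <- bind_rows(a, b)
  # Shuffle the combined rows and return them instead of assigning to .GlobalEnv
  s1[sample(nrow(s1)), ]
}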
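One way to add the matching step is to sample df1 first, count the sampled rows per state, and then draw that same number of rows for each state from df2. The following is a sketch under stated assumptions, not a definitive answer. Both data frames are assumed to share a state column, hypothetically named `State`. The function name `sample_matched` and the toy data are also illustrative. Base R `rbind()` is used in place of `dplyr::bind_rows()` so the block has no package dependency.

```r
# Hypothetical count-matched sampler; `state_col` defaults to an assumed "State"
sample_matched <- function(df1, df2, size, state_col = "State") {
  n_half <- size %/% 2
  # Step 1: random sample from df1 (with replacement, as in the question)
  a <- df1[sample(nrow(df1), n_half, replace = TRUE), , drop = FALSE]
  # Step 2: count sampled df1 rows per state
  tab <- table(a[[state_col]])
  states <- names(tab)
  n_per_state <- as.integer(tab)
  # Step 3: for each state, draw the same number of rows from df2
  b_parts <- Map(function(st, n) {
    pool <- which(df2[[state_col]] == st)
    if (length(pool) == 0) stop("df2 has no rows for state ", st)
    # sample.int() avoids sample()'s surprise when the pool has length 1
    df2[pool[sample.int(length(pool), n, replace = TRUE)], , drop = FALSE]
  }, states, n_per_state)
  b <- do.call(rbind, unname(b_parts))
  # Step 4: combine both halves and shuffle the rows
  s <- rbind(a, b)
  s[sample(nrow(s)), , drop = FALSE]
}

# Usage on hypothetical toy data
set.seed(42)
df1 <- data.frame(State = c("NY", "NY", "CA", "TX"), Response = "Yes")
df2 <- data.frame(State = c("NY", "CA", "TX", "TX", "NY"), Response = "No")
s <- sample_matched(df1, df2, 10)
```

Sampling with replacement inside each state keeps the function working when df2 has fewer rows for a state than the df1 sample requires. If you need sampling without replacement, switch `replace` to `FALSE` and check that every state pool is large enough.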
