Monday, 6 June 2022

Sample a different number of observations by level of a factor in R

In my dataset, I have a factor with many levels, and each level has a different number of observations (from 3 to 20). I want to randomly sample a different number of observations from each level, say 7 for level A, 5 for level B, 8 for level C, and so on. Of course, I cannot select more observations than a level contains. I also want to repeat the process n times and save each resulting subset in a separate data frame. How can I do that? I searched Google without success. I tried the code below, but it selects the same number of rows from every level (3 in this case):

library(plyr)

# Draws exactly 3 rows from every level of Area, whatever the level
x <- ddply(df, ~Area, function(d) d[sample(nrow(d), 3), ])
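One possible approach, sketched in base R: split the data by level, pair each piece with its own sample size, and wrap the whole thing in `replicate()`. The example data, the `n_per_level` vector, and the helper name `sample_by_level` are assumptions made up for illustration; only the `Area` column comes from the question. Each requested size is capped at the number of rows available, and levels not named in `n_per_level` are left out of the result.

```r
# Hypothetical example data: a grouping column `Area` with unequal group sizes
set.seed(1)
df <- data.frame(
  Area  = rep(c("A", "B", "C"), times = c(10, 6, 12)),
  value = rnorm(28)
)

# Assumed sample sizes, named by level
n_per_level <- c(A = 7, B = 5, C = 8)

# Hypothetical helper: draw sizes[[level]] rows from each level, without replacement
sample_by_level <- function(data, group, sizes) {
  pieces <- split(data, data[[group]], drop = TRUE)
  stopifnot(all(names(sizes) %in% names(pieces)))
  pieces <- pieces[names(sizes)]            # align pieces with the requested sizes
  sampled <- Map(function(d, k) {
    k <- min(k, nrow(d))                    # never ask for more rows than exist
    d[sample.int(nrow(d), k), , drop = FALSE]
  }, pieces, sizes)
  do.call(rbind, sampled)
}

# Repeat the sampling n_rep times; the result is a list of data frames
n_rep <- 4
subsets <- replicate(n_rep, sample_by_level(df, "Area", n_per_level),
                     simplify = FALSE)
```

Keeping the subsets in a list (`subsets[[1]]`, `subsets[[2]]`, ...) is usually easier to work with than creating n separately named data frames, though `list2env()` could unpack a named list if separate objects are really needed.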



