Wednesday, 24 June 2020

Stratified random sampling from data frame: follow-up

I am trying to randomly sample 50% of the data in each group, following the question "Stratified random sampling from data frame". A reproducible example using the mtcars dataset in R is shown below. What I don't understand is this: the sample index clearly shows a group for gear labeled '5', but when the index is applied to the mtcars dataset, the sampled data mtcars2 does not contain any records with gear = 5. What went wrong? Thank you very much.

> set.seed(14908141)
> index=tapply(1:nrow(mtcars),mtcars$gear,function(x){sample(length(x),length(x)*0.5)})
> index
$`3`
[1]  6  7 14  4 12  9 13

$`4`
[1] 12  7  8  4  6  5

$`5`
[1] 5 1

> mtcars2=mtcars[unlist(index),]
> table(mtcars2$gear)

 3  4 
12  3 
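
A likely cause, judging from the output above: sample(length(x), ...) draws from 1:length(x), so each element of index holds positions within its gear group rather than row numbers of mtcars. Applying unlist(index) to mtcars therefore picks rows 1 to 14 of the full table, none of which has gear = 5, and some rows are picked more than once. A minimal sketch of one possible fix, assuming the goal is half of each group rounded down, is to use the sampled positions to index x itself:

```r
set.seed(14908141)

# tapply() passes each group's row numbers to the function as x.
# Sampling positions 1..length(x) and then indexing x with them
# returns row numbers of mtcars, not positions within the group.
index <- tapply(seq_len(nrow(mtcars)), mtcars$gear, function(x) {
  x[sample.int(length(x), floor(length(x) * 0.5))]
})

# Each row number now belongs to the gear group it was sampled from.
mtcars2 <- mtcars[unlist(index), ]
table(mtcars2$gear)
```

With group sizes of 15, 12 and 5, this should give 7, 6 and 2 rows for gear 3, 4 and 5. Writing x[sample.int(length(x), n)] rather than sample(x, n) is a deliberate choice here: sample(x, n) treats a single number x as the range 1:x, which would misbehave for a group with only one row.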


