Friday, 26 August 2016

Split a dataset in half, randomly selecting half of each level of a chosen variable

I know how to split a dataset in half completely at random, no problem. I understand the logic of what I want to do here, but the twist seems to be throwing me off.

So I have a dataset with a categorical variable Title, which has 120 levels, and each level has 50 observations. I'd like to split the dataset in half so that halfset A and halfset B each get a random 25 of the 50 observations for each level of Title. (This is for EFA and CFA.)

I think it would involve a for loop over the 120 levels, with something like sample(nrow(subset(dataset, Title == index)), 25) inside it, but I'm a little lost beyond that. The few potential solutions I've thought of do select a random 25 for halfset A, but effectively with replacement: when I run the same thing again to make halfset B, it has some overlap with A.
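To make the goal concrete, here is a minimal sketch of one way it might be done without an explicit for loop: draw 25 row indices per level once for halfset A, then take every remaining row as halfset B, so the two halves cannot overlap. The toy data frame, the score column, the seed, and the names rows_by_title, rows_a, halfset_a and halfset_b are all made up for illustration; only Title and the 120 x 50 layout come from the description above.

```r
set.seed(42)  # arbitrary seed, only so the split is reproducible

# Toy stand-in for the real data: 120 levels of Title, 50 rows each.
# (The score column is a hypothetical placeholder for the real variables.)
dataset <- data.frame(
  Title = factor(rep(sprintf("title_%03d", 1:120), each = 50)),
  score = rnorm(120 * 50)
)

# Row indices of the data frame, grouped by level of Title.
rows_by_title <- split(seq_len(nrow(dataset)), dataset$Title)

# Within each level, draw half of the row indices without replacement.
# Indexing by position avoids sample()'s special case for a single number.
rows_a <- unlist(lapply(rows_by_title, function(idx) {
  idx[sample.int(length(idx), length(idx) %/% 2)]
}), use.names = FALSE)

halfset_a <- dataset[rows_a, ]
halfset_b <- dataset[-rows_a, ]  # everything not drawn for A

# Sanity check: each level appears 25 times in both halves.
stopifnot(all(table(halfset_a$Title) == 25),
          all(table(halfset_b$Title) == 25))
```

Because halfset B is defined as the complement of A's row indices rather than as a second random draw, the overlap problem described above should not arise.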

Thanks as always, everyone.



