random: Split dataset into half, randomly selecting half of each level of chosen variable

Friday, 26 August 2016

Split dataset into half, randomly selecting half of each level of chosen variable

I know how to split a dataset in half completely at random, no problem. But while I understand the logic of what I want to do, the twist here is throwing me off.

I have a dataset with a categorical variable, Title, which has 120 levels. Each level has 50 observations. I'd like to split the dataset in half so that halfset A and halfset B each get a random 25 of the 50 observations for every level of Title. (This is for EFA and CFA.)

I think it would involve a for loop over the 120 levels, with something like sample(nrow(subset(dataset, Title == index)), 25), but I'm a little lost beyond that. The few potential solutions I've thought of do select a random 25 for halfset A, but the draw is effectively with replacement: when I run it again to make halfset B, there is some overlap.
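For reference, here is a minimal base R sketch of the per-level split described above. It is one possible approach, not necessarily the best one. The toy data frame, the `score` column, the seed, and the names `rows_by_level`, `idx_a`, `halfset_a` and `halfset_b` are all assumptions made for illustration; only `dataset` and `Title` come from the description. The idea is to draw indices for halfset A once per level and give halfset B whatever is left, rather than sampling a second time, so the two halves cannot overlap.

```r
# Toy stand-in for the real data: 120 levels of Title, 50 rows each (assumed layout).
set.seed(42)  # arbitrary seed, only so the split is reproducible
dataset <- data.frame(
  Title = factor(rep(sprintf("T%03d", 1:120), each = 50)),
  score = rnorm(120 * 50)  # hypothetical measurement column
)

# Group the row numbers by level of Title.
rows_by_level <- split(seq_len(nrow(dataset)), dataset$Title)

# Within each level, pick half of that level's rows without replacement.
# Indexing via sample.int() avoids the surprise sample() gives on a length-1 vector.
idx_a <- unlist(lapply(rows_by_level, function(rows) {
  rows[sample.int(length(rows), length(rows) %/% 2)]
}), use.names = FALSE)

halfset_a <- dataset[idx_a, ]   # 25 random rows per level
halfset_b <- dataset[-idx_a, ]  # the remaining 25 per level, so no overlap
```

A quick check such as table(halfset_a$Title) and table(halfset_b$Title) should show 25 rows per level in each half if the data really are balanced at 50 per level.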

Thanks as always, everyone.
