Sunday, 25 October 2020

How to split up a dataset randomly to reflect new class proportions in R?

I am currently applying different classifiers (LDA and kNN) to the Iris data set and exploring how their predictions differ. I want to see how these predictions would change if the class proportions of the species were not equal thirds, for example 0.1, 0.1 and 0.8 for setosa, virginica and versicolor respectively. However, I am not sure how to approach this.
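It helps to turn these proportions into row counts first. Each Iris species has only 50 rows, so if rows are not duplicated, the class with the largest proportion limits the total size of the new data set. The sketch below assumes sampling without replacement and computes the largest achievable total and the per-class counts. `props`, `ratio` and `max_total` are illustrative names, not from the question.

```r
# Target proportions from the question (setosa, virginica, versicolor)
props <- c(setosa = 0.1, virginica = 0.1, versicolor = 0.8)

# Rows available per species in iris (50 each), in the same order as props
available <- as.numeric(table(iris$Species)[names(props)])

# Without replacement no class may need more rows than it has,
# so the total is capped by the smallest available / proportion ratio
ratio <- available / props
max_total <- floor(min(ratio))  # 62 here, limited by versicolor (50 / 0.8)

# Per-class row counts for that total
counts <- round(props * max_total)
counts  # setosa 6, virginica 6, versicolor 50
```

Because of rounding, the realised proportions are only approximately 0.1/0.1/0.8 (for example 6/62 ≈ 0.097 for setosa).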

My first thought was to alter the Iris data set to reflect these new proportions before performing any classification. However, I am having difficulty creating this new data set. In R, I tried using the dplyr package to split the species up randomly and resample them in their new proportions, but without success. Any tips on how to do this?
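One way to build such a data set, without dplyr, is to sample a fixed number of row indices within each species and bind the pieces back together. This is a minimal base-R sketch, assuming sampling without replacement and hypothetical counts of 6/6/50 (roughly 0.1/0.1/0.8 of 62 rows). `resample_by_class` is an illustrative helper, not a library function.

```r
# Hypothetical helper: draw counts[[k]] rows of class k, without replacement
resample_by_class <- function(data, class_col, counts, seed = 123) {
  set.seed(seed)  # fix the RNG state so the same rows are drawn each run
  parts <- lapply(names(counts), function(k) {
    rows <- which(data[[class_col]] == k)  # row indices belonging to class k
    data[rows[sample.int(length(rows), counts[[k]])], ]
  })
  do.call(rbind, parts)  # stack the per-class samples into one data frame
}

# Assumed target: about 0.1 setosa, 0.1 virginica, 0.8 versicolor
counts <- c(setosa = 6, virginica = 6, versicolor = 50)
iris.new <- resample_by_class(iris, "Species", counts)
table(iris.new$Species)
```

Since `set.seed()` is called inside the function, it also resets R's global random number stream. With dplyr, a comparable approach is to `filter()` each species, take `slice_sample(n = ...)` (dplyr 1.0.0 or later), and combine the pieces with `bind_rows()`.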

This is what I have tried so far, without success:

# Split iris into one data frame per species
new <- split(iris, iris$Species)
setosa.new <- new$setosa
set.seed(123)
setosa.sample <- setosa.new[sample(nrow(setosa.new), 15), ]
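The attempt above draws 15 setosa rows. If the original total of 150 rows is to be kept at 0.1/0.1/0.8, versicolor would need 120 rows but only has 50, so its rows have to be sampled with replacement (oversampling). A sketch under that assumption, extending the `split()` approach to all three species; the counts are illustrative.

```r
set.seed(123)  # reproducible draws

# Assumed counts that keep 150 rows in total: 0.1, 0.1 and 0.8 of 150
counts <- c(setosa = 15, virginica = 15, versicolor = 120)
by_species <- split(iris, iris$Species)

parts <- lapply(names(counts), function(k) {
  d <- by_species[[k]]
  n <- counts[[k]]
  # Only sample with replacement when more rows are needed than exist
  d[sample.int(nrow(d), n, replace = n > nrow(d)), ]
})
iris.150 <- do.call(rbind, parts)
table(iris.150$Species)
```

Duplicated versicolor rows can end up in both the training and the test set, which is likely to make kNN in particular look more accurate than it is; splitting into training and test sets before oversampling avoids this.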

However, ideally I would like the random sample to be the same each time. How can I ensure this?
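`set.seed()` fixes the state of R's random number generator, so calling `set.seed(123)` immediately before `sample()` should give the same rows every time the two calls are run together. Re-running only the `sample()` line continues the random stream and gives a different draw. A small check, where `draw_rows` is an illustrative name:

```r
# Illustrative helper: reset the seed, then draw n row indices from iris
draw_rows <- function(seed = 123, n = 15) {
  set.seed(seed)
  sample(nrow(iris), n)
}

first <- draw_rows()
second <- draw_rows()
identical(first, second)  # TRUE: same seed, same draw
```

The default algorithm behind `sample()` changed in R 3.6.0, so the same seed can select different rows under older versions; `set.seed(123, sample.kind = "Rounding")` reproduces the pre-3.6.0 behaviour (with a warning).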



