I want to fit a random forest on this data, with happy as the outcome (y) and ate as the predictor (x). Some of these people were lucky and got two free meals, while others got only one, so some ids appear in more than one row. Can I use rsample to make sure that the same id (in this case 5) does not end up in both the training and test sets? If not, how should I do it?
library(tibble)
library(rsample)
set.seed(123)
dframe <- tibble(id = c(1, 1, 2, 2, 3, 4, 5, 5, 6, 7),
                 ate = sample(c("cookie", "slug"), size = 10, replace = TRUE),
                 happy = sample(c("yes", "no"), size = 10, replace = TRUE))
dframe_split <- initial_split(dframe, strata = "ate")
dframe_train <- training(dframe_split)
dframe_test <- testing(dframe_split)
Created on 2018-10-11 by the reprex package (v0.2.0).
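One possible way to keep all rows for an id on the same side of the split is a grouped split. The sketch below assumes rsample 1.1.0 or later, where group_initial_split() is available. The prop value of 0.7 is an arbitrary choice for illustration, and this version does not stratify on ate.

```r
library(tibble)
library(rsample)

set.seed(123)
dframe <- tibble(
  id    = c(1, 1, 2, 2, 3, 4, 5, 5, 6, 7),
  ate   = sample(c("cookie", "slug"), size = 10, replace = TRUE),
  happy = sample(c("yes", "no"), size = 10, replace = TRUE)
)

# Split on whole ids, so every row of a given id lands on the same side.
# Assumption: rsample >= 1.1.0, which provides group_initial_split().
dframe_split <- group_initial_split(dframe, group = id, prop = 0.7)
dframe_train <- training(dframe_split)
dframe_test  <- testing(dframe_split)

# Sanity check: this should return an empty vector,
# because no id is shared between the two sets.
intersect(dframe_train$id, dframe_test$id)
```

With an older rsample that lacks group_initial_split(), group_vfold_cv(dframe, group = id) may be a workable alternative: it builds resamples in which each id is held out as a whole.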