This may be easy, but I can't find a way to sample rows based on the values of a variable in a data frame. Here is an example:
df <- data.frame(matrix(rnorm(80), nrow=40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each=10)
I want to randomly select a subset of colors and create a new data.frame containing all observations that have the selected colors. One possible solution is the following (where I select two colors):
df[df$color %in% sample(unique(df$color), 2), ]
but I'm looking for alternative methods that can be used on very large data frames.
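As one possible alternative, here is a minimal base-R sketch. It draws the colors from `unique(df$color)` so that the selected colors are always distinct, and it builds a row index per color once with `split()`, which may help if many subsets are sampled from the same large data frame. The names `picked`, `idx`, `sub_base` and `sub_split` are made up for this illustration, and whether this is actually faster has not been measured here.

```r
set.seed(1)  # only to make this sketch reproducible
df <- data.frame(matrix(rnorm(80), nrow = 40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each = 10)

# Sample two distinct colors from the set of unique values
picked <- sample(unique(df$color), 2)

# Option 1: logical indexing on the grouping column
sub_base <- df[df$color %in% picked, ]

# Option 2: precompute the row numbers belonging to each color once,
# then look up the rows for the sampled colors
idx <- split(seq_len(nrow(df)), df$color)
sub_split <- df[unlist(idx[picked], use.names = FALSE), ]
```

Both options return the same rows. Option 2 returns them grouped in the order the colors were sampled, while option 1 keeps the original row order.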