Wednesday, March 22, 2017

Data sampling based on the value of a variable in R

This may be easy, but I can't find a way to sample rows based on the value of a variable in a data frame. Here is an example:

df <- data.frame(matrix(rnorm(80), nrow=40))
df$color <-  rep(c("blue", "red", "yellow", "pink"), each=10)

I want to randomly select a subset of colors and create a new data.frame containing all observations that have the selected colors. One possible solution is the following (here I select two colors):

df[df$color %in% sample(unique(df$color), 2), ]

but I'm looking for alternative methods that would work on very large data sets.
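For comparison, here is a minimal base-R sketch of two ways to do this. The names picked, idx, sub_base and sub_idx are made up for illustration, and set.seed is there only to make the run repeatable. Drawing from unique(df$color) guarantees that the two sampled colors are distinct. The second variant builds the row indices per color once, which might help if many subsets are drawn from the same data; I have not benchmarked it on large data.

```r
set.seed(42)  # only so the sketch is reproducible
df <- data.frame(matrix(rnorm(80), nrow = 40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each = 10)

# Sample from the distinct colors so that two different colors are picked
picked <- sample(unique(df$color), 2)

# Variant 1: logical filter, rows stay in their original order
sub_base <- df[df$color %in% picked, ]

# Variant 2: compute the row indices per color once, then reuse them
idx <- split(seq_len(nrow(df)), df$color)
sub_idx <- df[unlist(idx[picked], use.names = FALSE), ]
```

Both variants return the same rows. Variant 2 groups them in the order the colors were picked rather than in their original order.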



