Friday, 23 July 2021

Randomly selecting rows based on all groups in two columns

I have a large dataset with about 167k rows. I would like to draw a random sample of 2000 rows from it while making sure the sample includes rows from every group defined by two columns (id & quality). This is a snapshot of the data:

library(dplyr)  # provides %>% and glimpse()

df <- data.frame(id=c(1,2,3,4,5,1,2),
                 quality=c("a","b","c","d","z","g","t"))

df %>% glimpse()
Rows: 7
Columns: 2
$ id      <dbl> 1, 2, 3, 4, 5, 1, 2
$ quality <chr> "a", "b", "c", "d", "z", "g", "t"

So, I need to ensure that the sampled data has rows from all combinations of these two group columns. I hope someone can help out.

Thanks!
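
One possible approach, sketched here under some assumptions: first take one random row from every observed (id, quality) combination, then fill the sample up to the target size with random rows from whatever is left. The toy data, the helper column `row_id`, and the names `n_target`, `one_per_group`, `remaining` and `sampled` are made up for illustration; on the real data `n_target` would be 2000. This assumes dplyr 1.0.0 or later for `slice_sample()`.

```r
library(dplyr)

set.seed(1)  # only so the example is reproducible

# Toy stand-in for the real 167k-row data (hypothetical values)
df <- data.frame(
  id      = sample(1:5, 1000, replace = TRUE),
  quality = sample(c("a", "b", "c", "d"), 1000, replace = TRUE)
)

n_target <- 50  # assumption for the toy data; use 2000 on the real data

# Tag each row so already-selected rows can be excluded later
df_idx <- df %>% mutate(row_id = row_number())

# Step 1: one random row from every observed (id, quality) combination
one_per_group <- df_idx %>%
  group_by(id, quality) %>%
  slice_sample(n = 1) %>%
  ungroup()

# Step 2: top up with random rows from the remainder, never asking for
# more rows than are available
remaining <- anti_join(df_idx, one_per_group, by = "row_id")
n_extra   <- min(max(n_target - nrow(one_per_group), 0), nrow(remaining))

sampled <- bind_rows(
  one_per_group,
  slice_sample(remaining, n = n_extra)
) %>%
  select(-row_id)
```

Two caveats. If the data has more distinct combinations than the target size, the result will have one row per combination and therefore more rows than the target. Also, the sample is not proportional to group sizes, since small groups are guaranteed a row, so it should not be treated as a simple random sample.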



