Wednesday, November 16, 2016

How do I sample single (random) rows that can be grouped by a column's values?

Here is a sample of the data:

p <- structure(list(name = structure(1:5, .Label = c("Alice", "Bob",
"Charlie", "Dennis", "Earl"), class = "factor"), cohort = structure(c(3L,
3L, 2L, 2L, 1L), .Label = c("X", "Y", "Z"), class = "factor"),
    group = structure(c(1L, 1L, 2L, 2L, 1L), .Label = c("A",
    "B"), class = "factor"), var = c(1L, 2L, 1L, 3L, 4L)), .Names = c("name",
"cohort", "group", "var"), class = "data.frame", row.names = c(NA,
-5L))

that looks like

     name cohort group var
1   Alice      Z     A   1
2     Bob      Z     A   2
3 Charlie      Y     B   1
4  Dennis      Y     B   3
5    Earl      X     A   4

and I need something like the following, based on the cohort column: one row sampled from each cohort (ideally at random), so that no two people in the result belong to the same cohort.

     name cohort group var
2     Bob      Z     A   2
3 Charlie      Y     B   1
5    Earl      X     A   4

I can group_by cohort, but I'm not sure how to go from there to a new data frame that contains only the rows I need.
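
For reference, here is a minimal base R sketch of the kind of result I'm after. It is only an illustration, not a settled solution: the compact data.frame() call is a stand-in for the dput() output above (with character columns instead of factors), and the names rows_by_cohort, picked, and one_per_cohort are made up for this example.

```r
# Rebuild the example data in a compact form (same values as p above,
# but as a plain data.frame() call; this is an assumption for brevity).
p <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "Dennis", "Earl"),
  cohort = c("Z", "Z", "Y", "Y", "X"),
  group  = c("A", "A", "B", "B", "A"),
  var    = c(1L, 2L, 1L, 3L, 4L)
)

set.seed(1)  # only so the random draw can be repeated

# Split the row indices by cohort: one integer vector per cohort.
rows_by_cohort <- split(seq_len(nrow(p)), p$cohort)

# Pick one index from each cohort. sample.int(length(i), 1) is used
# instead of sample(i, 1) because sample(5, 1) would draw from 1:5
# whenever a cohort contains a single row (here, row 5).
picked <- vapply(
  rows_by_cohort,
  function(i) i[sample.int(length(i), 1)],
  integer(1)
)

# Keep only the sampled rows: one row per cohort.
one_per_cohort <- p[picked, ]
```

Which row is drawn within cohorts Z and Y depends on the seed; cohort X always yields Earl because it has only one row. With dplyr loaded, I'd expect p %>% group_by(cohort) %>% sample_n(1) (or slice_sample(n = 1) in dplyr 1.0.0 and later) to do the same job, but I have not confirmed that this is the recommended way.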



