Thursday, June 9, 2016

Generate random numbers by group without replacement

I have a largish dataset (more than 500k rows) with 421 groups defined by two grouping variables. Sample data:

df <- data.frame(group_one = rep(0:9, 26), group_two = rep(letters, 10))

head(df)

  group_one group_two
1         0         a
2         1         b
3         2         c
4         3         d
5         4         e
6         5         f

...and so on.

What I want is some number k of stratified samples (k = 12 at the moment, but that number may change), stratified by the combination of group_one and group_two. Each row's sample should be indicated by a new column, sample_membership, taking a value from 1 to k (again, 12 at the moment). I should then be able to subset by sample_membership and get up to 12 distinct samples, each of which is representative with respect to both group_one and group_two.
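Since each row can go into only one sample, a stratum with fewer than k rows cannot be represented in all k samples, so it is worth checking stratum sizes first. A quick base R check on the sample data above (the names `strata` and `sizes` are just for illustration):

```r
# Sample data from the question
df <- data.frame(group_one = rep(0:9, 26), group_two = rep(letters, 10))

# One factor level per observed group_one x group_two combination
strata <- interaction(df$group_one, df$group_two, drop = TRUE)
nlevels(strata)   # 130 strata in this sample data

# Rows per stratum, then how many strata have each size
sizes <- table(strata)
table(sizes)      # every stratum here has exactly 2 rows
```

In this toy data every stratum can therefore populate at most 2 of the 12 samples; the real data set (421 strata) will have its own size distribution.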

The final data set would then look something like this:

  group_one group_two sample_membership
1         0         a                 1
2         0         a                12
3         0         a                 5
4         1         a                 5
5         1         a                 7
6         1         a                 9
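One possible base R approach, sketched under the assumption that each row should get exactly one label and that, within every group_one x group_two stratum, the labels 1 to k should be spread as evenly as possible and drawn without replacement. `assign_samples` is a hypothetical helper, not an existing function; `ave()` is used to apply the labelling to each stratum separately.

```r
# Hypothetical helper: add a sample_membership column (1..k), assigned
# separately within each combination of the stratifying columns.
assign_samples <- function(data, k, strata_cols = c("group_one", "group_two")) {
  # One factor level per observed combination of the stratifying columns
  strata <- interaction(data[strata_cols], drop = TRUE)

  label_stratum <- function(idx) {
    n <- length(idx)
    # A random order of the k labels, recycled to cover all n rows, so
    # label counts within a stratum differ by at most one
    labels <- rep_len(sample.int(k), n)
    # Shuffle so the row order in the data does not influence labels
    labels[sample.int(n)]
  }

  # ave() calls label_stratum on the row indices of each stratum and
  # writes the results back in the original row order
  data$sample_membership <- ave(seq_len(nrow(data)), strata, FUN = label_stratum)
  data
}

# Usage on the sample data from the question
set.seed(2016)  # for a reproducible assignment
df <- data.frame(group_one = rep(0:9, 26), group_two = rep(letters, 10))
df <- assign_samples(df, k = 12)
sample_3 <- subset(df, sample_membership == 3)  # one of the k samples
```

Drawing the label order with `sample.int(k)` for each stratum (rather than always starting at 1) matters for small strata: otherwise every stratum smaller than k would only ever feed the first few samples. Indexing with `sample.int(n)` also avoids the `sample(x)` pitfall where a length-one numeric `x` is treated as `1:x`.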

Thoughts? Thanks very much in advance!
