Thursday, October 20, 2022

Random sampling of one row within each id using R

I made a data frame like this:

data<-data.frame(id=c(1,1,1,1,2,2,2,3,3,3,4,4,4),
                 yearmonthweek=c(2012052,2012053,2012061,2012062,2013031,2013052,2013053,2012052,
                                 2012053,2012054,2012071,2012073,2012074),
                 event=c(0,1,1,0,0,1,0,0,0,0,0,0,0),
                 a=c(11,12,13,10,11,12,15,14,13,15,19,10,20))

id is a personal identifier, and yearmonthweek encodes the year, month, and week. I want to clean the data using the following rules. First, find the ids that have at least one event. In this case, id=1 and id=2 have events, while id=3 and id=4 have none. Second, for each id that has events, pick one random row with an event (event = 1), and for each id that has no events, pick one random row. The result should therefore have as many rows as there are ids. My expected output looks like this:

data<-data.frame(id=c(1,2,3,4),
                 yearmonthweek=c(2012053,2013052,2012052,2012073),
                 event=c(1,1,0,0),
                 a=c(12,12,14,10))

Since I use random sampling, the values may differ from those above, but there should always be 4 rows, one per id.
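One way to express the two rules directly is to split the data by id and sample one row per group. This is a minimal base R sketch. It assumes that for an id with events, the sampled row should be one where event = 1 (inferred from the expected output), and `pick_one` is a hypothetical helper name introduced here for illustration.

```r
# Sample data from the question
data <- data.frame(id = c(1,1,1,1,2,2,2,3,3,3,4,4,4),
                   yearmonthweek = c(2012052,2012053,2012061,2012062,2013031,2013052,2013053,
                                     2012052,2012053,2012054,2012071,2012073,2012074),
                   event = c(0,1,1,0,0,1,0,0,0,0,0,0,0),
                   a = c(11,12,13,10,11,12,15,14,13,15,19,10,20))

# Hypothetical helper: pick one row from a single id's rows.
# Assumption: if the id has any event, sample only among rows with event == 1;
# otherwise, sample among all rows of that id.
pick_one <- function(d) {
  candidates <- if (any(d$event == 1)) which(d$event == 1) else seq_len(nrow(d))
  # sample.int() avoids sample()'s special case when only one candidate exists
  d[candidates[sample.int(length(candidates), 1)], ]
}

# Apply the helper to each id and stack the one-row results
result <- do.call(rbind, lapply(split(data, data$id), pick_one))
rownames(result) <- NULL
result
```

Because `sample(x, 1)` treats a single number `x` as `1:x`, indexing through `sample.int()` keeps the helper correct even when an id has exactly one event row.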
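A vectorized alternative, again assuming event rows should be preferred for ids that have them, is to sort the rows by id, then by event (descending), then by a random key, and keep the first row of each id with `duplicated()`. Calling `set.seed()` beforehand makes a given draw reproducible; the seed value here is arbitrary. This is a sketch of a different technique from the split-and-sample approach, not the only way to do it.

```r
# Sample data from the question
data <- data.frame(id = c(1,1,1,1,2,2,2,3,3,3,4,4,4),
                   yearmonthweek = c(2012052,2012053,2012061,2012062,2013031,2013052,2013053,
                                     2012052,2012053,2012054,2012071,2012073,2012074),
                   event = c(0,1,1,0,0,1,0,0,0,0,0,0,0),
                   a = c(11,12,13,10,11,12,15,14,13,15,19,10,20))

set.seed(42)  # arbitrary seed, only to make this draw reproducible

# Within each id, rows with event == 1 come first; ties are broken by a
# uniform random key, so the first row per id is a random event row when
# one exists, otherwise a random row of that id.
shuffled <- data[order(data$id, -data$event, runif(nrow(data))), ]
result <- shuffled[!duplicated(shuffled$id), ]
rownames(result) <- NULL

# Whatever the seed, there is exactly one row per id
stopifnot(nrow(result) == length(unique(data$id)))
result
```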
