I have a data set from which I want to take a random sample of up to 30 rows per group. However, I also want to make sure that at least 1 row for each value of a second grouping is included. Additionally, some groups have fewer than 30 rows, in which case all of the rows for that group should be included. I can't share the exact data set I'm working with because it's proprietary; however, an example data frame df would be:
ID Age State Gender Salary
1 25 CO M 50000
2 34 CO M 72000
3 28 CO M 52000
4 25 CO F 44000
5 25 CA F 55000
6 34 CA F 100000
7 39 CA M 88000
8 34 CA M 59000
... up to 15000 rows
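For anyone who wants to try this, the eight rows shown above can be recreated roughly as follows. The column types are my assumption, and the real data has far more rows:

```r
# Rebuild the eight example rows shown above.
# Column types (integer/numeric/character) are an assumption.
df <- data.frame(
  ID     = 1:8,
  Age    = c(25, 34, 28, 25, 25, 34, 39, 34),
  State  = c("CO", "CO", "CO", "CO", "CA", "CA", "CA", "CA"),
  Gender = c("M", "M", "M", "F", "F", "F", "M", "M"),
  Salary = c(50000, 72000, 52000, 44000, 55000, 100000, 88000, 59000),
  stringsAsFactors = FALSE
)

str(df)
```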
So, I want a random sample of the data set with no more than 30 rows from each state. Within each state, I want at least 1 row for each age/gender combination that exists in the data set for that state. If a state has fewer than 30 age/gender combinations but more than 30 rows, the sample should include multiple rows for some age/gender combinations so that 30 rows are returned for that state. If a state has fewer than 30 rows, I want all of the rows in the data set for that state. If a state has more than 30 age/gender combinations, the sample should contain 1 row each for 30 of those combinations.
Is there a way for me to do this in R?
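To make the rules concrete, here is a minimal base R sketch of what I think the logic would look like. The function names `sample_state` and `sample_by_state` and the `toy` data are hypothetical, made up for illustration. It assumes the columns are called State, Age and Gender, and it may well not be the best way to do this:

```r
# Sample up to `n` rows from one state's rows, covering as many
# Age/Gender combinations as possible before filling up at random.
# (Hypothetical helper; assumes columns named Age and Gender.)
sample_state <- function(d, n = 30) {
  # n rows or fewer: keep everything.
  if (nrow(d) <= n) return(d)

  idx   <- seq_len(nrow(d))
  combo <- interaction(d$Age, d$Gender, drop = TRUE)

  # One randomly chosen row index per Age/Gender combination.
  # Indexing via sample.int() avoids the sample() pitfall when a
  # group holds a single index.
  one_each <- unlist(
    lapply(split(idx, combo), function(i) i[sample.int(length(i), 1)]),
    use.names = FALSE
  )

  if (length(one_each) >= n) {
    # At least n combinations: keep a random n of them.
    keep <- one_each[sample.int(length(one_each), n)]
  } else {
    # Fewer combinations than slots: top up with other rows at random.
    rest <- setdiff(idx, one_each)
    keep <- c(one_each, rest[sample.int(length(rest), n - length(one_each))])
  }
  d[sort(keep), , drop = FALSE]
}

# Apply the per-state sampler to every state and stack the results.
sample_by_state <- function(df, n = 30) {
  out <- do.call(rbind, lapply(split(df, df$State), sample_state, n = n))
  rownames(out) <- NULL
  out
}

# Made-up toy data: CO has 200 rows, CA has only 12.
set.seed(1)
toy <- data.frame(
  ID     = 1:212,
  Age    = sample(25:29, 212, replace = TRUE),
  State  = rep(c("CO", "CA"), times = c(200, 12)),
  Gender = sample(c("M", "F"), 212, replace = TRUE),
  stringsAsFactors = FALSE
)

res <- sample_by_state(toy, n = 30)
table(res$State)  # CA keeps all 12 rows, CO is capped at 30
```

Picking one row per combination first and only then topping up is what should guarantee the coverage; sampling 30 rows directly per state would not.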