I've looked everywhere but can't find anyone doing this. I imagine there must be a way in R, though.
I have a dataset of around 200k rows that looks like this:
Report ID | Month | Day | Year | Location ID | comments
1 | 4 | 1 | 2015 | 200 | blah blah blah
2 | 11 | 3 | 2014 | 100 | blah blah blah
3 | 4 | 5 | 2015 | 203 | blah blah blah
4 | 8 | 30 | 2012 | 204 | blah blah blah
5 | 11 | 5 | 2013 | 204 | blah blah blah
6 | 11 | 1 | 2015 | 100 | blah blah blah
7 | 11 | 10 | 2013 | 204 | blah blah blah
I need to create a random sample of report IDs that has an even distribution of location IDs, years, and months. I know this wouldn't truly be a random sample, but location ID skews pretty heavily toward some locations, and some months have way more reports than others.
I have tried various sampling and subsetting techniques in R, but they all seem to want to sample the data set as a whole, and I've been unable to find a way to ask the sample to provide, say, 500 report IDs for each location, let alone to then say that within those 500 I want an even distribution of years and months. Any suggestions?
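To make the goal concrete, here is a minimal base R sketch of the kind of stratified draw described above: split the row indices by every location/year/month combination, then sample the same number of rows from each group. The toy data frame, its column names (`report_id`, `month`, `year`, `location_id`), the helper `sample_stratum`, and the value of `n_per_stratum` are all assumptions made for illustration, not part of the real dataset.

```r
# Hypothetical toy data standing in for the real ~200k-row dataset;
# location 100 is deliberately over-represented to mimic the skew.
set.seed(42)
reports <- data.frame(
  report_id   = 1:2000,
  month       = sample(1:12, 2000, replace = TRUE),
  year        = sample(2012:2015, 2000, replace = TRUE),
  location_id = sample(c(100, 200, 203, 204), 2000, replace = TRUE,
                       prob = c(0.6, 0.2, 0.1, 0.1))
)

# Pick up to n_per_stratum row indices from one stratum.
# Sampling positions with sample.int() avoids the sample() pitfall where
# a single number x is treated as the range 1:x.
sample_stratum <- function(rows, n_per_stratum) {
  rows[sample.int(length(rows), min(n_per_stratum, length(rows)))]
}

n_per_stratum <- 5  # assumed target per location/year/month cell

# One stratum per observed combination of location, year, and month
strata <- interaction(reports$location_id, reports$year, reports$month,
                      drop = TRUE)
row_groups <- split(seq_len(nrow(reports)), strata)

# Sample within each stratum and collect the chosen row indices
picked <- unlist(lapply(row_groups, sample_stratum,
                        n_per_stratum = n_per_stratum),
                 use.names = FALSE)
balanced <- reports[picked, ]

# Compare the skew before and after
table(reports$location_id)
table(balanced$location_id)
```

Cells with fewer than `n_per_stratum` reports contribute everything they have, so the result is only as even as the sparsest cells allow. To aim for roughly 500 reports per location, one option would be to set `n_per_stratum` to 500 divided by the number of year/month cells.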