Wednesday, 18 November 2020

R: sampling with proportions based on labels in the data

I am trying to sample my dataset using a particular logic: I want the sample to contain a fixed proportion of rows for each label. I wonder whether the sample() function in R has an option for this.

A simple description of my dataset is:

        id mode OD_ID
 1:  50909    1     1
 2:  62024    1     1
 3:  82812    1     1
 4: 100593    1     1
 5: 150391    2     1
 6: 159413    2     1
 7: 132134    2     1
 8: 111111    2     1
 9:  78524    3     1
10: 802212    3     1
   .
   .
   .

I would like to sample this data so that the values of the "mode" column follow a given ratio within each value of the "OD_ID" column.

For example, for the rows with OD_ID = 1, I would like the sample to contain the modes in these proportions: mode 1 at 71%, mode 2 at 21%, and mode 3 at 8%. My real data has enough rows for this, and I want the sampled dataset to have 10 rows for each OD_ID. The number of rows sampled for each mode should be rounded to the nearest integer.
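As a quick check of the arithmetic, the percentages can be turned into per-mode row counts by rounding. This is only a sketch; the names mode_share, n_per_od and n_per_mode are made up for illustration. With these particular shares the rounded counts happen to add up to 10, but in general rounding each share separately does not guarantee that.

```r
# Target share of each mode within one OD_ID (values from the question).
mode_share <- c("1" = 0.71, "2" = 0.21, "3" = 0.08)

# Rows wanted per OD_ID.
n_per_od <- 10

# Round each share to the nearest whole number of rows.
n_per_mode <- round(mode_share * n_per_od)

n_per_mode       # 7 rows of mode 1, 2 of mode 2, 1 of mode 3
sum(n_per_mode)  # 10
```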

So an example of my output would be

      id mode OD_ID
  some id    1     1
  some id    1     1
  some id    1     1
  some id    1     1
  some id    1     1
  some id    1     1
  some id    1     1
  some id    2     1
  some id    2     1
  some id    3     1
   .
   .
   .
  some id    1     2
   .
   .
   .

That is, the sampled data should contain 71% mode 1, 21% mode 2, and 8% mode 3 for each OD_ID.

I would appreciate some help.
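As far as I know, sample() has no built-in option for this, but one possible approach is to split the data by OD_ID and draw a fixed number of rows per mode inside each group. The following is a minimal base R sketch under some assumptions: the helper sample_by_share() and the toy data frame dat are hypothetical and only stand in for the real data, and each group is assumed to have enough rows of every mode (if it does not, the helper simply takes all the rows available).

```r
# Target share of each mode and rows wanted per OD_ID (from the question).
mode_share <- c("1" = 0.71, "2" = 0.21, "3" = 0.08)
n_per_od <- 10

# Hypothetical helper: from one OD_ID group, draw round(share * n) rows per mode.
sample_by_share <- function(group, share, n) {
  parts <- lapply(names(share), function(m) {
    rows <- which(group$mode == as.integer(m))
    # Never ask for more rows than the group has for this mode.
    k <- min(round(share[[m]] * n), length(rows))
    # Index via sample.int(): sample(rows, k) would treat a single
    # number x as the range 1:x when rows has length 1.
    group[rows[sample.int(length(rows), k)], , drop = FALSE]
  })
  do.call(rbind, parts)
}

# Toy data standing in for the real dataset (assumption, not the real data):
# two OD_IDs, each with 50 / 30 / 20 rows of modes 1 / 2 / 3.
set.seed(1)
dat <- data.frame(
  id    = sample(10000:999999, 200),
  mode  = rep(rep(1:3, times = c(50, 30, 20)), 2),
  OD_ID = rep(1:2, each = 100)
)

# Apply the helper to every OD_ID group and stack the results.
sampled <- do.call(
  rbind,
  lapply(split(dat, dat$OD_ID), sample_by_share, share = mode_share, n = n_per_od)
)
rownames(sampled) <- NULL

# Rows per OD_ID (rows of the table) and mode (columns).
table(sampled$OD_ID, sampled$mode)
```

With the toy data, each OD_ID should end up with 7 rows of mode 1, 2 of mode 2 and 1 of mode 3. The same split-and-sample idea could also be written with data.table or dplyr grouping.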



