jeudi 22 mars 2018

Dplyr - random sample by group and filtering on the basis of result

I have a dataframe that rougly looks like this:

+----+-------+----------+-------+
| Id | Month | Cal Week | Value |
+----+-------+----------+-------+
|  1 |     5 |   201708 |   4.5 |
|  1 |     5 |   201709 |     5 |
| 1  |     5 |   201710 |     6 |
|  2 |    88 |   201741 |    75 |
|  2 |    88 | 201742   |    89 |
| 2  |    88 | 201743   |    90 |
|  2 |    88 |   201744 |    51 |
+----+-------+----------+-------+

I would like to randomly sample a calendar week from each id-month group (the months are not calendar months). Then I would like to keep all id-month combination prior to the sample months.

I tried to work with dplyr's sample_n function (which is going to give me the random calendar week by id-month group, but then I do not know how to get all calendar weeks prior to that date. Can you help me with this. If possible, I would like to work with dplyr.

Please let me know in case you need further information.

Many thanks




Aucun commentaire:

Enregistrer un commentaire