I am hoping to solve this in T-SQL, if that is an appropriate tool. Any help is appreciated. I have seen similar questions, but the fact that I have multiple entries for the same person makes it difficult.
I have a large dataset with IDs (unique for each person) and dates from 2015 to 2020. This is prescription data for individuals (ID) and their fill dates, so there are typically multiple rows for each ID, both within a year and across multiple years.
I want to randomly pick one date per ID/person so that the years of the picked dates follow these proportions: 5% in 2015, 10% in 2016, 10% in 2017, 15% in 2018, 20% in 2019, and 40% in 2020. There are 1.2 million unique IDs/people, but only about 300,000 of them have a fill in 2020, which seems like a limiting factor: 40% of 1.2 million would be 480,000 people.
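To make the constraint concrete, here is a quick arithmetic check. It is written in Python only for the arithmetic, not as a proposed solution. The variable and function names are made up for illustration, and the figures are the approximate ones quoted above.

```python
# Target share of people per year, as stated in the question.
target_share = {2015: 0.05, 2016: 0.10, 2017: 0.10,
                2018: 0.15, 2019: 0.20, 2020: 0.40}

total_people = 1_200_000         # unique IDs (from the question)
people_with_2020_fill = 300_000  # approximate (from the question)


def target_counts(shares, n):
    """People who would need a picked date in each year to hit the shares."""
    return {year: round(share * n) for year, share in shares.items()}


counts = target_counts(target_share, total_people)

# How many more 2020 picks the target needs than there are people with a 2020 fill.
shortfall_2020 = counts[2020] - people_with_2020_fill

# Largest sample that keeps 2020 at exactly 40%, assuming 2020 is the only
# binding year (availability in the other years is not given in the question).
max_total_at_exact_shares = round(people_with_2020_fill / target_share[2020])

print(counts)
print(shortfall_2020)
print(max_total_at_exact_shares)
```

So if every one of the 1.2 million people must receive a date, the 2020 share cannot reach 40%. Keeping the exact proportions would mean sampling at most about 750,000 people, and possibly fewer if another year also turns out to be short.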