I have a table called "transaction_history" containing millions of transactions, with the following columns: column 1: customer_id; column 2: transaction date.
In this table, one customer may have x transactions, where x >= 1.
What I am looking to do is draw a random sample of n unique customer IDs (n is the number of prizes, one per winner), BUT ensure that the more transactions a given customer has, the higher their chance of being selected as a winner.
I have tried the following:
1- The straightforward dplyr::sample_n(transaction_history, size = ...), which returns a sample containing duplicate customer_ids.
2- transaction_history %>% dplyr::distinct(customer_id) %>% dplyr::sample_n(size = ...), which does not give frequent customers a higher chance.
3- Sampling within each customer_id group before sampling again, which also defeats this goal.
Any help will be greatly appreciated.
Thanks
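One possible direction, as a minimal sketch rather than a definitive answer: collapse the table to one row per customer, use the transaction count as a sampling weight, and draw n customers without replacement. The toy data, the `n_winners` value, and the `n_transactions` column name below are assumptions made up for illustration; `count()` and `slice_sample()` (with its `weight_by` argument) are from dplyr 1.0.0 or later.

```r
library(dplyr)

set.seed(42)  # only to make the toy example reproducible

# Toy stand-in for transaction_history (hypothetical data, not the real table)
transaction_history <- data.frame(
  customer_id = c("A", "A", "A", "A", "B", "B", "C", "D", "E"),
  transaction_date = as.Date("2024-01-01") + 0:8
)

n_winners <- 3  # hypothetical number of prizes

# One row per customer, with the number of transactions as the weight
customer_weights <- transaction_history %>%
  count(customer_id, name = "n_transactions")

# Weighted draw without replacement, so each customer can win at most once
winners <- customer_weights %>%
  slice_sample(n = n_winners, weight_by = n_transactions)

winners$customer_id
```

Because the draw is without replacement, customers are picked one after another and the weights are renormalised after each pick. A customer's chance of winning therefore rises with their transaction count, but it is not exactly proportional to it. If exact proportionality matters, a different sampling scheme would be needed.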