Tuesday, 30 June 2020

Random sample of N unique customer IDs where customers who transact more have a higher chance of being selected

I have a table called "transaction_history" containing millions of transactions, with the following columns:

column 1: customer_id
column 2: transaction date

In this table, one customer may have x transactions, where x >= 1.

What I am looking to do is get a random sample of n unique customer IDs (n is the number of prizes to allocate, one per winner), BUT ensure that the more transactions a given customer has, the higher their chance of being selected as a winner.

I have tried the following:

1- The straightforward dplyr::sample_n(transaction_history, size = ...), which samples transactions rather than customers, so the result can contain duplicate customer_ids.

2- transaction_history %>% dplyr::distinct(customer_id) %>% dplyr::sample_n(size = ...), which returns unique customer_ids but gives every customer the same chance, so frequent customers get no advantage.

3- Sampling within each customer_id group first and then sampling again, which also discards the transaction counts and so defeats the goal.
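To make the requirement concrete, here is a minimal sketch of what I mean: count transactions per customer, then draw customers without replacement using those counts as weights. The toy data and the names n_winners, customer_counts, picked and winners are made up for illustration and are not part of my real table. Note that with a draw without replacement, a higher count gives a higher chance of being picked, but the chances are not strictly proportional to the counts.

```r
library(dplyr)

set.seed(1)  # only so this sketch is reproducible

# Hypothetical toy data standing in for the real transaction_history table
transaction_history <- data.frame(
  customer_id = c("A", "A", "A", "A", "B", "C", "C", "D"),
  transaction_date = as.Date("2020-06-01") + 0:7,
  stringsAsFactors = FALSE
)

n_winners <- 2  # hypothetical number of prizes

# One row per customer, with the transaction count as the sampling weight
customer_counts <- transaction_history %>%
  dplyr::count(customer_id, name = "n_transactions")

# Weighted draw without replacement: each customer_id can appear at most once
picked <- sample.int(
  nrow(customer_counts),
  size = n_winners,
  replace = FALSE,
  prob = customer_counts$n_transactions
)
winners <- customer_counts$customer_id[picked]
winners
```

Sampling row indices with sample.int() instead of calling sample() on the IDs directly avoids the surprise where sample() treats a single numeric value as the range 1:x.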

Any help will be greatly appreciated.

Thanks



