samedi 5 août 2023

how do i assign randomly assign accounts into 3 groups that is similar by way of average spend, total spend, and frequency of spend

I have a some sample sales transactions by accounts, and would like to do a test and learn on 3 groups. I'd like the assign 3 groups to all the unique account ids and have them to be as similar as possible so the trial can be without selection bias. The bias could be in by frequency of transaction, average spend, and total spend.

I was considering using a quantile function i.e. mutate ntile for 5 bins based on sale_amount then subsequently group by and sample_n.

If i'm on the right track, is there an elegant way to parameterize this into a function to ensure i select the minimum accounts available in each stratified quantile group depending on the number of cohort required?

Appreciate your help with your methods in R.

# Load required libraries
library(dplyr)
library(lubridate)


# Set the start and end dates
start_date <- as.Date("2023-01-01")
end_date <- as.Date("2023-03-31")

# Number of accounts
num_accounts <- 90


# Function to generate random transactions and sale amounts for each day
generate_random_transactions <- function(date) {
  num_transactions <- sample(1:5, 1)  # Random number of transactions per day (1 to 5)
  sale_amounts <- runif(num_transactions, min = 10, max = 500)  # Random sale amount between 10 and 500
  transactions <- data.frame(date = date, sale_amount = sale_amounts)
  return(transactions)
}
set,seed(1)
# Generate transactions for each day within the specified date range for all accounts
all_transactions <- lapply(seq(start_date, end_date, by = "day"), generate_random_transactions)
all_transactions <- do.call(rbind, all_transactions) %>% tibble

# Assign random account numbers to each transaction
all_transactions$id <- 
  sample(1:num_accounts, nrow(all_transactions), replace = TRUE)

# Display the first few rows of the resulting transactions
all_transactions
all_transactions %>% 
  mutate(quantile = ntile(sale_amount,5)) %>% 
  group_by(quantile) %>% 
  summarise(acct_n=n_distinct(id),
            freq_n = n(),
            amt_mean=mean(sale_amount),
            amt_min=min(sale_amount),
            amt_max=max(sale_amount))
# A tibble: 5 × 6
  quantile acct_n freq_n amt_mean amt_min amt_max
     <int>  <int>  <int>    <dbl>   <dbl>   <dbl>
1        1     40     52     71.8    10.7    128.
2        2     39     52    178.    130.     230.
3        3     40     52    281.    233.     331.
4        4     37     52    376.    332.     420.
5        5     41     51    462.    421.     499.



Aucun commentaire:

Enregistrer un commentaire