Tuesday, May 30, 2017

Generating "better" random data

I'm trying to build an example chart using ggplot. The context is marketing data and I created some dummy data with these blocks:

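# packages: lubridate for %m+% and floor_date(), magrittr for %>%, ggplot2 for the plot
library(lubridate)
library(magrittr)
library(ggplot2)

# dimensions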
channels <- c("Facebook", "Youtube", "SEM", "Organic", "Email", "Direct")
last_month <- Sys.Date() %m+% months(-1) %>% floor_date("month")
mts <- seq(from = last_month %m+% months(-23), to = last_month, by = "1 month") %>% format("%b-%Y")
dimvars <- expand.grid(Month = mts, Channel = channels)

# metrics
rws <- nrow(dimvars)
set.seed(123)
Sessions <- ceiling(rnorm(rws, mean = 3000, sd = 300))
Transactions <- ceiling(rnorm(rws, mean = 200, sd = 40))
Revenue <- ceiling(rnorm(rws, mean = 10000, sd = 100))

# make df
dataset <- cbind(dimvars, Sessions, Transactions, Revenue)

I then build an area plot:

timeline <- ggplot(dataset, aes(x = Month, y = Sessions, fill = Channel, group = Channel)) +
  geom_area(alpha = 0.8) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Here's how it looks: [image: stacked area chart of Sessions by Month, filled by Channel]

This chart is pretty unrealistic from a marketer's standpoint, because all channels move in line with each other. Short of actually adding in manual values, I wondered if there are any techniques for adding multipliers or similar to "random" data?

For example, without resorting to adding manual values, I'd like to know if I can set the rnorm() to start with a lower mean, then grow, then shrink again along the n values generated. Is there a function that does this? I could create 3 or 4 distributions with different means and then c() them but this would look more like sharp changes rather than gradual ebbs and flows.
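To make the idea concrete, here is a minimal sketch of the two approaches described above. It relies on the fact that rnorm() is vectorized over its mean argument, so each draw can have its own mean. The sine-shaped mean curve and the specific numbers (1000, 2000, sd = 150) are assumptions picked purely for illustration.

```r
# Sketch: let the mean of rnorm() vary smoothly along the vector.
set.seed(123)
n <- 24                                   # e.g. 24 months

# Assumed mean curve: low at both ends, peaking in the middle.
# sin() over [0, pi] rises from 0 towards 1 and falls back to 0.
shape <- sin(seq(0, pi, length.out = n))
mean_curve <- 1000 + 2000 * shape         # roughly 1000 at the ends, near 3000 mid-way

# rnorm() recycles `mean` element-wise, so draw i is centred on mean_curve[i].
sessions <- ceiling(rnorm(n, mean = mean_curve, sd = 150))

# For comparison: the "sharp changes" version built by c()-ing three distributions.
stepped <- ceiling(c(rnorm(8, mean = 1000, sd = 150),
                     rnorm(8, mean = 3000, sd = 150),
                     rnorm(8, mean = 1000, sd = 150)))
```

Plotting sessions against stepped (for example with plot(sessions, type = "l") and lines(stepped, lty = 2)) should show the gradual ebb and flow versus the abrupt jumps.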

Any suggestions for manipulating random data to fluctuate (expand and contract) over the course of a length-n vector?

I tried fiddling with the sd argument of rnorm() but that just made the data look more volatile.

I'm trying to show, somewhat randomly, channels that start and ramp up, then level off and decline.
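One possible way to sketch that ramp-up, plateau and decline per channel is to multiply a rising and a falling logistic curve, then draw noise around the result. The helper name lifecycle(), its steepness parameter, and the ranges used for the random start and end points are all hypothetical choices for illustration, not an established recipe.

```r
# Hypothetical helper: a smooth 0..1 "lifecycle" curve over n periods.
# plogis() is the logistic function from base R's stats package.
lifecycle <- function(n, start, end, steepness = 0.8) {
  t <- seq_len(n)
  ramp_up   <- plogis(steepness * (t - start))   # rises around period `start`
  ramp_down <- plogis(-steepness * (t - end))    # falls around period `end`
  ramp_up * ramp_down
}

set.seed(123)
n_months <- 24
channels <- c("Facebook", "Youtube", "SEM", "Organic", "Email", "Direct")

# Randomly chosen ramp-up and decline midpoints per channel (assumed ranges).
starts <- sample(1:10, length(channels), replace = TRUE)
ends   <- starts + sample(8:20, length(channels), replace = TRUE)

# One column of Sessions per channel: noisy draws around a scaled lifecycle curve.
sessions_by_channel <- sapply(seq_along(channels), function(i) {
  mu <- 300 + 3000 * lifecycle(n_months, starts[i], ends[i])
  # sd proportional to the mean keeps the noise modest; pmax() guards against negatives.
  pmax(0, ceiling(rnorm(n_months, mean = mu, sd = 0.1 * mu)))
})
colnames(sessions_by_channel) <- channels
```

Because expand.grid() varies its first argument fastest, as.vector(sessions_by_channel) should line up row for row with a grid built as expand.grid(Month = ..., Channel = channels) over 24 months, so it could stand in for the Sessions column.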



