Thursday, November 19, 2020

Generate a random distribution by group conditional on a column

I want to draw values from one of two distributions depending on the value of a column. In this example I draw from a normal distribution with rnorm() if z1 is above 25 and from a Poisson distribution with rpois() otherwise. Additionally, I would like the draws to vary by group (column id), so that each id gets its own draw from the stated distribution.

For now I have the following code:

df <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 
                      4L, 4L), z1 = c(21L, 21L, 21L, 28L, 28L, 28L, 30L, 30L, 30L, 
                                      20L, 20L, 20L)), row.names = c(NA, -12L), class = "data.frame")  
  
df$sample  <- with(df, ifelse(z1 > 25, 
                         rnorm(n = 1,mean = 0,sd = 1), ##Normal(0,1)
                         rpois(n = 1,lambda = 5)))     ## Poisson(5) 

  # id z1     sample
  # 1   1 21  6.0000000
  # 2   1 21  6.0000000
  # 3   1 21  6.0000000
  # 4   2 28 -0.8036847
  # 5   2 28 -0.8036847
  # 6   2 28 -0.8036847
  # 7   3 30 -0.8036847
  # 8   3 30 -0.8036847
  # 9   3 30 -0.8036847
  # 10  4 20  6.0000000
  # 11  4 20  6.0000000
  # 12  4 20  6.0000000

Unfortunately, as you can see above, I do not get variation between groups of ids (column id): every id that falls under the same distribution receives the same value. Below is my desired output in the column desired_sample.

  
  #     id z1     sample     desired_sample
  # 1   1 21  6.0000000  5.0000000
  # 2   1 21  6.0000000  5.0000000
  # 3   1 21  6.0000000  5.0000000
  # 4   2 28 -0.8036847  0.7356226
  # 5   2 28 -0.8036847  0.7356226
  # 6   2 28 -0.8036847  0.7356226
  # 7   3 30 -0.8036847 -1.359669
  # 8   3 30 -0.8036847 -1.359669
  # 9   3 30 -0.8036847 -1.359669
  # 10  4 20  6.0000000  4.0000000
  # 11  4 20  6.0000000  4.0000000
  # 12  4 20  6.0000000  4.0000000
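
A likely reason for the repeated values: rnorm(n = 1, ...) and rpois(n = 1, ...) each return a single number, and ifelse() recycles that number across every row where the condition selects it. Below is a minimal sketch of one way to get one draw per id, assuming z1 is constant within each id (as in the sample data). The names ids, z_by_id and draws are illustrative and not part of the original code; set.seed() is there only to make the sketch reproducible.

```r
df <- data.frame(id = rep(1:4, each = 3),
                 z1 = rep(c(21L, 28L, 30L, 20L), each = 3))

set.seed(42)  # only so the sketch is reproducible

ids     <- unique(df$id)
z_by_id <- df$z1[match(ids, df$id)]   # first z1 value seen for each id

# Draw one candidate per id from each distribution, then pick by condition.
draws <- ifelse(z_by_id > 25,
                rnorm(length(ids), mean = 0, sd = 1),   # Normal(0, 1)
                rpois(length(ids), lambda = 5))         # Poisson(5)

# Map the per-id draw back onto every row of that id.
df$desired_sample <- draws[match(df$id, ids)]
```

Because n equals the number of ids rather than 1, each id gets its own draw, and match() repeats that draw on all rows of the id.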

[Follow up]

The following code does it, but...

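con_dist2 <- function(x){
  # x holds the z1 values of one id; z1 is constant within an id,
  # so checking the first value is enough
  if (x[1] >= 25) {
    rnorm(1, mean = 0, sd = 1)   # Normal(0,1)
  } else {
    rpois(1, lambda = 5)         # Poisson(5)
  }
}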

df$desired_sample2 <- with(df, ave(x = z1, id, FUN = con_dist2))

... is there any way to pass the threshold value (25) as an argument to con_dist2, to make it more flexible and reusable?
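
One possible approach, sketched here under the assumption that z1 is constant within each id: give the function a threshold argument and call it through an anonymous wrapper. The wrapper is needed because ave() has the signature ave(x, ..., FUN = mean) and treats every extra unnamed argument as a grouping variable, so the threshold cannot be handed to ave() directly. The name con_dist and its threshold parameter are hypothetical, introduced only for this illustration.

```r
# Hypothetical helper: one draw per group, with a configurable threshold.
con_dist <- function(x, threshold = 25) {
  # x holds the z1 values of one group; only the first is inspected,
  # on the assumption that z1 is constant within an id.
  if (x[1] >= threshold) {
    rnorm(1, mean = 0, sd = 1)   # Normal(0, 1)
  } else {
    rpois(1, lambda = 5)         # Poisson(5)
  }
}

df <- data.frame(id = rep(1:4, each = 3),
                 z1 = rep(c(21L, 28L, 30L, 20L), each = 3))

# The anonymous function forwards the threshold; ave() recycles the single
# returned value across all rows of the group.
df$desired_sample2 <- ave(as.numeric(df$z1), df$id,
                          FUN = function(x) con_dist(x, threshold = 25))

# A threshold above every z1 sends all groups to the Poisson branch.
all_pois <- ave(as.numeric(df$z1), df$id,
                FUN = function(x) con_dist(x, threshold = 100))
```

as.numeric() is applied to z1 so that the result vector is stored as double from the start, since the normal draws are not integers.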



