I want to draw from two different distributions conditional on a column. For example, here I am drawing from a normal distribution with rnorm() if z1 is above 25 and from a Poisson distribution with rpois() otherwise. Additionally, I would like each group (column id) to get its own draw from the stated distribution.
For now I have the following code:
df <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L,
4L, 4L), z1 = c(21L, 21L, 21L, 28L, 28L, 28L, 30L, 30L, 30L,
20L, 20L, 20L)), row.names = c(NA, -12L), class = "data.frame")
df$sample <- with(df, ifelse(z1 > 25,
                             rnorm(n = 1, mean = 0, sd = 1),  ## Normal(0,1)
                             rpois(n = 1, lambda = 5)))       ## Poisson(5)
# id z1 sample
# 1 1 21 6.0000000
# 2 1 21 6.0000000
# 3 1 21 6.0000000
# 4 2 28 -0.8036847
# 5 2 28 -0.8036847
# 6 2 28 -0.8036847
# 7 3 30 -0.8036847
# 8 3 30 -0.8036847
# 9 3 30 -0.8036847
# 10 4 20 6.0000000
# 11 4 20 6.0000000
# 12 4 20 6.0000000
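Unfortunately, as you can see above, every row that meets the same condition gets the same value, so there is no variation between groups of ids (column id). Below is my desired output in the column desired_sample: one draw per id, shared by all rows of that id.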
# id z1 sample desired_sample
# 1 1 21 6.0000000 5.0000000
# 2 1 21 6.0000000 5.0000000
# 3 1 21 6.0000000 5.0000000
# 4 2 28 -0.8036847 0.7356226
# 5 2 28 -0.8036847 0.7356226
# 6 2 28 -0.8036847 0.7356226
# 7 3 30 -0.8036847 -1.359669
# 8 3 30 -0.8036847 -1.359669
# 9 3 30 -0.8036847 -1.359669
# 10 4 20 6.0000000 4.0000000
# 11 4 20 6.0000000 4.0000000
# 12 4 20 6.0000000 4.0000000
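The repeated values in the sample column can be reproduced in isolation. The sketch below is only an illustration, not part of the original code: the vector names test, one_draw and per_row are made up for it. It shows that when the yes and no arguments of ifelse() have length 1, each is evaluated once and recycled across all matching rows.

```r
set.seed(1)
test <- c(TRUE, FALSE, TRUE, FALSE)

# rnorm(1) and rpois(1, ...) each produce a single number, which ifelse()
# recycles to every position where the test is TRUE (or FALSE).
one_draw <- ifelse(test, rnorm(1), rpois(1, lambda = 5))

# Asking for as many draws as there are rows gives each row its own value.
per_row <- ifelse(test,
                  rnorm(length(test)),
                  rpois(length(test), lambda = 5))

length(unique(one_draw[test]))   # 1: all TRUE rows share one normal draw
length(unique(one_draw[!test]))  # 1: all FALSE rows share one Poisson draw
```

Drawing per row is still not the desired output, since rows of the same id would then differ; a per-group draw needs the grouping column as well.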
[Follow up]
The following code does it, but...
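con_dist2 <- function(x) {
  # x holds the z1 values of one id; they are identical within a group,
  # so test only the first one and return a single draw.
  if (x[1] > 25) {
    rnorm(1, mean = 0, sd = 1)    # Normal(0, 1)
  } else {
    rpois(1, lambda = 5)          # Poisson(5)
  }
}
df$desired_sample2 <- with(df, ave(x = z1, id, FUN = con_dist2))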
... is there any way to pass the threshold value (25) to con_dist2 as an argument, to make the function more flexible and reusable?
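One possible way to do this, offered as a sketch rather than a definitive answer: give the function a threshold argument and wrap the call in an anonymous function. The wrapper is needed because ave() treats every unnamed argument after x as a grouping variable, so extra arguments cannot be forwarded to FUN directly. The name con_dist and the extra arguments mean, sd and lambda are hypothetical additions for illustration.

```r
# Hypothetical helper: one draw per group, with the cutoff and the
# distribution parameters exposed as arguments.
con_dist <- function(x, threshold = 25, mean = 0, sd = 1, lambda = 5) {
  # x holds the z1 values of one group; they are assumed constant within
  # an id, so only the first value is compared with the threshold.
  if (x[1] > threshold) {
    rnorm(1, mean = mean, sd = sd)
  } else {
    rpois(1, lambda = lambda)
  }
}

# Same data as in the question, rebuilt so the example runs on its own.
df <- data.frame(id = rep(1:4, each = 3),
                 z1 = rep(c(21L, 28L, 30L, 20L), each = 3))

set.seed(42)
# ave() recycles the single value returned for each id over that id's rows.
df$desired_sample <- with(df, ave(z1, id,
                                  FUN = function(x) con_dist(x, threshold = 25)))

# A different cutoff only requires changing the argument.
df$sample_30 <- with(df, ave(z1, id,
                             FUN = function(x) con_dist(x, threshold = 30)))
```

With threshold = 30, id 3 (z1 = 30) falls on the Poisson side, because the comparison is strictly greater than the threshold.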