Tuesday, October 31, 2017

How to generate a binomial vector of n correlated items?

I want to generate binomial vectors from a number of correlated items, each with its own defined probability. When I use e.g. rbinom(1e3, size = 4, prob = c(p.x1, p.x2, p.x3, p.x4)) I get something like 3 3 0 0 2 4 1 0 4 4 0 1 4.... These x_i have their defined probabilities, but they are not yet correlated.
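
As a baseline for comparison, here is a minimal base-R sketch of the uncorrelated case built item by item. It assumes four hypothetical probabilities stored in p (the values are made up for illustration). One detail worth noting: when rbinom() is given a vector for prob, it recycles the values across the draws, so each draw is Binomial(4, p_k) for a single p_k rather than a sum of four items with different probabilities.

```r
set.seed(42)                      # for reproducibility
p <- c(0.2, 0.4, 0.6, 0.8)        # hypothetical item probabilities
n <- 1000

# One column of 0/1 draws per item, each item with its own probability
items <- sapply(p, function(pk) rbinom(n, size = 1, prob = pk))

# Summing across the items gives one count between 0 and 4 per row
x <- rowSums(items)

head(x)
colMeans(items)                   # should be close to p
```

The columns of items are independent here; the open question is how to induce correlation between them before summing.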

Five years ago Josh O'Brien contributed a great approach for generating correlated binomial data. I think it is close to what I need, but it is designed for pairs. I tried to modify the function to produce a larger number of variables, but without success so far, and I frequently run into

Error in commonprob2sigma(commonprob, simulvals) : 
Matrix commonprob not admissible. 

which is raised by the imported bindata package.

My idea is to define four (or, better, an arbitrary number of) probabilities and rhos in Josh's function, something like

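library(bindata)  # provides rmvbin()

rmvBinomial3 <- function(n, size, p1, p2, p3, p4, rho) {
  X <- replicate(n, {
    # size draws of four correlated 0/1 items, summed per item;
    # a single rho is used for every pair of items
    colSums(rmvbin(size, c(p1, p2, p3, p4), bincorr = (1 - rho) * diag(4) + rho))
  })
  t(X)  # one row per replicate, one column per item
}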

Sure, more rhos are needed, and I guess a probability matrix should be included somehow, as can be done with the bindata package.

rho1 <- -0.89; rho2 <- -0.75; rho3 <- -0.62; rho4 <- -0.59
m <- matrix(c(1, rho1, rho2, rho3,
     rho1, 1, rho4, rho2,
     rho2, rho4, 1, rho1,
     rho3, rho2, rho1, 1), ncol = 4) 
#       [,1]  [,2]  [,3]  [,4]
# [1,]  1.00 -0.89 -0.75 -0.62
# [2,] -0.89  1.00 -0.59 -0.75
# [3,] -0.75 -0.59  1.00 -0.89
# [4,] -0.62 -0.75 -0.89  1.00

Unfortunately, every matrix I check with bindata::check.commonprob(m) throws the same error as above. I also couldn't get bindata::commonprob2sigma() to create a matrix.
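
Two things may be worth checking here, both offered as assumptions rather than a confirmed diagnosis. As far as the bindata documentation goes, check.commonprob() expects a matrix of joint probabilities (with the marginal probabilities on the diagonal), not a correlation matrix, and bincorr2commonprob() converts one into the other. Independently of that, four variables cannot all be this strongly negatively correlated with each other, so m may not be a valid correlation matrix in the first place. A base-R sketch of that second check, using only eigen():

```r
rho1 <- -0.89; rho2 <- -0.75; rho3 <- -0.62; rho4 <- -0.59
m <- matrix(c(1,    rho1, rho2, rho3,
              rho1, 1,    rho4, rho2,
              rho2, rho4, 1,    rho1,
              rho3, rho2, rho1, 1), ncol = 4)

# A valid correlation matrix must be symmetric with no negative eigenvalues
ev <- eigen(m, symmetric = TRUE, only.values = TRUE)$values
is_valid_corr <- isSymmetric(m) && all(ev >= -1e-8)

ev
is_valid_corr   # FALSE for this m: at least one eigenvalue is negative
```

A quick way to see it without eigen(): the entries of m sum to 4 - 8.98 = -4.98, which would be the variance of the sum of four standardized variables and cannot be negative.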

A further drawback is the range of rmvBinomial(): it seems to work only for values of p.X_i between roughly 0.2 and 0.8, and I also need smaller values, e.g. 0.01 to 0.1.
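
The restricted range may be a property of binary variables rather than of the function. For two 0/1 variables with probabilities p1 and p2, the joint probability P(X1 = 1, X2 = 1) must lie between max(0, p1 + p2 - 1) and min(p1, p2), and that limits the correlation that can be reached. A small base-R sketch of these bounds follows; corr_bounds() is a hypothetical helper name, not something from bindata.

```r
# Range of correlations attainable by two Bernoulli variables
# with success probabilities p1 and p2 (hypothetical helper)
corr_bounds <- function(p1, p2) {
  s  <- sqrt(p1 * (1 - p1) * p2 * (1 - p2))      # product of the two SDs
  lo <- (max(0, p1 + p2 - 1) - p1 * p2) / s      # smallest joint probability
  hi <- (min(p1, p2) - p1 * p2) / s              # largest joint probability
  c(lower = lo, upper = hi)
}

corr_bounds(0.5, 0.5)    # full range: -1 to 1
corr_bounds(0.01, 0.1)   # much narrower, especially on the negative side
```

With small probabilities such as 0.01 and 0.1, strongly negative correlations fall outside the attainable range, which would make the corresponding joint-probability matrix inadmissible.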

It seems I'm really stuck. Could anybody help and show me how to do this?

Edit: To clarify, the expected outcome is indeed just one single vector 3 3 0 0 2 4 1 0 4 4 0 1 4... as shown in the beginning, but the items from which it is derived should be correlated to a definable degree (for instance, one of the items could have no correlation at all).
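
To make the target concrete, here is a rough sketch of what such a function could look like. It assumes bindata::rmvbin() accepts a vector of marginal probabilities (margprob) and a full pairwise correlation matrix (bincorr). The name rcorrBinomSum, the probabilities and the correlations are all hypothetical, chosen so that the matrix is admissible and the fourth item is uncorrelated with the others. The example only runs if bindata is installed.

```r
# Hypothetical helper: one count per observation, obtained by summing
# k correlated 0/1 items drawn with bindata::rmvbin()
rcorrBinomSum <- function(n, probs, bincorr) {
  items <- bindata::rmvbin(n, margprob = probs, bincorr = bincorr)
  rowSums(items)
}

if (requireNamespace("bindata", quietly = TRUE)) {
  set.seed(1)
  probs <- c(0.2, 0.3, 0.4, 0.5)    # made-up item probabilities

  R <- diag(4)                      # start from "no correlation"
  R[1, 2] <- R[2, 1] <- 0.3         # made-up pairwise correlations
  R[1, 3] <- R[3, 1] <- 0.2
  R[2, 3] <- R[3, 2] <- 0.1         # item 4 stays uncorrelated

  x <- rcorrBinomSum(1000, probs, R)
  head(x)
}
```

This does not solve the admissibility problem for strongly negative correlations or very small probabilities; it only shows the shape of the desired result.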
