Monday, July 3, 2017

Generate random values in R with a defined correlation in a defined range

For a science project, I am looking for a way to generate random data in a certain range (e.g. min=0, max=100000) with a certain correlation with another variable which already exists in R. The goal is to enrich the dataset a little so I can produce some more meaningful graphs (no worries, I am working with fictional data).

For example, I want to generate random values that correlate at r = -0.78 with the following data:

var1 <- rnorm(100, 50, 10)

I have already come across some pretty good solutions (e.g. http://ift.tt/2tDOgtt), but they only give me very small values, which I cannot transform so that they make sense in the context of the original values.

Following the example:

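var1 <- rnorm(100, 50, 10)
n     <- length(var1)                    # number of observations
rho   <- -0.78                           # desired correlation = cos(angle)
theta <- acos(rho)                       # corresponding angle
x1    <- var1                            # fixed, existing variable
x2    <- rnorm(n, 50, 50)                # new random data
X     <- cbind(x1, x2)                   # n x 2 matrix
Xctr  <- scale(X, center=TRUE, scale=FALSE)    # centered columns (mean 0)

Id   <- diag(n)                                # identity matrix
Q    <- qr.Q(qr(Xctr[ , 1, drop=FALSE]))       # QR decomposition, just matrix Q
P    <- tcrossprod(Q)          # = Q Q'        # projection onto centered x1
x2o  <- (Id-P) %*% Xctr[ , 2]                  # centered x2 made orthogonal to x1
Xc2  <- cbind(Xctr[ , 1], x2o)                 # bind to matrix
Y    <- Xc2 %*% diag(1/sqrt(colSums(Xc2^2)))   # scale columns to length 1
var2 <- Y[ , 2] + (1 / tan(theta)) * Y[ , 1]   # new vector with the target correlation
cor(var1, var2)                                # check the correlation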

What I get for var2 are values ranging between -0.5 and 0.5, with a mean of 0. I would like the data to be spread much more widely, so that I could simply transform it by adding 50 and end up with a range quite similar to that of my first variable.
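The small values come from the step that scales each column to unit length. Since Pearson's correlation is unchanged by any linear transformation with a positive slope, one possible way forward is to rescale the result afterwards. The following is only a minimal sketch under stated assumptions: it uses a shorter residual-based construction instead of the code from the linked solution, and the names `z`, `var2_msd`, `var2_rng`, `lo`, and `hi`, as well as the target mean of 50 and standard deviation of 10, are illustrative choices that do not come from the original example.

```r
set.seed(1)                          # fixed seed, only so the sketch is reproducible
var1 <- rnorm(100, 50, 10)
rho  <- -0.78

# Random noise with the part explained by var1 removed (uncorrelated with var1)
x2    <- rnorm(length(var1))
noise <- residuals(lm(x2 ~ var1))

# Mix standardized var1 and standardized noise so that cor(var1, z) equals rho
z <- rho * as.numeric(scale(var1)) + sqrt(1 - rho^2) * as.numeric(scale(noise))

# Option 1 (assumed targets): shift and stretch to mean 50 and sd 10
var2_msd <- 50 + 10 * z

# Option 2 (assumed targets): min-max rescaling into the range [lo, hi]
lo <- 0
hi <- 100000
var2_rng <- lo + (z - min(z)) / (max(z) - min(z)) * (hi - lo)

cor(var1, var2_msd)                  # equals rho up to floating-point error
cor(var1, var2_rng)                  # same correlation, different scale
range(var2_rng)                      # spans lo to hi
```

Both options keep the correlation at the requested value because they only shift and stretch `z`. A nonlinear transformation, or clipping values to force them into a range, would change the correlation.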

Does anyone know a way to generate this kind of more or less meaningful data?

Thanks a lot in advance!



