Sunday, 4 September 2016

Generating Random Data using Replicate

I am trying to generate multiple data frames of correlated data with a specified number of observations (say 1000), randomly sample m observations from each data frame (subset the data frame to m rows), calculate the P value of the correlation coefficient of two of the variables in the data frame using cor.test(), repeat this z times (say 100), and determine the number of P values less than or equal to .05. When I do this using the following code, I get the same P value every time.

It was astutely pointed out to me that the reason I am getting the same result is that I am specifying the same seed. I have searched for an answer and tried to fix the code, but I can't seem to correct the problem. How do I generate random data without specifying the seed? Or is there another way to fix the code that I can't see? Thank you.

corr_data <- function(seed, obs, n) {
  R             <- matrix(cbind(1,.80,.2,  .80,1,.7,  .2,.7,1), nrow=3)
  U             <- t(chol(R))
  nvars         <- dim(U)[1]
  numobs        <- obs
  set.seed(seed)
  random.normal <- matrix(rnorm(nvars*numobs,0,1), nrow=nvars, ncol=numobs)
  X             <- U %*% random.normal
  newX          <- t(X)
  raw           <- as.data.frame(newX)
  names(raw)    <- c("response","predictor1","predictor2")
  sample        <- raw[sample(nrow(raw), n, replace=TRUE), ]
  return(sample)
}

set.seed(758936309)
p_cor_test <- replicate(10,
                        cor.test(corr_data(03301965, 1000, 100)$predictor1,
                                 corr_data(03301965, 1000, 100)$predictor2)$p.value
                        )
p_cor_test

## [1] 3.25918e-23 3.25918e-23 3.25918e-23 3.25918e-23 3.25918e-23 3.25918e-23
## [7] 3.25918e-23 3.25918e-23 3.25918e-23 3.25918e-23
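
One possible way around this, offered as a sketch rather than a confirmed answer: the identical P values come from set.seed(seed) being called inside corr_data(), which resets the random number generator to the same state on every call. The version below drops the seed argument and seeds only once, before replicate(). The name corr_data_unseeded is hypothetical and chosen for illustration; the correlation matrix and column names are copied from the code above.

```r
# Hypothetical variant of corr_data(): same generator, but no set.seed() inside,
# so each call continues from the current RNG state and yields new data.
corr_data_unseeded <- function(obs, n) {
  R <- matrix(c(1, .80, .2,  .80, 1, .7,  .2, .7, 1), nrow = 3)
  U <- t(chol(R))                      # lower-triangular Cholesky factor
  nvars <- nrow(U)
  random.normal <- matrix(rnorm(nvars * obs), nrow = nvars, ncol = obs)
  raw <- as.data.frame(t(U %*% random.normal))
  names(raw) <- c("response", "predictor1", "predictor2")
  # Randomly sample n rows (with replacement, as in the original code)
  raw[sample(nrow(raw), n, replace = TRUE), ]
}

set.seed(758936309)  # seed once, outside the loop, so the whole run is reproducible
p_cor_test <- replicate(100, {
  d <- corr_data_unseeded(1000, 100)   # one data frame per replicate
  cor.test(d$predictor1, d$predictor2)$p.value
})

sum(p_cor_test <= 0.05)  # number of P values less than or equal to .05
```

Note that the data frame is generated once per replicate and both variables are taken from it. Calling the unseeded function twice, once per variable as in the original replicate() call, would pair predictor1 from one random sample with predictor2 from an unrelated one.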



