I am trying to generate multiple data frames of correlated data with a specified number of observations (say 1000), randomly sample m observations from each data frame (subset it to m rows), calculate the P value of the correlation coefficient between two of the variables in that data frame using cor.test(), repeat this z times (say 100) with a different data frame each time, and then count the number of P values less than or equal to .05. When I do this with the following code, I get the same P value on every replicate.
Someone astutely pointed out to me that I am getting the same result because I am specifying the same seed. I have searched for an answer and tried to fix the code, but I can't seem to correct the problem. How do I generate random data without specifying the seed? Or is there another way to fix the code that I am not seeing? Thank you.
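To see why the seed matters, here is a minimal sketch in R. The helper names `draw_reseeded()` and `draw_fresh()` are hypothetical, for illustration only. Calling `set.seed()` inside a function rewinds the random number generator to the same point on every call, so every call returns the same numbers; calling `set.seed()` once at the top of the script keeps the run reproducible while successive calls still produce different draws.

```r
# Hypothetical helper: re-seeds the RNG every time it is called
draw_reseeded <- function(seed) {
  set.seed(seed)    # rewinds the generator to the same starting point
  rnorm(3)
}

# Hypothetical helper: just continues from the current RNG state
draw_fresh <- function() {
  rnorm(3)
}

a <- draw_reseeded(42)
b <- draw_reseeded(42)
identical(a, b)     # TRUE: same seed on every call, same numbers

set.seed(42)        # seed once, at the top of the script
x1 <- draw_fresh()
x2 <- draw_fresh()
identical(x1, x2)   # FALSE: each call continues the random stream
identical(a, x1)    # TRUE: the run as a whole is still reproducible
```

In other words, you do not have to give up the seed; you only have to stop resetting it inside the function.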
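corr_data <- function(seed, obs, n) {
  # Target correlation matrix for response, predictor1, predictor2
  R <- matrix(c(1, .80, .2,
                .80, 1, .7,
                .2, .7, 1), nrow = 3)
  U <- t(chol(R))          # lower-triangular Cholesky factor, U %*% t(U) == R
  nvars <- dim(U)[1]
  numobs <- obs
  set.seed(seed)           # resets the RNG on every call, so every call returns identical data
  # Independent standard normals, one row per variable
  random.normal <- matrix(rnorm(nvars * numobs, 0, 1), nrow = nvars, ncol = numobs)
  X <- U %*% random.normal # impose the correlation structure
  newX <- t(X)
  raw <- as.data.frame(newX)
  names(raw) <- c("response", "predictor1", "predictor2")
  # Draw n rows (with replacement) from the obs generated rows
  sub <- raw[sample(nrow(raw), n, replace = TRUE), ]
  return(sub)
}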
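One way to apply this to the generator is to drop the `seed` argument and the `set.seed()` call from `corr_data()`, and to seed once outside it if a reproducible run is wanted. A sketch, keeping the original Cholesky construction and correlation matrix (the two-argument signature is an assumption of this example, not the original function):

```r
# corr_data() without the internal set.seed(): each call draws new numbers
corr_data <- function(obs, n) {
  R <- matrix(c(1, .80, .2,
                .80, 1, .7,
                .2, .7, 1), nrow = 3)
  U <- t(chol(R))                      # lower-triangular factor, U %*% t(U) == R
  nvars <- nrow(U)
  random.normal <- matrix(rnorm(nvars * obs), nrow = nvars, ncol = obs)
  raw <- as.data.frame(t(U %*% random.normal))
  names(raw) <- c("response", "predictor1", "predictor2")
  raw[sample(nrow(raw), n, replace = TRUE), ]  # subsample n rows
}

set.seed(758936309)                    # optional: seed once, outside the function
d1 <- corr_data(1000, 100)
d2 <- corr_data(1000, 100)
identical(d1, d2)                      # FALSE: two different data frames
```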
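set.seed(758936309)
# corr_data() re-seeds with 03301965 on every call, so this outer seed has no effect on the results
p_cor_test <- replicate(10,
  cor.test(corr_data(03301965, 1000, 100)$predictor1,
           corr_data(03301965, 1000, 100)$predictor2)$p.value
)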
p_cor_test
## [1] 3.25918e-23 3.25918e-23 3.25918e-23 3.25918e-23 3.25918e-23 3.25918e-23
## [7] 3.25918e-23 3.25918e-23 3.25918e-23 3.25918e-23
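Removing the seed alone is not quite enough: the loop calls `corr_data()` twice per replicate, once for `predictor1` and once for `predictor2`. With a fixed seed the two calls happen to return the same data frame, but once each call draws fresh numbers, the two columns would come from two independent data frames and the correlation between them would be lost. A sketch of the simulation that generates one data frame per replicate, tests its two columns, and counts the P values at or below .05 (a seed-free variant of `corr_data()` is redefined here so the block runs on its own; its two-argument signature is an assumption):

```r
# Seed-free generator: same correlation matrix and Cholesky construction
corr_data <- function(obs, n) {
  R <- matrix(c(1, .80, .2,
                .80, 1, .7,
                .2, .7, 1), nrow = 3)
  U <- t(chol(R))
  Z <- matrix(rnorm(nrow(U) * obs), nrow = nrow(U), ncol = obs)
  raw <- as.data.frame(t(U %*% Z))
  names(raw) <- c("response", "predictor1", "predictor2")
  raw[sample(nrow(raw), n, replace = TRUE), ]
}

set.seed(758936309)          # one seed for the whole run: reproducible, yet every draw differs
z <- 100                     # number of replicates
p_cor_test <- replicate(z, {
  d <- corr_data(1000, 100)  # ONE data frame per replicate
  cor.test(d$predictor1, d$predictor2)$p.value  # both columns from the same d
})

n_sig <- sum(p_cor_test <= .05)     # number of P values <= .05
prop_sig <- mean(p_cor_test <= .05) # proportion of P values <= .05
```

`sum()` gives the count of significant replicates and `mean()` the proportion; because the true correlation between the two predictors is not zero, that proportion is an estimate of the test's power at this sample size.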