This is an extended question I found from here (Method #1: http://santiago.begueria.es/2010/10/generating-spatially-correlated-random-fields-with-r/) and here (Method #2: https://gist.github.com/brentp/1306786). I know these two sites covered very well (Thanks!) with relatively small size of dimension (e.g., 1000x1). I am trying to generate spatially clustered binary data with large size of dimension like >=100000x1 dimension, for example, c(1,1,1,1,0,1,0,0,0,0, …, 0,0,0,0,0,0,0,0,0,0,0,0) with 1000 times / case study. Here are slightly modified codes from the sties.
# Method #1
dim1 <- 1000
dim2 <- 1
xy <- expand.grid(seq_len(dim1), seq_len(dim2))
colnames(xy) <- c("x", "y")
geo.model <- gstat(formula = z~x+y, locations = ~x+y, dummy = TRUE, beta = 0,
model = vgm(psill = 1,"Exp",
range = dim1), # Range parameter!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
nmax = 30) # Spatial correlation model
sim.mat <- predict(geo.model, newdata = xy, nsim = 1)
sim.mat[,3] <- ifelse(sim.mat[,3] > quantile(sim.mat[,3], .1), 0, 1)
plot(sim.mat[, 3])
# Method #2
# generate autocorrelated data.
nLags = 1000 # number of lags (size of region)
# fake, uncorrelated observations
X = rnorm(nLags)
# fake sigma... correlated decreases distance.
sigma = diag(nLags)
corr = .999
sigma <- corr ^ abs(row(sigma)-col(sigma))
#sigma
# Y is autocorrelated...
Y <- t(X %*% chol(sigma))
y <- ifelse(Y >= quantile(Y, probs=.9), 1, 0)[, 1]
plot(y)
Again, two methods work very well to generate binary data when dim1 is less than 10000. However, when I tried several hundred thousand (e.g., >= 100,000), it seems to take a long time or memory issue.
For example, when I used “nLags = 50000” in Method #2, I got an error message (“Error: cannot allocate vector of size 9.3 Gb”) after the code “sigma <- corr ^ abs(row(sigma)-col(sigma))”.
I would like to find an efficient (time- and memory-saving) way(s) to generate such a spatially clustered binary data 1000 times (especially, with dim1 >= 100000) per each case study (about 200 cases).
I have thought about applying multiple probabilities in "sample" function or probability distribution. I am not sure how to and beyond my scope.
Thanks in advance,
Aucun commentaire:
Enregistrer un commentaire