I'm trying to write a parallelized function using foreach and the doSNOW package. I am actually used to the parallel package, but I cannot get utils::txtProgressBar to work with it.
Inside a bigfun, where the user may set a seed in the arguments for the sake of reproducibility, there is a small fun performing a stochastic process. AFAIK we use parallel::clusterSetRNGStream to send each worker a separate random number stream, all derived from the seed given in the iseed= argument.
library(doSNOW)
library(parallel)
CL <- makeSOCKcluster(detectCores() - 1)
registerDoSNOW(CL)
fun <- function() foreach(i=1:20, .combine='c') %dopar% rnorm(1L)
bigfun <- function(seed) {
  clusterSetRNGStream(CL, iseed=seed)
  fun()
}
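To make the per-worker streams concrete, here is a minimal sketch that uses only the parallel package. The cluster name cl2 and its size of two workers are my own choices for illustration and are not part of the code above. It reseeds the cluster twice with the same iseed and compares the first draw of each worker.

```r
library(parallel)

# Hypothetical two-worker cluster, kept separate from CL above.
cl2 <- makeCluster(2)

# Seed the workers, then take exactly one draw on every worker.
clusterSetRNGStream(cl2, iseed = 42)
a <- unlist(clusterEvalQ(cl2, rnorm(1L)))

# Reseed with the same iseed and repeat.
clusterSetRNGStream(cl2, iseed = 42)
b <- unlist(clusterEvalQ(cl2, rnorm(1L)))

stopCluster(cl2)

identical(a, b)  # same seed and same worker give the same draw
a[1] != a[2]     # the two workers sit on different streams
```

The streams are tied to the workers, not to the tasks, so the number a task returns depends on which worker runs it and on how many draws that worker has already made.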
For some reason, however, the seeds seem to decay after a few iterations (eight in the following example) and the results diverge.
r1 <- replicate(5, bigfun(42))
r1
# [,1] [,2] [,3] [,4] [,5]
# [1,] -0.939077079 -0.939077079 -0.939077079 -0.939077079 -0.939077079
# [2,] 1.119328457 1.119328457 1.119328457 1.119328457 1.119328457
# [3,] -0.208480914 -0.208480914 -0.208480914 -0.208480914 -0.208480914
# [4,] 0.001100034 0.001100034 0.001100034 0.001100034 0.001100034
# [5,] 0.226260465 0.226260465 0.226260465 0.226260465 0.226260465
# [6,] 0.858422030 0.858422030 0.858422030 0.858422030 0.858422030
# [7,] -1.137862056 -1.137862056 -1.137862056 -1.137862056 -1.137862056
# [8,] -0.041679428 -0.041679428 -0.041679428 -0.041679428 -0.041679428
# [9,] 0.829413493 0.829413493 0.829413493 -0.076171414 -0.076171414
# [10,] -0.439358199 -0.439358199 -0.439358199 0.829413493 0.829413493
# [11,] -0.314035435 -0.314035435 -0.314035435 -1.034149295 -0.350219125
# [12,] -2.129023630 -2.129023630 -2.129023630 -0.439358199 -0.439358199
# [13,] 2.506922433 2.506922433 2.506922433 -0.350219125 -0.314035435
# [14,] -1.127312751 -1.127312751 -1.127312751 -0.262905962 -2.129023630
# [15,] 0.166082706 0.166082706 0.166082706 -0.314035435 -0.334911609
# [16,] 0.576723182 0.576723182 0.576723182 -0.334911609 2.506922433
# [17,] -0.076171414 -1.905145916 -1.905145916 0.388011532 -1.733112797
# [18,] -1.905145916 0.923505402 0.923505402 -2.129023630 -1.127312751
# [19,] -1.034149295 -0.441691024 -0.441691024 -1.733112797 0.036285344
# [20,] 0.923505402 0.149157109 0.149157109 0.833106718 0.166082706
You will probably get slightly different results. If you don't trust your eyes, here are the variances:
apply(r1, 1, var)
# [1] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
# [7] 0.0000000000 0.0000000000 0.6609707033 1.2262257376 0.1907860246 0.0125808072
# [13] 0.0294428629 0.0004667477 0.0000000000 1.4505813394 0.0910200157 2.0143861427
# [19] 3.1746991914 1.3669724716
Using parSapply, the same actually works fine:
fun2 <- function() parSapply(CL, 1:20, \(i) rnorm(1L))
bigfun2 <- function(seed) {
  clusterSetRNGStream(CL, iseed=seed)
  fun2()
}
r2 <- replicate(5, bigfun2(42))
apply(r2, 1, var)
# [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
stopCluster(CL)
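One possible way around the divergence, sketched here under assumptions, is to seed per task instead of per worker, so that the result no longer depends on which worker picks up which task. I have not confirmed that scheduling is what differs between foreach with doSNOW and parSapply. The sketch therefore uses parallel::clusterApplyLB, which is documented as load balancing, as a stand-in for a scheduler whose task-to-worker assignment can change between runs. The helpers task_seeds and draw_one and the cluster cl3 are hypothetical names introduced only for this example.

```r
library(parallel)

# Hypothetical helper: one L'Ecuyer-CMRG stream seed per task.
task_seeds <- function(seed, n) {
  old_kind <- RNGkind("L'Ecuyer-CMRG")  # previous kinds are returned
  on.exit(do.call(RNGkind, as.list(old_kind)), add = TRUE)
  set.seed(seed)
  s <- .Random.seed
  out <- vector("list", n)
  for (i in seq_len(n)) {
    out[[i]] <- s
    s <- nextRNGStream(s)  # jump to the next independent stream
  }
  out
}

# Hypothetical task: install the task's own stream, then draw.
draw_one <- function(s) {
  assign(".Random.seed", s, envir = globalenv())
  rnorm(1L)
}

cl3 <- makeCluster(2)  # assumption: two workers are enough to illustrate
r3 <- replicate(3, unlist(clusterApplyLB(cl3, task_seeds(42, 20), draw_one)))
stopCluster(cl3)

all(r3 == r3[, 1])  # every replicate matches the first, row by row
```

The design choice is that reproducibility comes from the seed attached to each task, so clusterSetRNGStream is not needed in this variant. The same idea should carry over to a foreach loop by iterating over the seed list, though I have not tested that with doSNOW.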
Should I rather do without doSNOW now, or is there anything I might have missed?
And here comes the session info:
> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04 LTS
Matrix products: default
BLAS: /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/local/lib/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] doSNOW_1.0.20 snow_0.4-4 iterators_1.0.14 foreach_1.5.2
loaded via a namespace (and not attached):
[1] compiler_4.2.0 tools_4.2.0 codetools_0.2-18