Sunday, 3 July 2022

How to get reproducible results using a seed in doSNOW?

I'm trying to write a parallelized function using foreach and the doSNOW package. I normally use the parallel package, but I cannot get utils::txtProgressBar to work with it.

Inside a bigfun where, for the sake of reproducibility, the user may pass a seed as an argument, there is a small fun performing a stochastic process. AFAIK, parallel::clusterSetRNGStream gives each worker its own random-number stream, all derived from the seed passed as iseed=.

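library(doSNOW)    # attaches foreach and snow
library(parallel)  # clusterSetRNGStream(), detectCores()

# One socket worker per core, leaving one core for the master process
CL <- makeSOCKcluster(detectCores() - 1)
registerDoSNOW(CL)

# Stochastic step: 20 foreach tasks, one normal draw each
fun <- function() foreach(i=1:20, .combine='c') %dopar% rnorm(1L)

# Reset the workers' RNG streams from the user-supplied seed, then run fun()
bigfun <- function(seed) {
  clusterSetRNGStream(CL, iseed=seed)
  fun()
}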

For some reason, however, the seeding seems to stop holding after a few iterations (eight in the following example), and from there on the results diverge between runs.

r1 <- replicate(5, bigfun(42))

r1
#               [,1]         [,2]         [,3]         [,4]         [,5]
#  [1,] -0.939077079 -0.939077079 -0.939077079 -0.939077079 -0.939077079
#  [2,]  1.119328457  1.119328457  1.119328457  1.119328457  1.119328457
#  [3,] -0.208480914 -0.208480914 -0.208480914 -0.208480914 -0.208480914
#  [4,]  0.001100034  0.001100034  0.001100034  0.001100034  0.001100034
#  [5,]  0.226260465  0.226260465  0.226260465  0.226260465  0.226260465
#  [6,]  0.858422030  0.858422030  0.858422030  0.858422030  0.858422030
#  [7,] -1.137862056 -1.137862056 -1.137862056 -1.137862056 -1.137862056
#  [8,] -0.041679428 -0.041679428 -0.041679428 -0.041679428 -0.041679428
#  [9,]  0.829413493  0.829413493  0.829413493 -0.076171414 -0.076171414
# [10,] -0.439358199 -0.439358199 -0.439358199  0.829413493  0.829413493
# [11,] -0.314035435 -0.314035435 -0.314035435 -1.034149295 -0.350219125
# [12,] -2.129023630 -2.129023630 -2.129023630 -0.439358199 -0.439358199
# [13,]  2.506922433  2.506922433  2.506922433 -0.350219125 -0.314035435
# [14,] -1.127312751 -1.127312751 -1.127312751 -0.262905962 -2.129023630
# [15,]  0.166082706  0.166082706  0.166082706 -0.314035435 -0.334911609
# [16,]  0.576723182  0.576723182  0.576723182 -0.334911609  2.506922433
# [17,] -0.076171414 -1.905145916 -1.905145916  0.388011532 -1.733112797
# [18,] -1.905145916  0.923505402  0.923505402 -2.129023630 -1.127312751
# [19,] -1.034149295 -0.441691024 -0.441691024 -1.733112797  0.036285344
# [20,]  0.923505402  0.149157109  0.149157109  0.833106718  0.166082706

Your output will probably differ slightly. If you don't trust your eyes, here are the row-wise variances across the five runs:

apply(r1, 1, var)
#  [1] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
#  [7] 0.0000000000 0.0000000000 0.6609707033 1.2262257376 0.1907860246 0.0125808072
# [13] 0.0294428629 0.0004667477 0.0000000000 1.4505813394 0.0910200157 2.0143861427
# [19] 3.1746991914 1.3669724716
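
One way to investigate the divergence, offered as an assumption rather than a confirmed diagnosis: if doSNOW hands each task to whichever worker becomes free first, the task-to-worker mapping can change between runs, and a given task would then draw from a different worker's stream. The sketch below records the process ID of the worker that ran each task. The names `who_ran`, `assignments` and `rows_that_moved` are hypothetical, and the `Sys.sleep()` jitter is only there to make timing differences more likely.

```r
library(doSNOW)   # attaches foreach and snow

cl <- makeSOCKcluster(2)   # two workers are enough for the illustration
registerDoSNOW(cl)

# Hypothetical helper: return the PID of the worker that ran each task.
# The short random sleep makes completion order vary between runs.
who_ran <- function() {
  foreach(i = 1:20, .combine = "c") %dopar% {
    Sys.sleep(runif(1L, 0, 0.01))
    Sys.getpid()
  }
}

# One column per run, one row per task
assignments <- replicate(5, who_ran())

# Tasks (rows) that were not handled by the same worker in every run
rows_that_moved <- which(apply(assignments, 1, function(x) length(unique(x)) > 1))

stopCluster(cl)
```

If `rows_that_moved` is non-empty, per-worker streams alone cannot guarantee identical results, because the same task index does not always consume the same stream.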

The same approach works fine with parSapply:

fun2 <- function() parSapply(CL, 1:20, \(i) rnorm(1L))

bigfun2 <- function(seed) {
  clusterSetRNGStream(CL, iseed=seed)
  fun2()
}

r2 <- replicate(5, bigfun2(42))

apply(r2, 1, var)
# [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


stopCluster(CL)

Should I drop doSNOW, or is there something I have missed?
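
A minimal sketch of one possible workaround, assuming the cause is that tasks do not always land on the same worker: seed each task rather than each worker. The master derives one L'Ecuyer-CMRG stream per task with parallel::nextRNGStream, and every task installs its own stream before drawing, so the result no longer depends on which worker runs it. The names `make_streams`, `fun3`, `bigfun3` and `r3` are hypothetical and not part of any package.

```r
library(doSNOW)    # attaches foreach and snow
library(parallel)  # nextRNGStream()

# Hypothetical helper: derive n independent L'Ecuyer-CMRG streams from one
# seed on the master, restoring the master's RNG kind (and state, if it had
# one) on exit.
make_streams <- function(seed, n) {
  has_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  old_seed <- if (has_seed) get(".Random.seed", envir = globalenv()) else NULL
  old_kind <- RNGkind("L'Ecuyer-CMRG")
  on.exit({
    do.call(RNGkind, as.list(old_kind))
    if (!is.null(old_seed)) assign(".Random.seed", old_seed, envir = globalenv())
  })
  set.seed(seed)
  streams <- vector("list", n)
  s <- .Random.seed
  for (i in seq_len(n)) {
    streams[[i]] <- s
    s <- nextRNGStream(s)   # jump ahead to the next independent stream
  }
  streams
}

cl <- makeSOCKcluster(2)
registerDoSNOW(cl)

# Each task receives its own stream and installs it before drawing.
fun3 <- function(streams) {
  foreach(s = streams, .combine = "c") %dopar% {
    assign(".Random.seed", s, envir = globalenv())
    rnorm(1L)
  }
}

bigfun3 <- function(seed, n = 20L) fun3(make_streams(seed, n))

r3 <- replicate(5, bigfun3(42))
apply(r3, 1, var)   # all zeros if the five runs agree

stopCluster(cl)
```

Iterating over the streams themselves keeps the foreach call otherwise unchanged, so doSNOW-specific options can still be used alongside it. The doRNG package provides a `%dorng%` operator built on the same idea, which may be worth a look as a ready-made alternative.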


And here comes the session info:

> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04 LTS

Matrix products: default
BLAS:   /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/local/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] doSNOW_1.0.20    snow_0.4-4       iterators_1.0.14 foreach_1.5.2   

loaded via a namespace (and not attached):
[1] compiler_4.2.0   tools_4.2.0      codetools_0.2-18


