Thursday, 16 September 2021

Resetting the R random number generator (rlecuyer) for inner loops using snow/doSNOW

I have an outer foreach/%dopar% parallel loop that contains an inner loop. Every iteration of the inner loop should work on the same set of random numbers. Everything else, i.e. the remaining parts of the outer loop body and the parallel instances, should work as usual, i.e. with independent random numbers.

I can achieve this in a non-parallel implementation by saving the state of the RNG before the inner loop starts and restoring that state after each iteration of the inner loop. The following example applies this idea inside the parallel loop:

library(doSNOW)

seed = 4711

cl = makeCluster(2)
registerDoSNOW(cl)
clusterSetupRNGstream(cl, seed = rep(seed, 6))  # one rlecuyer stream per worker

erg = foreach(irun = 1:3, .combine = rbind) %dopar% {

  #do some random stuff in outer loop
  smp = runif(1)

  # save current state of RNG
  s = .Random.seed

  # inner loop, does some more random stuff
  idx = numeric(5)
  for(ii in seq.int(5)) {
    idx[ii] = sample.int(10, 1)
    # reset RNG for next loop iteration
    set.seed(s)  # note: set.seed() expects a single integer, not a saved state vector
  }

  c(smp,idx)
}

> print(erg)
              [,1] [,2] [,3] [,4] [,5] [,6]
result.1 0.5749162    7    6    2    3    7
result.2 0.1208910    4    3    6    8    9
result.3 0.3491315    7    2    7    6   10

My desired output is a constant integer along each row (columns 2 to 6), with different values from row to row. So this does not work in parallel. The reason is quite clear: snow uses a different random number generator (rlecuyer) and has to deal with parallel streams.
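For reference, a minimal base-R sketch of the sequential pattern. It assumes R's default generator, whose complete state is the `.Random.seed` vector in the global environment. Because `set.seed()` expects a single integer, the saved vector is assigned back to `.Random.seed` instead, and the rewind happens at the start of each inner iteration, so the outer draws continue from where one pass of the inner loop left off. The function name `outer_body` is hypothetical.

```r
# Sequential sketch: every inner iteration replays the same random draws,
# while the outer draws keep advancing the stream.
outer_body <- function(n_inner = 5) {
  smp <- runif(1)                                    # ordinary outer draw
  s <- get(".Random.seed", envir = globalenv())      # snapshot the full RNG state
  idx <- integer(n_inner)
  for (ii in seq_len(n_inner)) {
    assign(".Random.seed", s, envir = globalenv())   # rewind to the snapshot
    idx[ii] <- sample.int(10, 1)                     # same value in every iteration
  }
  c(smp, idx)
}

set.seed(4711)
erg <- t(sapply(1:3, function(irun) outer_body()))
print(erg)
```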

The question is: how can I achieve this reset of the seed(s) with snow?
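One possible approach, sketched under assumptions rather than tested against doSNOW: seed the workers with `parallel::clusterSetRNGStream` instead of `clusterSetupRNGstream`. This puts every worker on R's built-in L'Ecuyer-CMRG generator with its own stream, and under that generator the complete state is the 7-element `.Random.seed` vector on each worker, so the save-and-restore pattern works there as well. To keep the sketch base-R only it uses `parLapply` instead of foreach; the function `worker_body` is hypothetical, and using the same body inside a `%dopar%` loop on a cluster seeded this way is an assumption.

```r
library(parallel)

# Hypothetical worker body: same logic as the sequential version.
worker_body <- function(irun, n_inner = 5) {
  smp <- runif(1)                                    # independent per worker stream
  s <- get(".Random.seed", envir = globalenv())      # full L'Ecuyer-CMRG state (7 integers)
  idx <- integer(n_inner)
  for (ii in seq_len(n_inner)) {
    assign(".Random.seed", s, envir = globalenv())   # rewind this worker's stream
    idx[ii] <- sample.int(10, 1)
  }
  c(smp, idx)
}

cl <- makeCluster(2)
clusterSetRNGStream(cl, iseed = 4711)   # one L'Ecuyer-CMRG stream per worker
erg <- do.call(rbind, parLapply(cl, 1:3, worker_body))
stopCluster(cl)
print(erg)
```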

My current workaround is to precompute all random data for the inner loop once (in the example, the idx vector) and then reuse this fixed data in every inner iteration. This is not optimal, because the total amount of random data becomes very large, and it would be much better to (re)generate it on the fly in smaller chunks.
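For regenerating the data in chunks, one option (an assumption-laden sketch, not a tested solution) is to use the substreams of the L'Ecuyer-CMRG generator: `parallel::nextRNGSubStream()` jumps a 7-element state ahead by 2^76 draws, so chunk k can be rebuilt from the k-th substream of a base state whenever it is needed instead of being stored. Here each outer iteration draws an integer seed from its own stream and seeds the base state with it. The helpers `preserve_rng` and `inner_chunk` are hypothetical; `preserve_rng` restores the caller's RNG state, so building a chunk does not consume the outer stream.

```r
library(parallel)

# Hypothetical helper: evaluate `expr`, then restore the caller's RNG state,
# so random numbers drawn inside `expr` do not advance the outer stream.
preserve_rng <- function(expr) {
  old <- get(".Random.seed", envir = globalenv())
  on.exit(assign(".Random.seed", old, envir = globalenv()))
  expr
}

# Hypothetical helper: regenerate chunk `k` (k >= 1) of the shared inner data
# from the k-th substream of `base` (a L'Ecuyer-CMRG .Random.seed vector).
inner_chunk <- function(base, k, size) {
  preserve_rng({
    s <- base
    for (j in seq_len(k)) s <- nextRNGSubStream(s)
    assign(".Random.seed", s, envir = globalenv())
    sample.int(10, size, replace = TRUE)
  })
}

RNGkind("L'Ecuyer-CMRG")
set.seed(4711)

# Once per outer iteration: derive a fresh base state for the inner data
# from the outer stream (this consumes one outer draw).
inner_seed <- sample.int(.Machine$integer.max, 1)
base <- preserve_rng({
  set.seed(inner_seed)
  get(".Random.seed", envir = globalenv())
})

# Any inner iteration can rebuild any chunk on demand.
chunk_a <- inner_chunk(base, 2, 5)
chunk_b <- inner_chunk(base, 2, 5)   # same values as chunk_a
```

In the parallel version, `inner_seed` and `base` would be computed inside the `%dopar%` body (after seeding the workers with `clusterSetRNGStream`), and each inner iteration would call `inner_chunk(base, k, size)` only for the chunks it currently needs.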



