lundi 12 août 2019

C++ builtin random artifacts in Rcpp

I'm maintaining an R package named iRF, and a big problem is that it isn't reproducible. In other words, I cannot get the same result by setting set.seed. For the purpose of this question, let's focus on the function RIT. You don't need to figure out what it does; just look at the RNG handling part instead.

It is defined in R/RIT.R, which calls either RIT_1class or RIT_2class depending on the input type. Both RIT_[1|2]class functions are defined in src/ExportedFunctionsRIT.cpp, which in turn calls helper functions defined in src/RITmain.h and src/RITaux.h.

I'm using Rcpp attributes, so randomness in RIT_[1|2]class should be correctly handled by an implicit RNGScope, as mentioned in this answer. However, this codebase is tricky to tackle in two ways,

  1. The functions RIT_basic and RIT_minhash use // [[Rcpp::plugins(openmp)]]. Fortunately, the original author gives each thread a separate seed, so hopefully, I can make it deterministic with seeds[i] = rand() * (i+1), yet you can tell this along isn't enough since I'm asking here.
// Set up vector of seeds for RNG
vector<unsigned int> seeds(n_cores);
for (int i=0; i<n_cores; i++) {
  seeds[i] = chrono::high_resolution_clock::now().time_since_epoch().count()*(i+1);
}

  1. One of the functions, CreateHT uses random_device rd;. I'm not familiar with C++ but a quick search reveals it generates "non-deterministic random numbers".
void CreateHt(...) {
  // Ht is p by L
  random_device rd; //seed for Random Number Generator(RNG)
  mt19937_64 mt(rd()); //Use Mersenne Twister as RNG

  ...
    shuffle(perm.begin(), perm.end(), mt);
  ...
}

From my understanding, both rand() and random_device are C++'s builtin random artifacts. How can I make them respect .Random.seed?




Aucun commentaire:

Enregistrer un commentaire