Thursday, June 25, 2015

Making functions that set the random seed independent

Sometimes I want to write a randomized function that always returns the same output for a particular input. I've always implemented this by setting the random seed at the top of the function and then proceeding. Consider two functions defined in this way:

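# Draw `size` values from {1, 2}; the fixed seed makes the output reproducible.
sample.12 <- function(size) {
  set.seed(144)
  sample(1:2, size, replace=TRUE)
}
# Scale each element of x by a Uniform(0, 1) draw, again under a fixed seed.
rand.prod <- function(x) {
  set.seed(144)
  runif(length(x)) * x
}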

sample.12 returns a vector of the specified size randomly sampled from the set {1, 2}, and rand.prod multiplies each element of a specified vector by a random value uniformly selected from [0, 1]. Normally I would expect x <- sample.12(10000) ; rand.prod(x) to have a "step" distribution with pdf 3/4 in the range [0, 1] and 1/4 in the range [1, 2], but due to my unfortunate choice of identical random seeds above I see a different result:

x <- sample.12(10000)
hist(rand.prod(x))

[Figure: histogram of rand.prod(x) with set.seed(144) in both functions]
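The coupling can be checked directly. The sketch below is not from the original post: it inlines the bodies of the two functions and assumes R >= 3.6.0, where RNGkind() accepts sample.kind, so that the pre-3.6.0 "Rounding" sampler (the one in use when this was written) can be selected. Under that assumption each sampled value is a deterministic function of the matching uniform draw, so both checks should come out TRUE. With the newer default sampler the dependence is still there but may not be visible in a histogram.

```r
# Assumption: R >= 3.6.0. "Rounding" is the pre-3.6.0 sampler; selecting it
# emits a warning, which is suppressed here.
old <- suppressWarnings(RNGkind(sample.kind = "Rounding"))

n <- 10000
set.seed(144)
x <- sample(1:2, n, replace = TRUE)  # same draws as sample.12(n)
set.seed(144)
u <- runif(n)                        # same draws as inside rand.prod(x)
y <- u * x

# Under "Rounding" the i-th sample is 1 + floor(2 * u[i]), so x is a function of u.
coupled <- all((x == 1) == (u < 0.5))
# Consequently y is u when u < 0.5 and 2 * u otherwise: nothing lands in [0.5, 1).
gap.empty <- !any(y >= 0.5 & y < 1)

# Restore the sampler that was active before.
invisible(suppressWarnings(RNGkind(sample.kind = old[3])))
```

Resetting the seed inside each function rewinds the same global stream, so the second function replays the very uniforms that determined the first function's output.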

I can fix this issue in this case by changing the random seed in one of the functions to some other value. For instance, with set.seed(10000) in rand.prod I get the expected distribution:

[Figure: histogram of rand.prod(x) with set.seed(10000) in rand.prod]
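For reference, the expected shape follows from mixing Uniform(0, 1) and Uniform(0, 2) with equal weight: the density is 1/2 * 1 + 1/2 * 1/2 = 3/4 on [0, 1] and 1/2 * 1/2 = 1/4 on (1, 2]. A quick simulation along these lines can confirm it; this is a minimal sketch in which the seed and sample size are arbitrary choices and both draws come sequentially from a single stream, so they are independent of each other.

```r
set.seed(1)                          # arbitrary seed, only so the check is repeatable
n <- 100000
x <- sample(1:2, n, replace = TRUE)  # 1 or 2, each with probability 1/2
y <- runif(n) * x                    # uniforms drawn after x from the same stream

frac.low  <- mean(y <= 1)            # estimate of P(Y <= 1), expected near 3/4
frac.high <- mean(y > 1)             # estimate of P(Y > 1), expected near 1/4
```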

Previously on SO, using different seeds has been accepted as the best approach to generating independent random number streams. However, I find this solution unsatisfying because streams with different seeds could still be related to one another (possibly even highly related); in fact, two different seeds might even yield identical streams, according to ?set.seed:

There is no guarantee that different values of seed will seed the RNG differently, although any exceptions would be extremely rare.

Is there a way to implement a pair of randomized functions in R that:

  1. Always return the same output for a particular input, and
  2. Enforce independence between their sources of randomness by more than just using different random seeds?


