Sometimes I want to write a randomized function that always returns the same output for a particular input. I've always implemented this by setting the random seed at the top of the function and then proceeding. Consider two functions defined in this way:
sample.12 <- function(size) {
  set.seed(144)
  sample(1:2, size, replace=TRUE)
}
rand.prod <- function(x) {
  set.seed(144)
  runif(length(x)) * x
}
sample.12 returns a vector of the specified size randomly sampled from the set {1, 2}, and rand.prod multiplies each element of a specified vector by a random value uniformly selected from [0, 1]. Normally I would expect x <- sample.12(10000) ; rand.prod(x) to have a "step" distribution with pdf 3/4 in the range [0, 1] and 1/4 in the range [1, 2] (half of the values are spread uniformly over [0, 1] and the other half over [0, 2]), but due to my unfortunate choice of identical random seeds above I see a different result:
x <- sample.12(10000)
hist(rand.prod(x))
I can fix this issue in this case by changing the random seed in one of the functions to some other value. For instance, with set.seed(10000) in rand.prod I get the expected distribution:
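The expected shape can also be checked numerically instead of by eye. The following is only a minimal sketch: rand.prod.alt is a hypothetical name for the copy of rand.prod with its seed changed to 10000, and the 3/4 target comes from P(y < 1) = 1/2 * 1 + 1/2 * 1/2.

```r
sample.12 <- function(size) {
  set.seed(144)
  sample(1:2, size, replace = TRUE)
}

# Hypothetical variant of rand.prod: identical except for the seed
rand.prod.alt <- function(x) {
  set.seed(10000)
  runif(length(x)) * x
}

y <- rand.prod.alt(sample.12(10000))

# Both ranges have width 1, so these fractions are also the average
# density heights; under independence they should be near 3/4 and 1/4.
frac.below.1 <- mean(y < 1)
frac.above.1 <- mean(y >= 1)
c(frac.below.1, frac.above.1)
```

A fraction far from 3/4 below 1 would suggest that the two sources of randomness are still tied together.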
Previously on SO this solution of using different seeds has been accepted as the best approach to generate independent random number streams. However, I find the solution to be unsatisfying because streams with different seeds could be related to one another (possibly even highly related to one another); in fact, they might even yield identical streams according to ?set.seed:
There is no guarantee that different values of seed will seed the RNG differently, although any exceptions would be extremely rare.
Is there a way to implement a pair of randomized functions in R that:
- Always return the same output for a particular input, and
- Enforce independence between their sources of randomness by more than just using different random seeds?
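One possible direction, offered only as a sketch and not as a settled answer, is to give each function its own stream of the L'Ecuyer-CMRG generator through nextRNGStream() from the parallel package. The streams are then distinct, widely spaced segments of one generator's sequence rather than the results of unrelated seeds. The helper use.stream and its arguments are hypothetical names introduced here. Note that the helper switches the global RNG kind as a side effect, and that it makes no formal statistical guarantee of independence.

```r
library(parallel)  # ships with R; provides nextRNGStream()

# Hypothetical helper: place the global RNG at the start of the k-th
# L'Ecuyer-CMRG stream derived from a single base seed.
# Side effect: changes the global RNG kind and state.
use.stream <- function(k, base.seed = 144) {
  RNGkind("L'Ecuyer-CMRG")
  set.seed(base.seed)
  s <- .Random.seed
  # Advance to the k-th stream; stream 1 is the base seed itself
  for (i in seq_len(k - 1)) s <- nextRNGStream(s)
  assign(".Random.seed", s, envir = globalenv())
  invisible(s)
}

sample.12 <- function(size) {
  use.stream(1)  # deterministic: always restarts stream 1
  sample(1:2, size, replace = TRUE)
}

rand.prod <- function(x) {
  use.stream(2)  # deterministic: always restarts stream 2
  runif(length(x)) * x
}

x <- sample.12(10000)
y <- rand.prod(x)
mean(y < 1)  # compare against the expected 3/4
```

Each function still returns the same output for the same input, because it restarts its own stream on every call. The cost is that the caller's RNG state is overwritten, just as it is with set.seed in the original functions.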