Friday, 20 November 2020

Seed setting: why is the output different after no change in input?

Setting a seed ensures reproducibility and is important in simulation modelling. Consider a simple model f() with two variables of interest, y1 and y2. Each is drawn from a random process (rbinom()): y1 is governed by the parameter x1 and y2 by the parameter x2. The two variables are independent of each other.

Now say we want to run a sensitivity analysis: change one parameter and compare the output of the corresponding variable with the default run made before the change. If all other parameters are unchanged and the same seed is set, shouldn't the output of the unaffected variable be identical to the default simulation, since that variable is independent of the other?

In short, why does the output of y2 (determined by x2) below change after only x1 is changed, even though the same seed is set? One could just ignore y2 and focus only on y1, but in a larger simulation where each variable is a cost component of the total cost, a change in an unaffected variable becomes problematic when testing the sensitivity of the overall model to individual parameter changes.
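One way to narrow this down is a stand-alone check outside the model. The sketch below assumes that consecutive rbinom() calls read from the same global random number stream, and that rbinom() may not consume any random numbers when prob is exactly 1 (which is what 1 - x1 evaluates to when x1 = 0). I believe that is how current R versions behave, but treat it as an assumption. The variable names are made up for this check and are not part of the model.

```r
# Hypothetical stand-alone check, not part of the model.
set.seed(1)
base <- rbinom(10, 1, 0.5)       # drawn directly after seeding

set.seed(1)
skipped <- rbinom(10, 1, 1)      # prob = 1, like 1 - x1 when x1 = 0
after_p1 <- rbinom(10, 1, 0.5)   # second draw from the same stream

set.seed(1)
drawn <- rbinom(10, 1, 0.75)     # prob < 1, like 1 - x1 when x1 = 0.25
after_p075 <- rbinom(10, 1, 0.5) # second draw from the same stream

identical(base, after_p1)        # does a prob = 1 draw leave the stream untouched?
identical(base, after_p075)      # does a prob < 1 draw shift the stream?
```

If the first comparison is TRUE and the second is FALSE, the y1 draw is advancing the stream that the y2 draw then reads from, so the two outputs are statistically independent but not independent in terms of which random numbers they receive.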

#~ parameters and model

x1 <- 0.0
x2 <- 0.5
n  <- 10
ts <- 5

f <- function(){
  out <- data.frame(step = rep(0, n),
                    space = 1:n,
                    id = 1:n,
                    y1 = rep(1, n),
                    y2 = rep(0, n))
  
  l.out <- vector(mode = "list", length = ts)  #~ one slot per time step
  
  for(i in 1:ts){
    out$step <- i
    out$y1[out$y1 == 0] <- 1
    out$id[out$y2 == 1]  <- seq_along(which(out$y2 == 1)) + n
    out$y2[out$y2 == 1] <- 0
    
    out$y1 <- rbinom(nrow(out), 1, 1-x1)
    out$y2 <- rbinom(nrow(out), 1, x2)
    
    n  <- max(out$id)
    l.out[[i]] <- out
  }
  do.call(rbind, l.out)
}

#~ Simulation 1 (default)
set.seed(1)
run1 <- f()
set.seed(1)
run2 <- f()
run1 == run2 #~ all observations true as expected

#~ Simulation 2
#~ change in x1 parameter affecting only variable y1
x1 <- 0.25
set.seed(1)
run3 <- f()
set.seed(1)
run4 <- f()
run3 == run4 #~ all observations true as expected

#~ compare variables after change in x1 has occurred
run1$y1 == run3$y1  #~ observations differ as expected
run1$y2 == run3$y2  #~ observations differ - why?
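A possible way to keep y2 unaffected, sketched under assumptions: draw every Bernoulli outcome by thresholding runif(), so that each draw uses exactly one uniform number whatever the probability is. The helper draw_bernoulli() and the stripped-down sim() are hypothetical names for illustration only; sim() keeps just the two draws from f() and drops the id and space bookkeeping.

```r
# Hypothetical helper: one uniform per outcome, regardless of p
draw_bernoulli <- function(n, p) as.integer(runif(n) < p)

# Stripped-down stand-in for f(): only the two random draws per step
sim <- function(x1, x2, n = 10, ts = 5) {
  res <- vector(mode = "list", length = ts)
  for (i in seq_len(ts)) {
    res[[i]] <- data.frame(step = i,
                           y1 = draw_bernoulli(n, 1 - x1),
                           y2 = draw_bernoulli(n, x2))
  }
  do.call(rbind, res)
}

set.seed(1)
a <- sim(x1 = 0.0, x2 = 0.5)
set.seed(1)
b <- sim(x1 = 0.25, x2 = 0.5)

identical(a$y2, b$y2)  #~ y2 should now match across the two runs
identical(a$y1, b$y1)  #~ y1 should still respond to the change in x1
```

This only works as long as the number of draws per step does not itself depend on the parameters. In a larger model, giving each cost component its own seed or random number stream may be the more robust choice.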


