mercredi 20 novembre 2019

Best practices for seeding random and numpy.random in the same program

In order to make random simulations we run reproducible later, my colleagues and I often explicitly seed the random or numpy.random modules' random number generators using the random.seed and np.random.seed methods. Seeding with an arbitrary constant like 42 is fine if we're just using one of those modules in a program, but sometimes, we use both random and np.random in the same program. I'm unsure whether there are any best practices I should be following about how to seed the two RNGs together.

In particular, I'm worried that there's some sort of trap we could fall into where the two RNGs together behave in a "non-random" way, such as both generating the exact same sequence of random numbers, or one sequence trailing the other by a few values (e.g. the kth number from random is always the k+20th number from np.random), or the two sequences being related to each other in some other mathematical way. (I realise that pseudo-random number generators are all imperfect simulations of true randomness, but I want to avoid exacerbating this with poor seed choices.)

With this objective in mind, are there any particular ways we should or shouldn't seed the two RNGs? I've used, or seen colleagues use, a few different tactics, like:

  • Using the same arbitrary seed:

    random.seed(42)
    np.random.seed(42)
    
  • Using two different arbitrary seeds:

    random.seed(271828)
    np.random.seed(314159)
    
  • Using a random number from one RNG to seed the other:

    random.seed(42)
    np.random.seed(random.randint(0, 2**32))
    

... and I've never noticed any strange outcomes from any of these approaches... but maybe I've just missed them. Are there any officially blessed approaches to this? And are there any possible traps that I can spot and raise the alarm about in code review?




Aucun commentaire:

Enregistrer un commentaire