Monday, July 10, 2023

Python global random seed vs. NumPy Generator

I am currently using randomness in functions and unit tests, and some of the functions should also support joblib parallelism. I am wondering: what are the issues with using numpy.random.seed versus a numpy.random.Generator?

For example, say we have the Generator pattern:

# sketch: the function accepts a Generator, a seed, or None
def do_something_with_seed(rng):
    rng = np.random.default_rng(rng)
    ...

# you can just call it with a plain seed
do_something_with_seed(12345)

# as I understand it, for parallelism you spawn independent child generators
rng = np.random.default_rng(12345)
Parallel(n_jobs=n_jobs)(
    delayed(do_something_with_seed)(child) for child in rng.spawn(n_jobs)
)

Next, say we use the np.random.seed pattern:

# pseudocode
def do_something_with_global_seed(seed):
    np.random.seed(seed)
    ...

# this requires you to always set the global seed for each call
do_something_with_global_seed(12345)

# when using parallelism, draw one seed per job up front
random_seeds = np.random.randint(np.iinfo(np.int32).max, size=n_jobs)
Parallel(n_jobs=n_jobs)(
    delayed(do_something_with_global_seed)(seed) for seed in random_seeds
)
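One concrete hazard of this pattern is that np.random.seed mutates state shared by everything in the process. A minimal sketch (the helper name is hypothetical, standing in for any library code that uses the global stream):

```python
import numpy as np

def some_library_call():
    # hypothetical helper that also draws from the global stream
    np.random.random(5)

np.random.seed(0)
a = np.random.random(3)

np.random.seed(0)
some_library_call()  # consumes numbers from the shared global state
b = np.random.random(3)
# same seed, different values: the stream was advanced in between
```

A Generator passed explicitly into a function is immune to this, since no other code can advance it behind your back.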

As far as I can see, performance and functionality are the same as long as you remember to do things properly. Are there any differences, or reasons why we definitely need/want to use the Generator pattern? What about unit testing?
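For the unit-testing part of the question, one sketch of how the Generator pattern is typically used in tests: the test constructs its own generator, so reproducibility does not depend on test ordering or on any global state (the function and test names below are illustrative):

```python
import numpy as np

def jitter(values, rng):
    # function under test: accepts a Generator (or seed) explicitly
    rng = np.random.default_rng(rng)
    return values + rng.normal(0, 0.1, size=len(values))

def test_jitter_is_reproducible():
    data = np.zeros(4)
    out1 = jitter(data, np.random.default_rng(42))
    out2 = jitter(data, np.random.default_rng(42))
    assert np.allclose(out1, out2)

test_jitter_is_reproducible()
```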



