I am currently using randomness in functions and unit tests. Moreover, some of these functions should support joblib parallelism. I am wondering what the issues are with using numpy.random.seed vs. a Generator.
For example, say we have the Generator pattern:
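# sketch (assumes joblib, and NumPy >= 1.25 for Generator.spawn)
import numpy as np
from joblib import Parallel, delayed
def do_something_with_seed(rng):
    # accepts an int seed or an existing Generator
    rng = np.random.default_rng(rng)
# you can just call it as is
do_something_with_seed(12345)
# as I understand it, when using parallelism I need to spawn one child generator per job
n_jobs = 4  # arbitrary value for illustration
rng = np.random.default_rng(12345)
Parallel(n_jobs=n_jobs)(delayed(do_something_with_seed)(_rng) for _rng in rng.spawn(n_jobs))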
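To make the spawning step concrete without joblib, here is a minimal sketch of what I understand it to do. The helper name spawn_generators is hypothetical (mine, not part of NumPy), and I use SeedSequence.spawn here on the assumption that it behaves like Generator.spawn for this purpose.

```python
import numpy as np

def spawn_generators(seed, n_jobs):
    # hypothetical helper: derive one independent child stream per job
    children = np.random.SeedSequence(seed).spawn(n_jobs)
    return [np.random.default_rng(child) for child in children]

# the same root seed gives the same set of child streams on every run
first = [rng.random(2) for rng in spawn_generators(12345, 3)]
second = [rng.random(2) for rng in spawn_generators(12345, 3)]
```

So each job gets its own stream, and the whole set is reproducible from a single root seed.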
Next, say we use the np.random.seed pattern:
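# sketch (assumes joblib)
import numpy as np
from joblib import Parallel, delayed
def do_something_with_global_seed(seed):
    np.random.seed(seed)  # reseeds NumPy's global legacy RandomState
    ...
# this requires you to always set the global seed before running this function
do_something_with_global_seed(12345)
# when using parallelism, draw one integer seed per job
n_jobs = 4  # arbitrary value for illustration
random_seeds = np.random.randint(np.iinfo(np.int32).max, size=n_jobs)
Parallel(n_jobs=n_jobs)(delayed(do_something_with_global_seed)(seed) for seed in random_seeds)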
As far as I can see, the performance and functionality are the same as long as you remember to do things properly. Are there any differences, or reasons why we would definitely need or want to use the Generator pattern? What about for unit testing?
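To make the unit-testing part of the question concrete, here is a minimal sketch of the one difference I can observe myself. The helper names draw_with_generator and draw_with_global_seed are hypothetical, made up for this illustration. Both helpers are reproducible for a fixed seed, but only the np.random.seed version changes what later calls to np.random return.

```python
import numpy as np

def draw_with_generator(rng=None):
    # hypothetical helper: uses a local Generator, global state is untouched
    rng = np.random.default_rng(rng)
    return rng.random(3)

def draw_with_global_seed(seed):
    # hypothetical helper: reseeds NumPy's global legacy RandomState
    np.random.seed(seed)
    return np.random.random(3)

# reference value: first draw from the global stream seeded with 42
np.random.seed(42)
expected = np.random.random()

# calling the Generator-based helper leaves the global stream alone
np.random.seed(42)
draw_with_generator(0)
unaffected = np.random.random()

# calling the global-seed helper resets the stream other code relies on
np.random.seed(42)
draw_with_global_seed(0)
affected = np.random.random()
```

Here unaffected matches expected while affected does not, so a test that calls the global-seed helper can silently change the random numbers seen by whatever runs after it.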