Thursday, November 23, 2023

Correctly seeding the NumPy random generator

For my scientific experiments, I usually seed using:

import numpy as np

rng = np.random.Generator(np.random.PCG64(seed))

which, for the current NumPy version, is equivalent to

rng = np.random.default_rng(seed)
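One way to check this equivalence is to compare the first draws of both constructions. A minimal sketch, assuming NumPy >= 1.17; note that `default_rng` already returns a `Generator` (currently backed by `PCG64`), so it is called directly rather than wrapped in `Generator` again. The seed value 42 is an arbitrary example.

```python
import numpy as np

seed = 42  # arbitrary example seed

# Explicit construction: a Generator wrapping a PCG64 bit generator
rng_explicit = np.random.Generator(np.random.PCG64(seed))

# Convenience constructor: default_rng already returns a Generator,
# so it must not be wrapped in np.random.Generator a second time
rng_default = np.random.default_rng(seed)

a = rng_explicit.random(5)
b = rng_default.random(5)
print(np.array_equal(a, b))                      # identical streams
print(type(rng_default.bit_generator).__name__)  # name of the underlying bit generator
```

As far as I understand the documentation, the bit generator behind `default_rng` may change in future releases, so constructing `PCG64` explicitly is the more conservative choice when results must stay reproducible across NumPy upgrades.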

As I repeat my experiments n times and average their results, I usually use the seeds 0, 1, ..., n-1, one per run.
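The seeding scheme described above can be sketched as follows; `run_experiment` is a hypothetical placeholder for one experiment run, and n = 10 is an arbitrary choice.

```python
import numpy as np


def run_experiment(rng: np.random.Generator) -> float:
    """Hypothetical stand-in for one experiment run: returns a noisy estimate."""
    return float(rng.normal(loc=0.0, scale=1.0, size=1_000).mean())


n = 10  # number of repetitions

# One freshly seeded Generator per run, using the seeds 0, 1, ..., n-1
results = [run_experiment(np.random.default_rng(seed)) for seed in range(n)]

mean, std = np.mean(results), np.std(results, ddof=1)
print(f"mean over {n} runs: {mean:.4f} +/- {std:.4f}")

# Re-running with the same seeds reproduces the results exactly
rerun = [run_experiment(np.random.default_rng(seed)) for seed in range(n)]
print(results == rerun)
```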

However, the documentation (here and here) states that

Seeds should be large positive integers.

or

We default to using a 128-bit integer using entropy gathered from the OS. This is a good amount of entropy to initialize all of the generators that we have in numpy. We do not recommend using small seeds below 32 bits for general use.
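
Following that recommendation, a large seed can be drawn once from the OS and then recorded so the run can be reproduced. A minimal sketch; `root_seed` is an illustrative name, and `secrets.randbits(128)` and `np.random.SeedSequence().entropy` are two ways I know of to obtain 128 bits of OS entropy.

```python
import secrets

import numpy as np

# Draw a fresh 128-bit seed from the OS entropy pool (done once, then recorded)
root_seed = secrets.randbits(128)
print(f"root seed: {root_seed}")  # write this down to reproduce the run later

# Alternative: let SeedSequence gather OS entropy and read the value back
entropy = np.random.SeedSequence().entropy
print(f"SeedSequence entropy: {entropy}")

# Re-creating a generator from the recorded seed gives the same stream
rng = np.random.default_rng(root_seed)
rng_again = np.random.default_rng(root_seed)
```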

Yet the second reference also states

There will not be anything wrong with the results, per se; even a seed of 0 is perfectly fine thanks to the processing that SeedSequence does.
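
To see what that processing does to small seeds, one can compare what `SeedSequence` produces for the seeds 0 and 1, which differ in a single bit. A rough illustration, not a statistical test; `hamming_distance` is a helper defined only for this example, and the sketch does not reproduce exactly how many words a given bit generator requests.

```python
import numpy as np


def hamming_distance(words_a: np.ndarray, words_b: np.ndarray) -> int:
    """Number of differing bits between two equal-length arrays of uint32 words."""
    xor = np.bitwise_xor(words_a, words_b)
    return sum(bin(int(w)).count("1") for w in xor)


# The raw seeds 0 and 1 differ in a single bit ...
raw_distance = hamming_distance(np.array([0], dtype=np.uint32),
                                np.array([1], dtype=np.uint32))
print(raw_distance)

# ... but the first 128 bits that SeedSequence derives from them should
# differ in roughly half of their positions, like unrelated random words
state_0 = np.random.SeedSequence(0).generate_state(4)  # 4 x uint32 = 128 bits
state_1 = np.random.SeedSequence(1).generate_state(4)
distance = hamming_distance(state_0, state_1)
print(state_0, state_1)
print(distance, "of 128 bits differ")
```

A well-mixed result says nothing about subtler correlations between the full output streams, which is the harder part of question (i) below.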

This feels contradictory, and I wonder whether small seeds are now perfectly fine to use or whether one should move towards larger seeds. In particular, I wonder (i) at which point (if any) a large seed would make a difference compared to a small one, and (ii) whether, for scientific experiments (e.g. machine learning or algorithmic research), one should prefer large seeds over small ones, or whether it makes no difference.
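
For comparison with seeds 0, ..., n-1, the NumPy documentation also describes deriving per-run streams from a single large root seed with `SeedSequence.spawn`. A minimal sketch of the mechanics only, not a verdict on (i) or (ii); `ROOT_SEED` is a hypothetical placeholder for a recorded 128-bit value.

```python
import numpy as np

# Hypothetical placeholder: in practice, a recorded secrets.randbits(128) value
ROOT_SEED = 0x8C3C010CB4754C905776BDAC5EE7501

n = 10  # number of repetitions

# Derive n child SeedSequences (each with its own spawn key) from one root
children = np.random.SeedSequence(ROOT_SEED).spawn(n)
rngs = [np.random.default_rng(child) for child in children]

first_draws = [rng.random() for rng in rngs]

# Spawning again from the same root reproduces the same per-run streams
children_again = np.random.SeedSequence(ROOT_SEED).spawn(n)
first_draws_again = [np.random.default_rng(c).random() for c in children_again]
print(first_draws == first_draws_again)
```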

PS: This question is closely related to "Random number seed in numpy", but concerns the now-recommended Generator API. Furthermore, the answer there does not seem in-depth enough, as it does not discuss large versus small seeds.
