Monday, March 23, 2020

Correct way to draw random seeds

I would like to store and access a large $(n, k)$ array. The first dimension indexes items and the second indexes "universes".

For example, $n = 10^7$ and $k = 10^4$, which is $10^{11}$ values (about 800 GB as float64).

In my use case, I access a small number of items at a time ($b = 32$), and each query asks for a different set of items.

The twist is that the array consists of random numbers drawn from a distribution I know.

Furthermore, I want reproducibility.

Thus, my idea is the following:

Generate random states

import numpy as np

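# Global seed for reproducibility
np.random.seed(1337)

n = 100  # small n for illustration; the real use case has n = 10^7
# One saved MT19937 state per item, each seeded from the global stream.
# dtype=np.uint32 keeps 2**32 in range on platforms whose default int is 32-bit.
random_states = [np.random.RandomState(np.random.randint(2**32, dtype=np.uint32)).get_state() for _ in range(n)]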

Draw numbers

for i in [23, 12, 42, 26]:
    r = np.random.RandomState()
    # Restore item i's saved state so its draws are the same on every run
    r.set_state(random_states[i])
    print(r.random())
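A minimal sketch of how the saved states could serve a query of $b$ items: each requested item's state is restored and its $k$ universes are drawn, giving a $(b, k)$ block. The helper `query` and the reduced sizes are illustrative assumptions, and uniform draws stand in for the known distribution.

```python
import numpy as np

np.random.seed(1337)
n, k = 100, 1_000  # reduced sizes; the question targets n = 10**7, k = 10**4

# One saved MT19937 state per item, as in the question
random_states = [
    np.random.RandomState(np.random.randint(2**32, dtype=np.uint32)).get_state()
    for _ in range(n)
]

def query(indices, k):
    """Hypothetical helper: return a (len(indices), k) block of item rows."""
    r = np.random.RandomState()
    rows = np.empty((len(indices), k))
    for j, i in enumerate(indices):
        r.set_state(random_states[i])  # rewind to item i's saved state
        rows[j] = r.random_sample(k)   # item i's k universes
    return rows

block = query([23, 12, 42, 26], k)
print(block.shape)  # (4, 1000)
```

Since each row starts from its own saved state, an item's row does not depend on which other items appear in the query.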

The problem is that the saved states are large: each MT19937 state holds 624 32-bit words (about 2.5 KB), so $10^7$ of them would need roughly 25 GB.
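The per-state cost can be checked directly. This sketch inspects the tuple returned by `RandomState.get_state()`, whose second element is the MT19937 key array, and scales it to $n = 10^7$ items (Python object overhead for the tuples is not counted).

```python
import numpy as np

state = np.random.RandomState(1337).get_state()
# get_state() returns ('MT19937', key, pos, has_gauss, cached_gaussian)
key = state[1]
per_state = key.nbytes  # 624 words * 4 bytes
print(key.dtype, key.shape, per_state)  # uint32 (624,) 2496

n = 10**7
total_gb = n * per_state / 1e9  # keys alone, ignoring tuple/list overhead
print(f"{total_gb:.2f} GB")
```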

I could instead draw random seeds (one 32-bit integer per item, 4 bytes each) and use them to initialize RandomState, but is that statistically sound?
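A minimal sketch of a seed-free variant, assuming NumPy ≥ 1.17. It swaps in `SeedSequence`, which NumPy's documentation recommends for deriving independent streams; it is not the question's method. Storing randomly drawn 32-bit seeds runs into the birthday problem: with $n = 10^7$ seeds the expected number of colliding pairs is about $n(n-1)/(2 \cdot 2^{32}) \approx 11\,600$, and each collision makes two items' rows identical. Here item $i$'s generator is rebuilt from `SeedSequence(1337, spawn_key=(i,))`, which is the same child that `SeedSequence(1337).spawn(...)` returns at position $i$, so nothing per item has to be stored. The helper `item_rows` is hypothetical, and `rng.random` stands in for the sampler of the known distribution.

```python
import numpy as np

ROOT_SEED = 1337  # global seed, as in the question

def item_rows(indices, k, root_seed=ROOT_SEED):
    """Hypothetical helper: row j holds the k universes of item indices[j]."""
    rows = np.empty((len(indices), k))
    for j, i in enumerate(indices):
        # Derive item i's stream from (root seed, item index); no stored state
        ss = np.random.SeedSequence(root_seed, spawn_key=(int(i),))
        rng = np.random.default_rng(ss)
        rows[j] = rng.random(k)  # replace with the known distribution's sampler
    return rows

block = item_rows([23, 12, 42, 26], k=10_000)
print(block.shape)  # (4, 10000)
```

Because every row depends only on the root seed and the item index, any query of $b$ items reproduces the same rows regardless of which other items it contains.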
