I would like to store and access a large $(n, k)$ array. The first dimension indexes the items and the second is the "universe".
Say $n = 10^7$ and $k = 10000$.
In my use case, I want to access a small number b=32 of items at a time, and the queries are always different.
The twist is that my array is composed of random numbers, following a distribution I know.
Furthermore, I want reproducibility.
Thus, my idea is the following:
Generate random states
import numpy as np
# for reproducibility
np.random.seed(1337)
n = 100  # small n for this demo; the real use case has n = 10^7
random_states = [np.random.RandomState(np.random.randint(2**32)).get_state() for i in range(n)]
Draw numbers
for i in [23, 12, 42, 26]:
    # restore the stored state of item i, then draw from it
    r = np.random.RandomState()
    r.set_state(random_states[i])
    print(r.random())
The problem is that my random states are quite big in memory.
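To put a number on "quite big": a minimal sketch that measures one stored state. It assumes the default legacy tuple format of `get_state()`, where the second element is the Mersenne Twister key array (624 unsigned 32-bit integers, about 2.5 kB), and it ignores the small Python object overhead of the tuple itself. The names `bytes_per_state`, `n_items`, and `total_gb` are only for illustration.

```python
import numpy as np

# One state, in the legacy tuple format returned by get_state()
state = np.random.RandomState(1337).get_state()
key = state[1]  # Mersenne Twister key array

bytes_per_state = key.nbytes  # size of the key array alone
n_items = 10**7               # n from the question
total_gb = bytes_per_state * n_items / 1e9

print(key.shape, key.dtype, bytes_per_state, "bytes per state")
print(f"approx. {total_gb:.2f} GB for {n_items} states")
```

At the stated $n = 10^7$ this comes to roughly 25 GB for the states alone, compared with 4 bytes per item if only a 32-bit seed were stored.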
I could draw random seeds and use them to initialize RandomState, but is it statistically correct?
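For concreteness, the seed-based idea could be sketched as follows. This is only an illustration, not a verdict on the statistical question: `draw` and `draw_stateless` are hypothetical helper names, and the second variant swaps in NumPy's newer `default_rng` API, which accepts a sequence of integers as a seed and passes it through `SeedSequence` (the mechanism NumPy's documentation describes for deriving separate streams).

```python
import numpy as np

BASE_SEED = 1337  # for reproducibility, as in the question
n = 100           # small n for this demo

# Variant 1: store one 32-bit seed per item (4 bytes each)
master = np.random.RandomState(BASE_SEED)
seeds = master.randint(0, 2**32, size=n, dtype=np.uint32)

def draw(i):
    """Rebuild item i's RandomState from its stored seed and draw once."""
    return np.random.RandomState(int(seeds[i])).random_sample()

# Variant 2 (assumption: the newer Generator API is acceptable):
# store nothing and derive item i's generator from (i, BASE_SEED)
def draw_stateless(i):
    """Derive item i's generator on the fly and draw once."""
    return np.random.default_rng([i, BASE_SEED]).random()

for i in [23, 12, 42, 26]:
    print(i, draw(i), draw_stateless(i))
```

In both variants the same index always yields the same number, so a batch of b = 32 items can be regenerated on demand without keeping any full generator state in memory. The second variant produces a different stream from `RandomState`, so the two are not interchangeable.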