What I am doing at the moment:
import numpy as np
d = 60000
n_choice = 1000
n_samples = 5000
rng = np.random.default_rng(123)
# some probability vector
p = rng.random(d)
p = p / np.sum(p)
samples = np.empty((n_choice, n_samples), dtype=np.int64)
for i in range(n_samples):
    samples[:, i] = rng.choice(d, size=n_choice, replace=False, p=p, shuffle=False)
This is a bit slow for my taste. Is there a way to speed it up, e.g. by replacing the loop with a vectorized trick or by using some other form of simulation?
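One known way to vectorize weighted sampling without replacement is the Gumbel top-k trick: perturb the log-weights with i.i.d. Gumbel noise and take the indices of the k largest values; this is distributionally equivalent to sequential sampling without replacement proportional to p. A rough sketch (with n_samples reduced here to keep the demo's noise matrix small; the full problem size may need chunking over samples to bound memory):

```python
import numpy as np

rng = np.random.default_rng(123)

d, n_choice, n_samples = 60_000, 1_000, 200  # n_samples shrunk for the demo
p = rng.random(d)
p = p / p.sum()

# Gumbel top-k trick: log(p) plus i.i.d. Gumbel noise, then the n_choice
# largest entries per row give a weighted sample without replacement.
# One (n_samples, d) noise matrix handles all samples at once.
keys = np.log(p) + rng.gumbel(size=(n_samples, d))

# argpartition finds the top n_choice indices per row without a full sort
samples = np.argpartition(keys, -n_choice, axis=1)[:, -n_choice:].T
# samples has shape (n_choice, n_samples); each column holds distinct indices
```

Note that each column is a set, not an ordered draw (argpartition does not order the top k), and the noise matrix costs n_samples * d * 8 bytes in float64, so for the full 5000 samples you would likely loop over batches of a few hundred rows.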
I skimmed through similar questions on Stack Overflow, but only found this one, where the weights are uniform and d = n_choice, and this one, where weights are given but only the rows (columns) of the samples array have to be unique.