I want to generate k unique values with numpy.random, drawn from a uniform distribution between 0 and N (excluding N), where k << N.
At first glance, numpy.random.choice looks like the right approach: np.random.choice(N, k, replace=False). That works in theory, but there's a gotcha in the docs:
Parameters
a : 1-D array-like or int
    If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated as if a were np.arange(a)
So if N is large, it effectively calls np.arange(N), which creates an unnecessary array of N elements and can get very slow even though only k values are needed.
Is there a way to create a "virtual" large array for numpy to use? I can't figure out if it is straightforward to do so based on what "array-like" means.
Alternatively, is there another way to do this using numpy.random but without using np.random.choice?
The "obvious" Python duck-typing approach below works correctly but is even slower (perhaps numpy tries to create a copy first?):
import numpy as np

class VirtualArray(object):
    """Sequence that behaves like range(N) without storing its elements."""
    def __init__(self, N):
        self.N = N
    def __len__(self):
        return self.N
    def __getitem__(self, k):
        if 0 <= k < self.N:
            return k
        raise IndexError('index out of range: %s' % k)

N = 1000000
# Same seed before each call so the two results can be compared
np.random.seed(123)
a = np.random.choice(VirtualArray(N), 20, replace=False)
np.random.seed(123)
b = np.random.choice(N, 20, replace=False)
print(a)
print(b)
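On the "another way without np.random.choice" part of the question, one possible direction is plain rejection sampling: draw candidates with np.random.randint and keep only the first occurrence of each value until k distinct ones have been collected. The sketch below is only an illustration under that assumption; sample_unique is a hypothetical helper name, not a NumPy function, and it has not been benchmarked against np.random.choice.

```python
import numpy as np

def sample_unique(N, k):
    """Hypothetical helper: k distinct integers drawn uniformly from [0, N).

    Uses rejection sampling, so memory is O(k) rather than O(N).
    Intended for k << N; it degrades as k approaches N.
    """
    if k > N:
        raise ValueError("cannot draw more unique values than N")
    seen = set()    # values already accepted
    result = []     # accepted values, in order of first appearance
    while len(result) < k:
        # Draw only as many candidates as are still missing.
        batch = np.random.randint(0, N, size=k - len(result), dtype=np.int64)
        for v in batch:
            v = int(v)
            if v not in seen:   # reject duplicates, keep first occurrences
                seen.add(v)
                result.append(v)
    return np.array(result, dtype=np.int64)

values = sample_unique(10**9, 20)
print(values)
```

Because each candidate is an independent uniform draw and repeats are simply discarded, the kept values should be distributed like a sample without replacement. With k << N collisions are rare, so the loop usually finishes in one or two passes. Newer NumPy versions also provide np.random.default_rng().choice(N, k, replace=False); whether that avoids the O(N) work may depend on the version, so it seems worth timing on your own setup.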