jeudi 5 novembre 2020

Generating k values with numpy.random between 0 and N without replacement [duplicate]

I want to generate k unique values with numpy.random, drawn from a uniform distribution between 0 and N (excluding N), where k << N.

At first glance it looks like numpy.random.choice is the right approach: np.random.choice(N,k,replace=False) and that works in theory, but there's a gotcha from the docs:

Parameters

a : 1-D array-like or int If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated as if a were np.arange(a)

so if N is large, it effectively calls np.arange(N) which creates an unnecessary array and can get very slow.

Is there a way to create a "virtual" large array for numpy to use? I can't figure out if it is straightforward to do so based on what "array-like" means.

Alternatively is there another way to do this using numpy.random but without using np.random.choice?

The "obvious" Python duck-typing approach below works correctly but is even slower (perhaps numpy tries to create a copy first?)

class VirtualArray(object):
    def __init__(self, N):
        self.N = N
    def __len__(self):
        return self.N
    def __getitem__(self, k):
        if 0 <= k < self.N:
            return k
        raise IndexError('index out of range: %s' % k)

N = 1000000
np.random.seed(123)    
a = np.random.choice(VirtualArray(N),20,replace=False)
np.random.seed(123)    
b = np.random.choice(N,20,replace=False)
print (a)
print (b)



Aucun commentaire:

Enregistrer un commentaire