vendredi 25 octobre 2019

numpy list of random subsets of specified size

I want to efficiently (time and space) compute a list of random subset of an array, where the ordering in the subsets is random.

For example the array [1,2,3,4,5] with some parameters k=3 and n=5 should produce a matrix

[[4,3,1],
 [1,2,5],
 [2,4,5],
 [3,2,5],
 [2,3,1]]

i.e. we get n=5 lists with k=3 random unique elements from the array.

I would like this to be as fast as possible, without creating a gigantic lookup table of possible combinations, since both time and space are of the essence.

I tried to do this with numpy.random.choice(array,n*k).reshape((n,k)) which almost does, what i want, except for the uniqueness part. I settled on the following

subsets = numpy.zeros(n).reshape((n,1))
subsets = numpy.apply_along_axis(lambda x : numpy.random.choice(array,k,replace=False),1,subsets)

However since this is not purely numpy, this is slow. Too slow for my application. Is there some way to improve on the runtime, and maybe achieve this with pure numpy commands?

Any help would be appreciated. Thanks.




Aucun commentaire:

Enregistrer un commentaire