lundi 24 avril 2023

Pick some combinations of r elements out of all possible combinations of elements from an array of length n

So I have a list of lists, say superlist = [sublist1, sublist2, ...,sublistn] of length n.

I need to randomly pick a large number of combinations of r elements from it, but not necessarily all of them. In my case, n=35, r=8, and the number of combinations needed are 10000.

The problem I am facing while running this on Python 3.9 is that the number of combinations goes to the order of 1e7, and as I am dealing with long lists within each combination, random shuffle is slow and takes up too much memory. So my question is: is there a faster and less memory-intensive way to get a random set of combinations of the format nCr? Ideally, I would want to avoid generating all combinations as well.

What I tried was the following:

import numpy as np
from itertools import combinations as comb
from sklearn.utils import shuffle
largenumber = 10000
all_combs = np.array(list(comb(superlist, r)))
perm = np.arange(all_combs.shape[0])
np.random.shuffle(perm)
if max(perm)<=largenumber:
   required_combs = all_combs[perm]
else:
   required_combs = all_combs[perm][0:largenumber]

The np.random.shuffle and the comb(superlist,r) become extremely slow and consume large amounts of memory. As I was running it on a cluster, the cluster was killing the process due to high memory usage.




Aucun commentaire:

Enregistrer un commentaire