I have a list of lists, say superlist = [sublist1, sublist2, ..., sublistn], of length n.
I need to randomly pick a large number of r-element combinations from it, but not necessarily all of them. In my case, n = 35, r = 8, and I need 10,000 combinations.
The problem I am facing while running this on Python 3.9 is that the total number of combinations is on the order of 1e7, and since each combination contains long lists, generating them all and shuffling is slow and uses too much memory. So my question is: is there a faster and less memory-intensive way to get a random subset of the nCr combinations? Ideally, I would also like to avoid generating all of the combinations in the first place.
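For reference, the full combination count can be checked with math.comb (available since Python 3.8):

import math

# total number of 8-element combinations from 35 sublists
print(math.comb(35, 8))  # 23535820, i.e. on the order of 1e7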
What I tried was the following:
import numpy as np
from itertools import combinations as comb

largenumber = 10000

# materialise every nCr combination, then shuffle the index order
all_combs = np.array(list(comb(superlist, r)))
perm = np.arange(all_combs.shape[0])
np.random.shuffle(perm)

if all_combs.shape[0] <= largenumber:
    # fewer combinations than needed: keep them all (shuffled)
    required_combs = all_combs[perm]
else:
    # otherwise keep the first largenumber shuffled combinations
    required_combs = all_combs[perm][0:largenumber]
Both comb(superlist, r) and np.random.shuffle become extremely slow and consume large amounts of memory; when I run this on a cluster, the process gets killed for exceeding its memory limit.
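To make the question concrete, here is a sketch of the kind of approach I am hoping exists (untested; sample_combinations is just a placeholder name): draw random index tuples directly instead of materialising all combinations.

import random

def sample_combinations(superlist, r, k, seed=None):
    """Return k distinct random r-element combinations of superlist."""
    rng = random.Random(seed)
    n = len(superlist)
    seen = set()
    # keep drawing sorted index tuples until k distinct ones are collected;
    # with nCr (~2.35e7) much larger than k (10,000), collisions are rare
    while len(seen) < k:
        idx = tuple(sorted(rng.sample(range(n), r)))
        seen.add(idx)
    return [[superlist[i] for i in idx] for idx in seen]

# e.g. required_combs = sample_combinations(superlist, 8, 10000)

Is something along these lines (or a numpy-based equivalent) a reasonable way to go, or is there a better standard approach?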