Let's say I have a very large list, and I generate combinations from it using itertools. I want to select combinations at random without materializing the generator (the full set does not fit even in 16 GB of RAM, as I learnt the hard way).
Here is what I have done so far:
```python
import itertools
import random

myList = ['abc', 'bcd', 'cdef', 'adcv', 'zofd', 'qmkdf', 'qmk', 'oswd']
# much larger than above in my case
myCombi_gen = itertools.combinations(myList, r=2)
# Above should generate: [('abc', 'bcd'), ('abc', 'cdef'), ...]

N = 15
counter = 0
for elem in myCombi_gen:
    if random.random() > 0.5:  # selecting based on probability
        print(elem, end=' ')
        counter += 1
        # doing something useful here
        if counter == N:
            break
```
My problem is that the combinations that end up being selected are clustered in one region of the sequence. I want them to be spread out more uniformly (maybe following some distribution?). Any pointers would be helpful.
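A small experiment can make the clustering visible. This is an illustration under stated assumptions, not part of the original code: `my_list` is a hypothetical 1,000-item stand-in for the large list, and the seed is fixed only so the run is repeatable. It runs the same selection loop but records each pick's position in `itertools.combinations` order. Because the loop stops after `N` hits and each combination passes the test with probability 0.5, the picks typically come from roughly the first `2 * N` positions, which here all share the first element `'item0'`.

```python
import itertools
import random

# Hypothetical stand-in for a large list (1,000 distinct strings).
my_list = [f"item{i}" for i in range(1000)]
N = 15
random.seed(0)  # fixed seed only so the run is repeatable

picked = []  # (position in combinations order, combination)
for pos, combo in enumerate(itertools.combinations(my_list, r=2)):
    if random.random() > 0.5:  # same 50% test as in the question
        picked.append((pos, combo))
        if len(picked) == N:
            break

total = len(my_list) * (len(my_list) - 1) // 2  # number of 2-combinations
print(f"last pick at position {picked[-1][0]} out of {total}")
print("first elements seen:", {combo[0] for _, combo in picked})
```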
UPDATE:
Even without the probabilistic selection, I need only a fraction of the combinations; without randomised selection, the loop simply picks up the first N elements. I have updated the code accordingly.
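For reference, once the random test is removed, the loop is equivalent to lazily taking the first `N` items, which `itertools.islice` expresses directly. This only illustrates that equivalence on the sample list from the question; nothing past the first `N` combinations is generated.

```python
import itertools

myList = ['abc', 'bcd', 'cdef', 'adcv', 'zofd', 'qmkdf', 'qmk', 'oswd']
N = 15

# islice stops pulling from the generator after N items,
# so this never materializes the remaining combinations.
first_n = list(itertools.islice(itertools.combinations(myList, r=2), N))
print(first_n[:3])
```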
To be clear, in my specific case I can't do the following:
```python
# Cannot do the following: memory (RAM) usage blows up
all_combi = list(itertools.combinations(myList, r=2))
random.shuffle(all_combi)
selected_combi = random.sample(all_combi, N)
```
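One possible direction, sketched under assumptions rather than offered as a definitive answer: each `r`-combination corresponds to a sorted set of `r` distinct positions in the list, so a uniformly random combination can be drawn by picking `r` distinct indices with `random.sample` and sorting them, without enumerating any other combination. The helper name `sample_combinations` is hypothetical, and the sketch assumes `k` is much smaller than the total number of combinations (computed with `math.comb`, Python 3.8+), so redrawing the occasional repeat stays cheap.

```python
import math
import random


def sample_combinations(items, r, k, rng=random):
    """Hypothetical helper: draw k distinct r-combinations of items,
    uniformly at random, without building the list of all combinations.

    Assumes k is much smaller than math.comb(len(items), r); repeats are
    rejected and redrawn, which gets slow as k approaches that total.
    """
    n = len(items)
    total = math.comb(n, r)
    if k > total:
        raise ValueError(f"only {total} combinations exist, asked for {k}")
    seen = set()
    picked = []
    while len(picked) < k:
        # r distinct positions, every r-subset equally likely; sorting
        # matches the element order used by itertools.combinations.
        idx = tuple(sorted(rng.sample(range(n), r)))
        if idx not in seen:
            seen.add(idx)
            picked.append(tuple(items[i] for i in idx))
    return picked


myList = ['abc', 'bcd', 'cdef', 'adcv', 'zofd', 'qmkdf', 'qmk', 'oswd']
print(math.comb(len(myList), 2))  # 28 combinations in total
print(sample_combinations(myList, r=2, k=15, rng=random.Random(42)))
```

The rejection loop is the simple choice for sparse sampling; when `k` is a large fraction of the total, a different strategy would be needed.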