Monday, December 14, 2020

How to retrieve random elements from a Python 3 generator?

Let's say I have a very large list, and I am generating combinations of its elements using itertools. I want to select combinations at random without materializing the generator (the full list of combinations is too big even for 16 GB of RAM, as I learnt the hard way).

Here is what I have done so far:

import itertools
import random


myList = ['abc', 'bcd', 'cdef', 'adcv', 'zofd', 'qmkdf', 'qmk', 'oswd']
# much larger than above in my case

myCombi_gen = itertools.combinations(myList, r=2)
# Above should generate: [('abc', 'bcd'), ('abc', 'cdef'), .....]

N = 15
counter = 0

for elem in myCombi_gen:
    if random.random() > 0.5:       # selecting based on probability
        print(elem, end=' ')
        counter += 1
        # doing something useful here
        if counter == N:
            break

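My problem is that the combinations that end up being selected are concentrated in one region of the sequence (near the start, since the loop stops as soon as N elements have been picked). I want them to be spread more uniformly over all the combinations (perhaps following some distribution?). Any pointers would be helpful.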
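To make the clustering concrete, here is a minimal sketch that records the position in the combinations stream of each accepted element. The helper name `first_n_accepted_positions`, the stand-in list `range(1000)`, and the fixed seed are assumptions made for illustration only; they are not part of my real code.

```python
import itertools
import random


def first_n_accepted_positions(items, n_wanted, p_accept=0.5, seed=0):
    """Return the 0-based stream positions of the first n_wanted
    2-combinations accepted by a coin-flip filter (hypothetical helper)."""
    rng = random.Random(seed)  # seeded only so the illustration is repeatable
    positions = []
    for pos, _combo in enumerate(itertools.combinations(items, 2)):
        if rng.random() < p_accept:
            positions.append(pos)
            if len(positions) == n_wanted:
                break  # same early exit as in the loop above
    return positions


items = range(1000)                          # stand-in for the real list
total = len(items) * (len(items) - 1) // 2   # number of 2-combinations
positions = first_n_accepted_positions(items, n_wanted=15)
print("last position visited:", positions[-1], "out of", total)
```

With an acceptance probability of 0.5, the loop is expected to stop after roughly 2 * N combinations, so everything beyond the first few dozen positions is never even looked at, however long the stream is.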

UPDATE:

In any case, I only need a fraction of the combinations. Without the randomised selection, the loop simply picks up the first N elements. I have updated the code accordingly.

To be clear, in my specific case I cannot do the following:

# Cannot do the following, memory (RAM) usage blows up

all_combi = list(itertools.combinations(myList, r=2))
random.shuffle(all_combi)
selected_combi = random.sample(all_combi, N)
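Two directions that avoid building the full list can be sketched as follows. These are possibilities under stated assumptions, not a confirmed solution: `reservoir_sample` is a standard reservoir sampling pass (Algorithm R), and `sample_pairs_by_index` assumes r=2 and that the input is an indexable sequence. Both function names are hypothetical.

```python
import itertools
import random


def reservoir_sample(iterable, k, rng=random):
    """Return k items chosen uniformly at random from iterable (Algorithm R).

    Only k items are held in memory, but the whole iterable is consumed once.
    If the iterable has fewer than k items, all of them are returned.
    """
    reservoir = []
    for i, item in enumerate(iterable):
        if i < k:
            reservoir.append(item)      # fill the reservoir first
        else:
            j = rng.randint(0, i)       # inclusive on both ends
            if j < k:
                reservoir[j] = item     # replace with probability k / (i + 1)
    return reservoir


def sample_pairs_by_index(items, k, rng=random):
    """Draw k distinct 2-combinations by picking random index pairs,
    without iterating over the combinations at all."""
    n = len(items)
    if k > n * (n - 1) // 2:
        raise ValueError("k exceeds the number of 2-combinations")
    chosen = set()
    while len(chosen) < k:
        i, j = sorted(rng.sample(range(n), 2))  # i < j, as in combinations()
        chosen.add((i, j))                      # duplicates are redrawn
    return [(items[i], items[j]) for i, j in chosen]


myList = ['abc', 'bcd', 'cdef', 'adcv', 'zofd', 'qmkdf', 'qmk', 'oswd']
N = 15

selected_stream = reservoir_sample(itertools.combinations(myList, r=2), N)
selected_index = sample_pairs_by_index(myList, N)
print(selected_stream)
print(selected_index)
```

The reservoir version works for any generator but still has to walk through every combination once, which may be slow when the list is very large. The index version never touches the generator, so its cost depends on N rather than on the total number of combinations, at the price of being tied to r=2 as written.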



