I have multiple iterables and I need to create the Cartesian product of those iterables and then randomly sample from the resulting pool of tuples. The problem is that the total number of combinations of these iterables is somewhere around 1e19, so I can't possibly load all of this into memory.
My idea was to use itertools.product in combination with a random number generator to skip a random number of items; once I reach the randomly selected item, I perform my calculations, and I continue like this until the generator is exhausted. The plan was to do something like:
from itertools import product
from random import randint

iterables = ()  # tuple of 18 iterables
versions = product(*iterables)  # unpack so each iterable is a separate argument

def do_stuff(*values):
    pass  # do stuff

STEP_SIZE = int(1e6)

# Start both counts from 0.
# The first value to be taken is at start + step;
# after that, set start equal to count and repeat.
start = 0
count = 0
while True:
    try:
        step = randint(1, 100) * STEP_SIZE
        for v in versions:
            # If the count is less than required, skip values while incrementing count.
            if count < start + step:
                versions.next()  # this is the call that fails
                count += 1
            else:
                do_stuff(*v)
                start = count
    except StopIteration:
        break
Unfortunately, itertools.product objects don't have a next() method, so this doesn't work. What other way is there to go through this large number of tuples and either take a random sample or directly run calculations on the values?
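For reference, here is a minimal sketch of one way to sample without enumerating the product at all. It assumes each iterable is finite and small enough to be turned into a tuple, and that Python 3.8+ is available (for math.prod). A random index into the product is decoded, mixed-radix style, into one position per iterable, which gives the tuple that itertools.product would yield at that index. The names tuple_at and sample_product are hypothetical helpers made up for this illustration, not part of any library.

```python
import random
from math import prod


def tuple_at(pools, index):
    """Return the tuple product(*pools) would yield at position `index`.

    `pools` must be a list of sequences (indexable, with a length).
    """
    result = []
    # product() varies the last pool fastest, so decode from the right.
    for pool in reversed(pools):
        index, pos = divmod(index, len(pool))
        result.append(pool[pos])
    return tuple(reversed(result))


def sample_product(pools, k, rng=random):
    """Yield k distinct tuples chosen uniformly at random from the product."""
    pools = [tuple(p) for p in pools]  # assumption: each pool fits in memory
    total = prod(len(p) for p in pools)
    if k > total:
        raise ValueError("sample larger than the number of combinations")
    seen = set()  # indices already used; only k integers are ever stored
    while len(seen) < k:
        index = rng.randrange(total)  # works for arbitrarily large totals
        if index not in seen:
            seen.add(index)
            yield tuple_at(pools, index)
```

With the names from the question, usage would look like `for v in sample_product(iterables, 1000): do_stuff(*v)`. Rejection on already-seen indices is used here instead of random.sample(range(total), k) because a range of about 1e19 elements can exceed the maximum length Python supports for a sequence; as long as k is tiny compared with the total, collisions are rare.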