Tuesday, June 2, 2015

Python itertools: create an iterator over a random subset

I have an iterator itertools.combinations(big_matrix, 50) with big_matrix.shape = (65, x), so there are C(65, 50), roughly 2 × 10^14, combinations. I want a random subset of, say, 10,000 of these combinations, also as an iterator, to save memory.
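
As a quick check of that count, here is a minimal sketch assuming Python 3.8+ (for math.comb). It uses only the row count (65) and the combination size (50) from the question. Choosing 50 rows to keep is the same as choosing the 15 rows to leave out.

```python
from math import comb

n_rows, r = 65, 50               # 65 rows in big_matrix, 50 rows per combination
total = comb(n_rows, r)          # number of distinct 50-row combinations

# Choosing 50 rows to keep is equivalent to choosing 15 rows to drop.
assert total == comb(n_rows, n_rows - r)
print(f"{total:.2e}")
```

Materializing that many tuples is clearly out of the question, which is why the sample has to be drawn without enumerating them.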

I tried the random_combination recipe from the itertools documentation:

import random

def random_combination(iterable, r):
    "Random selection from itertools.combinations(iterable, r)"
    pool = tuple(iterable)
    n = len(pool)
    indices = sorted(random.sample(range(n), r))
    return tuple(pool[i] for i in indices)

but when iterable is the combinations object, tuple(iterable) builds a tuple of all 10^14 combinations, and the function returns a single tuple rather than an iterator.
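
One possible workaround, sketched here under the assumption that sampling with replacement is acceptable: apply the recipe's idea to the pool itself (the 65 rows, or their indices) rather than to the combinations object, and wrap it in a generator. Then tuple() only ever copies the 65 items, and each combination is produced on demand. The name random_combinations and its k parameter are hypothetical, not part of itertools.

```python
import random
from typing import Iterable, Iterator, Tuple

def random_combinations(pool: Iterable, r: int, k: int) -> Iterator[Tuple]:
    """Lazily yield k random r-combinations of pool (repeats are possible)."""
    pool = tuple(pool)   # copies only the n items (e.g. 65 rows), not the combinations
    n = len(pool)
    for _ in range(k):
        # Same idea as the random_combination recipe: sorting the sampled
        # indices keeps the items in the order itertools.combinations uses.
        indices = sorted(random.sample(range(n), r))
        yield tuple(pool[i] for i in indices)
```

For example, random_combinations(range(65), 50, 10000) yields tuples of row indices; with a NumPy array, the rows for one such tuple combo could be selected with big_matrix[list(combo)]. Because each draw is independent, the same combination can appear twice, but with about 2 × 10^14 possibilities and 10,000 draws the chance of any repeat is well under one in a million.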

random.sample does not work either, because it needs a population with a known length, and an itertools.combinations object has no len().
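
If the 10,000 combinations must be distinct, one alternative sketch (assuming Python 3.8+ for math.comb) is to sample ranks instead of combinations. A range object does support len() and indexing without storing its elements, so random.sample(range(total), k) can draw k distinct ranks, and each rank can be "unranked" into the combination at that position. The nth_combination helper below is adapted from the nth_combination recipe in the itertools documentation; random_combinations is a hypothetical name.

```python
import random
from math import comb
from typing import Iterable, Iterator, Tuple

def nth_combination(pool: Iterable, r: int, index: int) -> Tuple:
    """Return the combination at position index in combinations(pool, r)."""
    pool = tuple(pool)
    n = len(pool)
    c = comb(n, r)
    if not 0 <= index < c:
        raise IndexError(index)
    result = []
    while r:
        # c becomes the number of combinations that begin with the current
        # candidate item; one slot of the combination is now being filled.
        c, n, r = c * r // n, n - 1, r - 1
        while index >= c:
            # index lies past all combinations using this candidate:
            # skip them and move on to the next candidate item.
            index -= c
            c, n = c * (n - r) // n, n - 1
        result.append(pool[-1 - n])   # the candidate we stopped on
    return tuple(result)

def random_combinations(pool: Iterable, r: int, k: int) -> Iterator[Tuple]:
    """Lazily yield k distinct random r-combinations of pool."""
    pool = tuple(pool)
    total = comb(len(pool), r)
    # Only the k sampled ranks are stored, never the combinations themselves.
    for index in random.sample(range(total), k):
        yield nth_combination(pool, r, index)
```

Each combination is built in O(n) steps when it is requested, and memory stays proportional to k rather than to the number of possible combinations.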

Is there any way to do this?
