I have a long Python generator that I want to "thin out" by randomly selecting a subset of values. Unfortunately, random.sample() will not work with arbitrary iterables. Apparently, it needs something that supports the len() operation (and perhaps non-sequential access to the sequence, but that's not clear). And I don't want to build an enormous list just so I can thin it out.
As a matter of fact, it is possible to sample uniformly from a sequence without knowing its length: there's a nice algorithm in Programming Perl that does just that. But does anyone know of a standard Python module that provides this functionality?
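For reference, the usual name for this kind of length-free technique is reservoir sampling. Whether it matches the Programming Perl algorithm exactly is my assumption. Below is a minimal sketch of the classic variant (often called Algorithm R); the name reservoir_sample and its rng parameter are made up for illustration and are not part of any standard module.

```python
import random


def reservoir_sample(iterable, k, rng=random):
    """Return up to k items chosen uniformly at random from iterable.

    Makes a single pass and keeps only k items in memory, so it works
    on generators whose length is unknown. (Hypothetical helper, not a
    standard-library function.)
    """
    reservoir = []
    for i, item in enumerate(iterable):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item number i+1 replaces a random slot with probability k/(i+1).
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir


# Example: pick 2 values from an iterator without building a list first.
picked = reservoir_sample(iter("abcd"), 2)
```

Note that the result is not in the original order, and if the iterable yields fewer than k items, all of them are returned.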
Demo of the problem (Python 3):
>>> import itertools, random
>>> random.sample(iter("abcd"), 2)
...
TypeError: Population must be a sequence or set. For dicts, use list(d).
On Python 2, the error is more transparent:
Traceback (most recent call last):
  File "<pyshell#12>", line 1, in <module>
    random.sample(iter("abcd"), 2)
  File "/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/random.py", line 321, in sample
    n = len(population)
TypeError: object of type 'iterator' has no len()
If there's no alternative to random.sample(), I'd try my luck with wrapping the generator into an object that provides a __len__ method (I can find out the length in advance). So I'll accept an answer that shows how to do that cleanly.
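Since the length can be known in advance, one option that avoids random.sample() altogether is selection sampling: walk the generator once and keep each item with probability (items still needed) / (items still remaining). This is a different technique from the __len__ wrapper, shown here only as a rough sketch; the name sample_known_length and its parameters are hypothetical.

```python
import random


def sample_known_length(iterable, n, k, rng=random):
    """Lazily yield k items chosen uniformly from an iterable of length n.

    Assumes the caller knows n in advance. Items come out in their
    original order. (Hypothetical helper, not a standard-library function.)
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    remaining = n  # items not yet examined
    needed = k     # items still to be selected
    for item in iterable:
        if needed == 0:
            break
        # Keep this item with probability needed / remaining.
        if rng.random() * remaining < needed:
            yield item
            needed -= 1
        remaining -= 1


# Example: thin a generator of known length down to 3 values.
gen = (x * x for x in range(10))
thinned = list(sample_known_length(gen, 10, 3))
```

Because it is itself a generator, the thinned stream can be consumed lazily, without holding either the input or the output in a list.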