Tuesday, May 5, 2015

Python random sample generator (comfortable with huge population size)

As you know, random.sample(population, sample_size) quickly returns a random sample, but what if you don't know the sample size in advance? You end up sampling the entire population, or shuffling it, which amounts to the same thing. But this can be wasteful (if most samples turn out to be small compared with the population) or even infeasible (if the population is so huge that you run out of memory). Also, what if your code needs to do other work between picking one element of the sample and the next?

In all these cases, what you really need is a random sample generator that, on request, yields a random element that has not been picked yet, until every element of the population has been picked.
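To make the idea concrete, here is a minimal sketch of one possible approach. It is an illustration only, not necessarily the solution referred to in this post. It assumes the population supports len() and indexing (a list, a tuple, a range), and the name lazy_sample and its rng parameter are made up for this example. The technique is a lazy Fisher-Yates shuffle: instead of copying and shuffling the whole population, it records in a dict only the positions a real shuffle would have changed so far, so memory grows with the number of elements drawn rather than with the population size.

```python
import random


def lazy_sample(population, rng=random):
    """Yield the elements of `population` in random order, one at a time.

    Hypothetical helper: `population` must support len() and indexing.
    """
    n = len(population)
    # Sparse record of the swaps a full Fisher-Yates shuffle would have made:
    # swapped[p] is the index currently "sitting" at position p.
    swapped = {}
    for i in range(n):
        # Choose uniformly among the positions not yet consumed.
        j = rng.randrange(i, n)
        pick = swapped.get(j, j)
        # Virtual swap: whatever sits at position i moves to position j.
        swapped[j] = swapped.get(i, i)
        # Position i is never visited again, so its entry can be dropped.
        swapped.pop(i, None)
        yield population[pick]
```

Because it is a generator, the caller decides when to stop and can do anything else between draws, for example gen = lazy_sample(range(10**12)) followed by next(gen) whenever another element is needed. Each draw costs a constant number of dict operations, and the dict never holds more than one entry per element drawn.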

If you have a solution please let everybody know. I'll post my solution below, hoping it can be useful to somebody.



