jeudi 10 décembre 2020

Yield a random permutation from a list of generators of known length

I have a sequence of generators which yield objects that require a reasonable amount of memory (they are ipaddress.IPv4Network instances and yielding from them yields a whole ipaddress.IPv4Address instance).

gens = [a, b, c, ...]

Each generator has a deterministic number of elements it will yield, e.g.:

gen_lens = [17000000, 1024, 8192, ...]

I would like to take batches of yielded values, of n length, in random order. Each item from any of the generators must only be selected once.

My current idea is to get the total number of possible elements that can be yielded (equal to the maximum array index - 1), then iterate through this list in random order using something like the Fisher-Yates-Knuth algorithm, yielding the item of the given random index:

random_indexes = random.shuffle(range(0, sum(gen_lens)))
for i in random_indexes:
    # some windowing logic here to check which generator we should get from and set index appropriately, x = generator index, y = i - sum(gen_lens[0:x])
    yield gens[x][y]

So the end result is, I have a new generator which will yield a random permutation of all elements from my input generators, without having to store all the results of what my sub-generators are yielding.

It still requires to build a list of indexes, which is quite expensive when you have millions of indexes. Is there a way around that? Can anyone suggest a better approach?




Aucun commentaire:

Enregistrer un commentaire