Thursday, 20 July 2017

Avoid generating duplicate values from random

I want to generate random numbers and store them in a list, as follows:

import random
alist = [random.randint(0, 2 ** mypower - 1) for _ in range(total)]

My concern is this: I want to generate total = 40 million values between 0 and 2 ** mypower - 1 (inclusive). With mypower = 64, alist would take roughly 20 GB (40M*68*8), which is far too much for my laptop's memory. My idea is to generate the values in chunks, say 5 million at a time, and save each chunk to a file, so that I never have to hold all 40M values at once. But if I do that in a loop, is it guaranteed that random.randint(0, 2 ** mypower - 1) will not generate values that were already generated in a previous iteration? Something like this:

        for i in range(num_of_chunks):
            alist = [random.randint(0, 2 ** mypower - 1) for _ in range(chunk)]
            # save to file
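A minimal sketch of the chunked plan, assuming mypower <= 64 so that every value fits in an unsigned 64-bit slot. It stores each chunk in an `array('Q')` (raw unsigned integers of at least 8 bytes each, rather than full Python int objects) and appends it to a binary file with `tofile`. The helper names `write_random_chunks` and `read_values` are hypothetical. Note that the chunks are drawn independently, so nothing here prevents a value from repeating within or across chunks.

```python
import random
from array import array


def write_random_chunks(path, total, chunk, mypower=64, seed=None):
    """Write `total` random values in [0, 2**mypower - 1] to `path` as raw 'Q' items.

    Hypothetical helper: only one chunk of `chunk` values is held in memory
    at a time. Draws are independent, so duplicates are still possible.
    """
    if mypower > 8 * array('Q').itemsize:
        raise ValueError("mypower too large for the 'Q' typecode")
    rng = random.Random(seed)
    written = 0
    with open(path, "wb") as f:
        while written < total:
            n = min(chunk, total - written)
            # getrandbits(mypower) is uniform over [0, 2**mypower - 1],
            # the same range as randint(0, 2 ** mypower - 1).
            values = array('Q', (rng.getrandbits(mypower) for _ in range(n)))
            values.tofile(f)  # append this chunk in binary form
            written += n
    return written


def read_values(path):
    """Read the whole file back into an array('Q')."""
    values = array('Q')
    with open(path, "rb") as f:
        values.frombytes(f.read())
    return values
```

With the numbers from the question this would be `write_random_chunks("values.bin", 40_000_000, 5_000_000)`; on platforms where `'Q'` is 8 bytes the file is 40M × 8 bytes = 320 MB.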

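Each `randint` call is independent of earlier calls, so splitting the work into chunks gives no guarantee against repeats, and it does not change the odds either: the same 40M values are drawn the same way whether or not they are split. The standard birthday-problem approximation estimates the chance of at least one repeat among n independent draws from N = 2 ** mypower equally likely values as 1 - exp(-n(n-1)/(2N)). A small sketch (the function name is hypothetical):

```python
import math


def collision_probability(n, mypower):
    """Birthday-bound approximation of P(at least one duplicate)
    among n independent uniform draws from 2**mypower possible values."""
    space = 2 ** mypower
    x = n * (n - 1) / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny
    return -math.expm1(-x)


print(f"{collision_probability(40_000_000, 64):.2e}")
```

For n = 40 million and mypower = 64 this comes out to roughly 4.3 × 10^-5: small, but not zero. For smaller mypower the odds grow quickly; at mypower = 32 a duplicate is practically certain.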

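If distinct values are a hard requirement, one alternative (a different technique from the `randint` approach in the question) is to pass a counter through a bijection on 64-bit integers: distinct inputs then give distinct outputs across any number of chunks, without remembering what was already produced. The sketch below assumes mypower = 64 and uses the SplitMix64 finalizer, whose steps (xor with a right shift, multiply by an odd constant modulo 2 ** 64) are each invertible. The outputs look well scrambled but are not cryptographically random. The names `mix64`, `unique_chunk` and the `offset` parameter are hypothetical.

```python
MASK64 = (1 << 64) - 1


def mix64(z):
    """SplitMix64 finalizer: a bijection on 64-bit unsigned integers."""
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


def unique_chunk(start, count, offset):
    """Scrambled values for counters start .. start + count - 1.

    Adding a fixed offset mod 2**64 and then applying mix64 are both
    bijections, so chunks built from non-overlapping counter ranges
    (with the same offset) never repeat each other.
    """
    return [mix64((i + offset) & MASK64) for i in range(start, start + count)]
```

Chunk k of 5 million values would then be `unique_chunk(k * 5_000_000, 5_000_000, offset)`, with one fixed `offset` (for instance drawn once with `random.getrandbits(64)`) shared by all chunks.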

