Sunday, December 27, 2015

Quickly creating a big NumPy array of random booleans

I need to create a big random boolean NumPy array without the machine going into swap. I am using a laptop with 8 GB of RAM.

Creating a 1200 x 2e6 array with np.ones takes less than 2 s and uses about 2289 MiB of RAM:

>>> import numpy as np
>>> dd = np.ones((1200, int(2e6)), dtype=bool)
>>> dd.nbytes/1024./1024
2288.818359375

>>> dd.shape
(1200, 2000000)
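NumPy stores each bool element in one byte, so the footprint above can be predicted from the shape before allocating anything. A minimal sketch; `expected_nbytes` is a hypothetical helper, not a NumPy function:

```python
import numpy as np

def expected_nbytes(shape, dtype=bool):
    """Bytes an array of this shape and dtype would occupy (hypothetical helper)."""
    n_elements = 1
    for dim in shape:
        n_elements *= int(dim)
    return n_elements * np.dtype(dtype).itemsize

nbytes = expected_nbytes((1200, 2e6))
print(nbytes)                  # 2400000000
print(nbytes / 1024. / 1024.)  # 2288.818359375, same as dd.nbytes above
```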

For a smaller 1200 x 400e3 array, np.random.randint is fast: it takes about 5 s and db occupies 458 MiB:

import numpy as np

db = np.array(np.random.randint(2, size=(int(400e3), 1200)), dtype=bool)
print('%.1f MiB' % (db.nbytes / 1024. / 1024.))

But with an array only twice as big (1200 x 800e3), it goes into swap and takes 2.7 min ;(

import timeit

cmd = """
import numpy as np
db = np.array(np.random.randint(2, size=(int(800e3), 1200)), dtype=bool)
print('%.1f MiB' % (db.nbytes / 1024. / 1024.))"""

print(timeit.Timer(cmd).timeit(number=1))
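A likely reason for the swapping: np.random.randint returns an integer array (its default dtype is the platform's C long, typically 8 bytes per element on 64-bit Linux and macOS), and np.array(..., dtype=bool) then copies it into a second, 1-byte-per-element array. The sketch below estimates that peak from a tiny sample call; `randint_peak_bytes` is a hypothetical helper and the estimate ignores interpreter overhead.

```python
import numpy as np

# Item size of randint's default output dtype on this platform (usually 8).
INT_ITEMSIZE = np.random.randint(2, size=1).itemsize

def randint_peak_bytes(rows, cols):
    """Estimated peak of np.array(np.random.randint(2, size=(rows, cols)), dtype=bool):
    the integer temporary plus the boolean copy (hypothetical helper)."""
    n = int(rows) * int(cols)
    return n * INT_ITEMSIZE + n * np.dtype(bool).itemsize

for rows in (400e3, 800e3, 2e6):
    gib = randint_peak_bytes(rows, 1200) / 1024. ** 3
    print('%d x 1200: about %.1f GiB at peak' % (int(rows), gib))
```

With an 8-byte default this gives roughly 4.0, 8.0 and 20.1 GiB, which would fit the observed pattern on an 8 GB laptop: fine, swapping, then MemoryError.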

Using getrandbits takes even longer, 8 min, and it also goes into swap:

from random import getrandbits
import numpy as np
db = np.array([not getrandbits(1) for _ in range(int(1200*800e3))],
              dtype=bool).reshape(int(800e3), 1200)
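Much of the cost here is the intermediate Python list: one object reference (8 bytes on a 64-bit build) per element, plus one Python call per element. A sketch of a variant that still uses getrandbits but draws all the bits in a single call, then expands them with np.unpackbits. It assumes Python 3 (for int.to_bytes), and `random_bool_array_getrandbits` is a hypothetical name.

```python
import random
import numpy as np

def random_bool_array_getrandbits(rows, cols):
    """Random bool array built from one getrandbits() call (hypothetical helper).
    Requires Python 3 for int.to_bytes."""
    n = int(rows) * int(cols)
    n_bytes = (n + 7) // 8                   # round up to whole bytes
    bits = random.getrandbits(n_bytes * 8)   # one big Python int holding all the bits
    raw = np.frombuffer(bits.to_bytes(n_bytes, 'little'), dtype=np.uint8)
    # unpackbits turns each byte into 8 uint8 values of 0/1; keep the first n.
    flags = np.unpackbits(raw)[:n]
    # uint8 0/1 and bool share a 1-byte layout, so view() reinterprets without copying.
    return flags.view(bool).reshape(int(rows), int(cols))
```

Peak memory is then roughly one byte per element plus two n/8-byte buffers, instead of a list of pointers.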

With np.random.randint, a 1200 x 2e6 array fails with a MemoryError.
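One workaround that keeps np.random.randint is to preallocate the boolean result and fill it a block of rows at a time, so the integer temporary only ever covers one block. A minimal sketch; the helper name and the default block size of 10000 rows are arbitrary assumptions to tune.

```python
import numpy as np

def random_bool_array_chunked(rows, cols, block_rows=10000):
    """Fill a preallocated bool array block by block (hypothetical helper).
    Peak memory is the final array plus one integer block of block_rows x cols."""
    out = np.empty((int(rows), int(cols)), dtype=bool)
    for start in range(0, out.shape[0], block_rows):
        stop = min(start + block_rows, out.shape[0])
        # randint still returns an integer array, but only for this block;
        # assigning into the bool slice casts 0/1 to False/True.
        out[start:stop] = np.random.randint(2, size=(stop - start, out.shape[1]))
    return out
```

For (2e6, 1200) the peak should be roughly the 2289 MiB result plus about 92 MiB for one block, assuming 8-byte integers.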

So how can I create a random boolean 1200 x 2e6 array quickly?
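One candidate answer, assuming a NumPy version with the Generator API (1.17 or later, so newer than this post): ask for 1-byte integers directly and reinterpret them as booleans, so no 8-byte temporary is created and the peak is just the final array. This is an unbenchmarked sketch; `random_bool_array` is a hypothetical name.

```python
import numpy as np

def random_bool_array(shape, seed=None):
    """Uniform random booleans, one byte per element, no wider temporary."""
    rng = np.random.default_rng(seed)
    # integers() can emit uint8 directly; values are 0 or 1 (high is exclusive).
    bits = rng.integers(0, 2, size=shape, dtype=np.uint8)
    # uint8 and bool have the same 1-byte layout, so view() reuses the buffer.
    return bits.view(bool)

sample = random_bool_array((4, 6), seed=0)
print(sample.dtype, sample.shape)
```

For the case in the question one would call random_bool_array((1200, int(2e6))). With the older global-state API, np.random.randint also accepts dtype=np.uint8 from NumPy 1.11 on.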
