I need to create a large random boolean NumPy array without the machine going into swap. I'm using a laptop with 8 GB of RAM.
Creating a 1200 x 2e6 array with np.ones takes less than 2 s and uses about 2289 MiB (2.24 GiB) of RAM:
>>> import numpy as np
>>> dd = np.ones((1200, int(2e6)), dtype=bool)
>>> dd.nbytes/1024./1024
2288.818359375
>>> dd.shape
(1200, 2000000)
For a smaller 400e3 x 1200 array, np.random.randint is fast: it takes about 5 s, and db uses 458 MiB:
import numpy as np
db = np.array(np.random.randint(2, size=(int(400e3), 1200)), dtype=bool)
print(db.nbytes / 1024. / 1024., 'MiB')
But with an array only twice as big (800e3 x 1200), the machine goes into swap and it takes 2.7 min ;(
import timeit
cmd = """
import numpy as np
db = np.array(np.random.randint(2, size=(int(800e3), 1200)), dtype=bool)
print(db.nbytes / 1024. / 1024., 'MiB')"""
print(timeit.Timer(cmd).timeit(1))
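A likely explanation (an interpretation, not something the post confirms) is that np.random.randint without a dtype returns the default integer type (np.int_, 8 bytes on most 64-bit Linux/macOS builds), so the 0/1 values first live in an array 8 times larger than the final bool array, and np.array(..., dtype=bool) then makes a separate 1-byte-per-element copy while that temporary still exists. The helper name below is hypothetical; it is only a rough estimate of that peak:

```python
import numpy as np

def randint_cast_peak_bytes(rows, cols):
    """Rough peak memory of np.array(np.random.randint(2, size=(rows, cols)), dtype=bool).

    Assumption: randint returns the default integer dtype (np.int_), and the
    intermediate integer array and the final bool copy exist at the same time.
    """
    n = rows * cols
    int_bytes = np.dtype(np.int_).itemsize   # 8 on most 64-bit Linux/macOS builds
    temporary = n * int_bytes                # randint's integer output
    result = n * np.dtype(bool).itemsize     # bool copy, 1 byte per element
    return temporary + result

# The three sizes from the question, compared against an 8 GB laptop.
for rows in (int(400e3), int(800e3), int(2e6)):
    gib = randint_cast_peak_bytes(rows, 1200) / 1024**3
    print(f"{rows} x 1200: ~{gib:.1f} GiB at peak")
```

On a build where np.int_ is 8 bytes this works out to roughly 4.0, 8.0 and 20.1 GiB, which would be consistent with the first size fitting, the second spilling into swap, and the third raising MemoryError.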
Using getrandbits takes even longer (8 min) and also goes into swap:
from random import getrandbits
import numpy as np
db = np.array([not getrandbits(1) for _ in range(int(1200 * 800e3))], dtype=bool)
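The list comprehension makes one interpreter-level call per element and stores one object reference per element (8 bytes on 64-bit CPython), so the list alone for 960e6 items is on the order of 7.7 GB before NumPy copies it. A minimal sketch that keeps the getrandbits idea but draws all the bits in a single call, then lets np.unpackbits spread them out one per byte; the helper name is hypothetical, and it uses the Mersenne Twister from the random module (not suitable for cryptographic use):

```python
import random
import numpy as np

def random_bools_from_bits(n, seed=None):
    """Return n fair random booleans built from one getrandbits call (hypothetical helper)."""
    if n <= 0:
        return np.zeros(0, dtype=bool)
    rng = random.Random(seed)
    n_bytes = (n + 7) // 8                        # round up to whole bytes
    big = rng.getrandbits(8 * n_bytes)            # one int holding 8 * n_bytes random bits
    raw = np.frombuffer(big.to_bytes(n_bytes, "little"), dtype=np.uint8)
    bits = np.unpackbits(raw)[:n]                 # uint8 array of 0s and 1s, one per element
    return bits.view(bool)                        # reinterpret the 0/1 bytes as False/True

# Small demo; random_bools_from_bits(1200 * int(800e3)).reshape(-1, 1200) is the full case.
db = random_bools_from_bits(1200 * 1000, seed=0).reshape(1000, 1200)
print(db.shape, db.dtype, db.mean())
```

Peak memory should be a little over one byte per element: the big integer and its byte string each take n/8 bytes, and the unpacked bits are themselves the returned array.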
With np.random.randint, a 1200 x 2e6 array fails with a MemoryError.
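Since the MemoryError appears to come from randint's integer output rather than from the boolean result itself, one option is to ask the generator for bool directly. A sketch assuming NumPy 1.17 or later for the Generator API (the legacy np.random.randint(2, size=..., dtype=bool) form also accepts dtype since NumPy 1.11); the function name is hypothetical. The full 1200 x 2e6 array should then need about 2.24 GiB, the same as np.ones:

```python
import numpy as np

rows, cols = 1200, int(2e6)   # full size from the question

def random_bool_array(shape, seed=None):
    """Fair random booleans generated directly as bool, 1 byte per element."""
    rng = np.random.default_rng(seed)                  # NumPy >= 1.17 Generator API
    return rng.integers(0, 2, size=shape, dtype=bool)  # high is exclusive: values 0 or 1

# Small demo; random_bool_array((rows, cols)) builds the real array.
demo = random_bool_array((4, 6), seed=42)
print(demo.dtype, demo.shape, demo.nbytes)
```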
So what is a fast way to create a random boolean 1200 x 2e6 array?
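When passing a dtype is not available (older NumPy) or not wanted, another approach is to preallocate the bool result and fill it block by block, so that only one block of randint's default-integer output exists at a time. A minimal sketch; the function name and the chunk_rows default are illustrative assumptions, not tuned values. With chunk_rows=50 and 2e6 columns the temporary is about 50 * 2e6 * 8 bytes, roughly 0.75 GiB, on top of the 2.24 GiB result:

```python
import numpy as np

def random_bool_chunked(rows, cols, chunk_rows=50):
    """Fill a preallocated bool array in row blocks (hypothetical helper).

    Only one block of integer output is alive at a time, so the temporary costs
    about chunk_rows * cols * itemsize bytes instead of rows * cols * itemsize.
    """
    out = np.empty((rows, cols), dtype=bool)   # final array, 1 byte per element
    for start in range(0, rows, chunk_rows):
        stop = min(start + chunk_rows, rows)
        # Integer 0/1 values are cast to False/True on assignment into the bool slice.
        out[start:stop] = np.random.randint(2, size=(stop - start, cols))
    return out

# Small demo; random_bool_chunked(1200, int(2e6)) builds the full array.
db = random_bool_chunked(1200, 5000)
print(db.dtype, db.shape, db.mean())
```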