Sunday, January 24, 2016

NumPy's random vs. Python's default random for subsampling

I observed that Python's standard-library random.sample is much faster than NumPy's np.random.choice. When taking a small sample (5 elements) from a list of 1 million integers, random.sample was more than 1000x faster than np.random.choice called on the same list.

In [1]: import numpy as np

In [2]: import random

In [3]: arr = [x for x in range(1000000)]

In [4]: nparr = np.array(arr)

In [5]: %timeit random.sample(arr, 5)
The slowest run took 5.25 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 4.54 µs per loop

In [6]: %timeit np.random.choice(arr, 5)
10 loops, best of 3: 47.7 ms per loop

In [7]: %timeit np.random.choice(nparr, 5)
The slowest run took 6.79 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 7.79 µs per loop
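To reproduce the session above outside IPython, here is a minimal sketch using the standard timeit module. It mirrors %timeit's "best of 3" by taking the minimum of three repeats. The loop counts and the `bench` helper are my own choices, and absolute timings will vary with the machine and the NumPy version.

```python
import random
import timeit

import numpy as np

# Same setup as the IPython session: a Python list and a NumPy copy of it.
arr = list(range(1_000_000))
nparr = np.array(arr)


def bench(stmt, number):
    """Return the best per-call time (seconds) over 3 repeats of `number` calls."""
    times = timeit.repeat(stmt, repeat=3, number=number)
    return min(times) / number


results = {
    "random.sample(list)": bench(lambda: random.sample(arr, 5), 10_000),
    # Few loops here: each call on the list takes tens of milliseconds.
    "np.random.choice(list)": bench(lambda: np.random.choice(arr, 5), 5),
    "np.random.choice(ndarray)": bench(lambda: np.random.choice(nparr, 5), 10_000),
}

for name, seconds in results.items():
    print(f"{name:28s} {seconds * 1e6:12.2f} µs per call")
```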

Sampling from the NumPy array (In [7]) was reasonably fast, but it was still slower than the standard library's random.sample (In [5]).
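The list case is probably slow because np.random.choice accepts a 1-D array-like and has to convert it to an ndarray before sampling. For a million-element list, that conversion most likely dominates the 47.7 ms, and the ndarray call in In [7] skips it. One workaround is sketched below with a hypothetical helper, `sample_via_indices`. It passes only the population size as an int, which NumPy treats as sampling from np.arange(n), and then indexes the list with the result.

```python
import numpy as np

# Same population as in the session: a plain Python list of 1 million ints.
arr = list(range(1_000_000))


def sample_via_indices(seq, k):
    """Hypothetical helper: draw k elements from a Python sequence, with replacement.

    np.random.choice(n, k) samples as if from np.arange(n), so only the
    length of `seq` is handed to NumPy and the list is never converted
    into an array.
    """
    idx = np.random.choice(len(seq), k)  # replace=True is the default
    return [seq[i] for i in idx]  # NumPy integers are valid list indices


picked = sample_via_indices(arr, 5)
print(picked)
```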

Is the observation above correct, or am I missing the difference between what random.sample and np.random.choice compute?
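There is one difference that matters here. random.sample draws without replacement, while np.random.choice draws with replacement unless `replace=False` is passed. The small sketch below makes this visible with an arbitrary three-element population.

```python
import random

import numpy as np

population = [10, 20, 30]

# random.sample draws WITHOUT replacement, so k may not exceed the population size.
try:
    random.sample(population, 5)
    sample_error = None
except ValueError as exc:
    sample_error = exc

# np.random.choice draws WITH replacement by default, so k may exceed the
# population size; 5 draws from 3 values necessarily contain repeats.
with_replacement = np.random.choice(population, 5)

# replace=False gives random.sample-like semantics, including the same refusal.
try:
    np.random.choice(population, 5, replace=False)
    choice_error = None
except ValueError as exc:
    choice_error = exc

# With k <= n and replace=False, every drawn element is distinct.
distinct = np.random.choice(population, 3, replace=False)

print(sample_error)
print(with_replacement.tolist())
print(choice_error)
print(sorted(distinct.tolist()))  # [10, 20, 30]
```

So random.sample(arr, 5) corresponds most directly to np.random.choice(nparr, 5, replace=False), not to the default call timed in the session.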
