Thursday, December 1, 2016

Why is random.sample faster than numpy's random.choice?

I need a way to sample without replacement from an array a. I tried two approaches (see the MCVE below): random.sample() and np.random.choice().

I assumed the numpy function would be faster, but it turns out it is not. In my tests random.sample is ~15% faster than np.random.choice.

Is this correct, or am I doing something wrong in my example below? If this is correct, why?
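A likely explanation is that the two functions use different algorithms, and the cost of those algorithms depends on different things. As far as I know, the legacy np.random.choice(n, k, replace=False) (with no p argument) permutes all n indices and then slices off the first k, so every call does O(n) work. CPython's random.sample, when the population is large compared with k, instead draws random indices one at a time and rejects repeats using a set, which is roughly O(k). With n = 10000 and k = 100, that difference can outweigh NumPy's speed advantage. The pure-Python sketch below only illustrates these two strategies. The function names sample_by_permutation and sample_by_rejection are hypothetical, and neither function is the actual library source, which is worth checking for your versions.

```python
import random


def sample_by_permutation(n, k, rng=random):
    """Hypothetical helper: shuffle all n indices, then keep the first k.

    The shuffle touches every index, so the cost is O(n) per call
    no matter how small k is.
    """
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    pool = list(range(n))
    rng.shuffle(pool)
    return pool[:k]


def sample_by_rejection(n, k, rng=random):
    """Hypothetical helper: draw indices one at a time, rejecting repeats.

    A set records the indices chosen so far. When k is much smaller than n,
    repeats are rare, so the cost is roughly O(k) per call.
    """
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    selected = set()
    result = []
    while len(result) < k:
        j = rng.randrange(n)
        if j not in selected:  # reject indices we already picked
            selected.add(j)
            result.append(j)
    return result
```

Both helpers return k distinct indices in range(n). Only the amount of work per call differs, and that difference grows as n increases while k stays fixed.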

import numpy as np
import random
import time
from contextlib import contextmanager

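@contextmanager
def timeblock(label):
    # Time the body of a `with` block. time.clock() was removed in
    # Python 3.8, so use the high-resolution perf_counter() instead.
    start = time.perf_counter()
    try:
        yield
    finally:
        end = time.perf_counter()
        print('{} elapsed: {}'.format(label, end - start))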

def f1(a, n_sample):
    return random.sample(range(len(a)), n_sample)

def f2(a, n_sample):
    return np.random.choice(len(a), n_sample, replace=False)

# Generate random array
a = np.random.uniform(1., 100., 10000)
# Number of samples' indexes to randomly take from a
n_sample = 100
# Number of times to repeat functions f1 and f2
N = 100000

with timeblock("random.sample"):
    for _ in range(N):
        f1(a, n_sample)

with timeblock("np.random.choice"):
    for _ in range(N):
        f2(a, n_sample)
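A hand-rolled timer is easy to get wrong, so the standard library's timeit module may be a more reliable way to repeat this measurement. The sketch below also times the newer NumPy Generator API (np.random.default_rng(), available since NumPy 1.17). My understanding, which is an assumption to verify for your NumPy version, is that Generator.choice with replace=False avoids permuting the whole population when k is small. The helper name bench and its default arguments are hypothetical, and the code does not assume any particular relative ordering of the timings.

```python
import random
import timeit

import numpy as np


def bench(n=10_000, k=100, number=10_000):
    """Hypothetical helper: total seconds for `number` calls of each sampler."""
    rng = np.random.default_rng()
    samplers = {
        # Standard library: k distinct indices from range(n)
        "random.sample": lambda: random.sample(range(n), k),
        # Legacy NumPy global RandomState API, as in the MCVE
        "np.random.choice": lambda: np.random.choice(n, k, replace=False),
        # Newer NumPy Generator API
        "Generator.choice": lambda: rng.choice(n, k, replace=False),
    }
    # timeit.timeit accepts a zero-argument callable and returns total seconds
    return {name: timeit.timeit(fn, number=number) for name, fn in samplers.items()}


if __name__ == "__main__":
    for name, seconds in bench().items():
        print(f"{name}: {seconds:.3f} s")
```

Running it with larger n and the same k should show whether each method's cost scales with the population size or with the sample size.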
