If I evaluate something like:

numpy.random.choice(2, size=100000, p=[0.01, 0.99])

using one uniformly-distributed random float, say `r`, and deciding whether `r < 0.01`, it will presumably waste many of the random bits (entropy) generated. I've heard (second-hand) that generating pseudo-random numbers is computationally expensive, so I assumed that `numpy` would not be doing that, and would instead use a scheme like arithmetic coding in this case.
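As a concrete sketch of the one-float-per-sample scheme described above (my own illustration, not NumPy's internals):

```python
import numpy as np

rng = np.random.default_rng(0)

# One uniform double (53 random bits) per sample, thresholded against
# p = 0.01 to produce a single biased bit -- most of those bits are "wasted".
r = rng.random(100000)
samples = np.where(r < 0.01, 0, 1)
```

Each draw consumes a full double's worth of entropy yet yields under 0.1 bits of information when `p` is this skewed.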
However, at first glance it appears that `choice` does indeed generate a float for every sample it is asked for. Further, a quick `timeit` experiment shows that generating n uniform floats is actually quicker than drawing n samples from `p=[0.01, 0.99]`:
>>> timeit.timeit(lambda : numpy.random.choice(2, size=100000, p=[0.01, 0.99]), number=1000)
1.74494537999999
>>> timeit.timeit(lambda : numpy.random.random(size=100000), number=1000)
0.8165735180009506
Does `choice` really generate a float for each sample, as it would appear? Would it not significantly improve performance to use some compression algorithm in some cases (particularly if `size` is large and `p` is distributed unevenly)? If not, why not?