jeudi 30 juillet 2020

Why does numpy.random.choice not use arithmetic coding?

If I evaluate something like:

numpy.random.choice(2, size=100000, p=[0.01, 0.99])

using one uniformly-distributed random float, say r, and deciding if r < 0.01 will presumably waste many of the random bits (entropy) generated. I've heard (second-hand) that generating psuedo-random numbers is computationally expensive, so I assumed that numpy would not be doing that, and rather would use a scheme like arithmetic coding in this case.

However, at first glance it appears that choice does indeed generate a float for every sample it is asked for. Further, a quick timeit experiment shows that generating n uniform floats is actually quicker than n samples from p=[0.01, 0.99].

>>> timeit.timeit(lambda : numpy.random.choice(2, size=100000, p=[0.01, 0.99]), number=1000)
1.74494537999999
>>> timeit.timeit(lambda : numpy.random.random(size=100000), number=1000)
0.8165735180009506

Does choice really generate a float for each sample, as it would appear? Would it not significantly improve performance to use some compression algorithm in some cases (particularly if size is large and p is distributed unevenly)? If not, why not?




Aucun commentaire:

Enregistrer un commentaire