Thursday, June 22, 2017

Use frozen rv distribution objects to speed up a program with many function calls in which the same distribution is used?

Problem

Assume a program needs to generate many random variates from the same distribution in various functions and classes at various times. It seems that creating a frozen rv distribution object once (and passing that object through all functions) is a lot faster than re-creating the distribution in every function before drawing random variates.

To give some evidence, this code:

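import scipy.stats
import time

def gen_rvs(dist):
    # Draw 1000 variates from an already-frozen distribution.
    dist.rvs(1000)

time1 = time.perf_counter()
d = scipy.stats.bernoulli(0.75)  # frozen once, outside the loop
for _ in range(100000):
    gen_rvs(d)
print(time.perf_counter() - time1)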

runs (on my machine) almost 10 times faster than this (7.4 sec vs. 68.6 sec):

import scipy.stats
import time

def gen_rvs():
    # Freeze a new Bernoulli(0.75) object on every call, then draw 1000 variates.
    scipy.stats.bernoulli(0.75).rvs(1000)

time1 = time.perf_counter()
for _ in range(100000):
    gen_rvs()
print(time.perf_counter() - time1)
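A plausible explanation for the gap (based on reading SciPy's source, and possibly version-dependent) is that the sampling itself is cheap, while every call to `scipy.stats.bernoulli(0.75)` builds a new frozen object, and in recent SciPy versions that constructor also creates a fresh distribution instance. The sketch below times both snippets side by side with `timeit`, plus a third variant that is not from the original: it passes the shape parameter directly to the unfrozen `scipy.stats.bernoulli.rvs`, which avoids creating any frozen object. The function names are illustrative only.

```python
import timeit

import scipy.stats

# Frozen once and reused on every call (first snippet above).
FROZEN = scipy.stats.bernoulli(0.75)


def reuse_frozen():
    return FROZEN.rvs(1000)


def refreeze_each_call():
    # Builds a new frozen object on every call (second snippet above).
    return scipy.stats.bernoulli(0.75).rvs(1000)


def unfrozen_rvs():
    # Assumption / swapped-in variant: pass p straight to the generic
    # distribution's rvs method, so no frozen object is created at all.
    return scipy.stats.bernoulli.rvs(0.75, size=1000)


if __name__ == "__main__":
    n = 1000  # far fewer calls than the 100000 above, to keep the run short
    for func in (reuse_frozen, refreeze_each_call, unfrozen_rvs):
        seconds = timeit.timeit(func, number=n)
        print(f"{func.__name__:>20}: {seconds:.3f} s for {n} calls")
```

Absolute timings will differ by machine and SciPy version; only the relative ordering is of interest here.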

Potential Solutions

  • Passing distribution objects around through all functions etc.? Problem: This seems very messy, requires more arguments (especially if multiple distributions are needed), makes function calls harder to understand, and makes errors more likely.
  • Making the frozen rv distribution object a global variable? Problem: This does not work as easily if the program is spread across multiple files, and parallelization would create more problems.
  • Passing the frozen rv distribution to every class that needs the random variate generator at some point and storing it there? Problem: Still seems messy.
  • Passing generated random variates instead of the distribution to functions? Problem: Even messier with longer call stacks, and it would have to be known in advance how many random variates a particular function needs.
  • Running the random number generator in a separate process that pushes random variates into a queue, from which other processes collect them? Problem: While this sounds fancy and not messy, implementing it, and efficiently governing how many random variates the process should generate, might become messy.
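One middle ground between the first two options might be a shared accessor that creates each frozen distribution on first use and caches it with `functools.lru_cache`. In a multi-file program the accessor would live in its own module and be imported wherever needed; since Python imports a module only once per process, all callers share the same cache. This is a sketch under those assumptions, and the names `get_bernoulli`, `gen_rvs` and `Simulator` are hypothetical.

```python
from functools import lru_cache

import scipy.stats


# Hypothetical shared accessor. In a real program this would sit in its own
# module (e.g. a hypothetical distributions.py) and be imported elsewhere;
# each process imports that module once, so all callers share one cache.
@lru_cache(maxsize=None)
def get_bernoulli(p):
    """Return a frozen Bernoulli(p), creating it only on the first call."""
    return scipy.stats.bernoulli(p)


def gen_rvs(n=1000):
    # No distribution argument needed; the frozen object comes from the cache.
    return get_bernoulli(0.75).rvs(n)


class Simulator:
    """Hypothetical class that looks the distribution up once and keeps it."""

    def __init__(self, p=0.75):
        self.dist = get_bernoulli(p)

    def step(self, n=10):
        return self.dist.rvs(n)


if __name__ == "__main__":
    print(gen_rvs().mean())
    print(Simulator().step())
    print(get_bernoulli.cache_info())
```

With multiprocessing, each worker process would build its own cache, so the freezing cost is paid once per process rather than once per call; giving each worker an independently seeded random state is a separate concern that this sketch does not handle.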

What is the preferred way to deal with that?



