mardi 19 février 2019

Faster random sampling in python?

I have some code that samples a posterior distribution using MCMC, specifically Metropolis Hastings. I use scipy to generate random samples:

import numpy as np
from scipy import stats

def get_samples(n):
    """
    Generate and return a ratio of random samples from beta and gamma distributions.

    For simplicity, Prior is fixed as Beta(a=2,b=5), Likelihood is fixed as Normal(0,2)

    :type n: int
    :param n: number of iterations

    :rtype: numpy.ndarray
    """
    x_t = stats.uniform(0,1).rvs() # initial value
    posterior = np.zeros((n,))
    for t in range(n):
        x_prime = stats.norm(loc=x_t).rvs() # candidate
        p1 = stats.beta(a=2,b=5).pdf(x_prime)*stats.norm(loc=0,scale=2).pdf(x_prime) # prior * likelihood 
        p2 = stats.beta(a=2,b=5).pdf(x_t)*stats.norm(loc=0,scale=2).pdf(x_t) # prior * likelihood 
        alpha = p1/p2 # ratio
        u = stats.uniform(0,1).rvs() # random uniform
        if u <= alpha:
            x_t = x_prime # accept
            posterior[t] = x_t
        elif u > alpha:
            x_t = x_t # reject
    posterior = posterior[np.where(posterior > 0)] # get rid of initial zeros that don't contribute to distribution
    return posterior

Generally, I try to avoid using explicit for loops in python - I would try to generate everything using pure numpy. For the case of this algorithm however, a for loop with if statements is unavoidable. Therefore, the code is quite slow. When I profile my code, it is spending most time within the for loop (obviously), and more specifically, the slowest parts are in generating the random numbers; stats.beta().pdf() and stats.norm().pdf().

Sometimes I use numba to speed up my code for matrix operations. While numba is compatible with some numpy operations, generating random numbers is not one of them. Numba has a cuda rng, but this is limited to normal and uniform distributions.

My question is, is there a way to significantly speed up the code above using some sort of random sampling of various distributions compatible with numba?

We don't have to limit ourselves to numba, but it is the only easy to use optimizer that I know of. More generally, I am looking for ways to speed up random sampling for various distributions (beta, gamma, poisson) within a for loop in python.




Aucun commentaire:

Enregistrer un commentaire