dimanche 24 février 2019

why is my python implementation of metropolis algorithm (mcmc) so slow?

I'm trying to implement the Metropolis algorithm (a simpler version of the Metropolis-Hastings algorithm) in Python.

Here is my implementation:

def Metropolis_Gaussian(p, z0, sigma, n_samples=100, burn_in=0, m=1):
"""
Metropolis Algorithm using a Gaussian proposal distribution.
p: distribution that we want to sample from (can be unnormalized)
z0: Initial sample
sigma: standard deviation of the proposal normal distribution.
n_samples: number of final samples that we want to obtain.
burn_in: number of initial samples to discard.
m: this number is used to take every mth sample at the end
"""
# List of samples, check feasibility of first sample and set z to first sample
sample_list = [z0]
_ = p(z0) 
z = z0
# set a counter of samples for burn-in
n_sampled = 0

while len(sample_list[::m]) < n_samples:
    # Sample a candidate from Normal(mu, sigma),  draw a uniform sample, find acceptance probability
    cand = np.random.normal(loc=z, scale=sigma)
    u = np.random.rand()
    try:
        prob = min(1, p(cand) / p(z))
    except (OverflowError, ValueError) as error:
        continue
    n_sampled += 1

    if prob > u:
        z = cand  # accept and make candidate the new sample

    # do not add burn-in samples
    if n_sampled > burn_in:
        sample_list.append(z)

# Finally want to take every Mth sample in order to achieve independence
return np.array(sample_list)[::m]

When I try to apply my algorithm to an exponential function it takes very little time. However, when I try it on a t-distribution it takes ages, considering that it's not doing that many calculations. This is how you can replicate my code:

t_samples = Metropolis_Gaussian(pdf_t, 3, 1, 1000, 1000, m=100)
plt.hist(t_samples, density=True, bins=15, label='histogram of samples')
x = np.linspace(min(t_samples), max(t_samples), 100)
plt.plot(x, pdf_t(x), label='t pdf')
plt.xlim(min(t_samples), max(t_samples))
plt.title("Sampling t distribution via Metropolis")
plt.xlabel(r'$x$')
plt.ylabel(r'$y$')
plt.legend()

This code takes quite a long time to run and I'm not sure why. In my code for Metropolis_Gaussian, I am trying to improve efficiency by

  1. Not adding to the list repeated samples
  2. Not recording burn-in samples

The function pdf_t is defined as follows

from scipy.stats import t
def pdf_t(x, df=10):
    return t.pdf(x, df=df)




Aucun commentaire:

Enregistrer un commentaire