I'm trying to implement the Metropolis algorithm (a simpler version of the Metropolis-Hastings algorithm) in Python.
Here is my implementation:
def Metropolis_Gaussian(p, z0, sigma, n_samples=100, burn_in=0, m=1):
"""
Metropolis Algorithm using a Gaussian proposal distribution.
p: distribution that we want to sample from (can be unnormalized)
z0: Initial sample
sigma: standard deviation of the proposal normal distribution.
n_samples: number of final samples that we want to obtain.
burn_in: number of initial samples to discard.
m: this number is used to take every mth sample at the end
"""
# List of samples, check feasibility of first sample and set z to first sample
sample_list = [z0]
_ = p(z0)
z = z0
# set a counter of samples for burn-in
n_sampled = 0
while len(sample_list[::m]) < n_samples:
# Sample a candidate from Normal(mu, sigma), draw a uniform sample, find acceptance probability
cand = np.random.normal(loc=z, scale=sigma)
u = np.random.rand()
try:
prob = min(1, p(cand) / p(z))
except (OverflowError, ValueError) as error:
continue
n_sampled += 1
if prob > u:
z = cand # accept and make candidate the new sample
# do not add burn-in samples
if n_sampled > burn_in:
sample_list.append(z)
# Finally want to take every Mth sample in order to achieve independence
return np.array(sample_list)[::m]
When I try to apply my algorithm to an exponential function it takes very little time. However, when I try it on a t-distribution it takes ages, considering that it's not doing that many calculations. This is how you can replicate my code:
t_samples = Metropolis_Gaussian(pdf_t, 3, 1, 1000, 1000, m=100)
plt.hist(t_samples, density=True, bins=15, label='histogram of samples')
x = np.linspace(min(t_samples), max(t_samples), 100)
plt.plot(x, pdf_t(x), label='t pdf')
plt.xlim(min(t_samples), max(t_samples))
plt.title("Sampling t distribution via Metropolis")
plt.xlabel(r'$x$')
plt.ylabel(r'$y$')
plt.legend()
This code takes quite a long time to run and I'm not sure why. In my code for Metropolis_Gaussian, I am trying to improve efficiency by
- Not adding to the list repeated samples
- Not recording burn-in samples
The function pdf_t is defined as follows
from scipy.stats import t
def pdf_t(x, df=10):
return t.pdf(x, df=df)
Aucun commentaire:
Enregistrer un commentaire