Friday, June 12, 2020

Gibbs sampler fails to converge

I've been trying to understand Gibbs sampling for some time. Recently, I saw a video that made a good deal of sense.

https://www.youtube.com/watch?v=a_08GKWHFWo

The author used Gibbs sampling to converge on the means of a bivariate normal distribution, drawing its two coordinates (theta_1 and theta_2) in turn with the following process:

init: Initialize theta_2 to a random value.

Loop:

  1. sample theta_1 conditioned on theta_2: theta_1 ~ N(p(theta_2), 1 - p**2)
  2. sample theta_2 conditioned on theta_1: theta_2 ~ N(p(theta_1), 1 - p**2)

(repeat until convergence.)
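If the video's "p(theta_2)" is read as rho times theta_2 (rho being the correlation, 0.9 below) rather than as a probability, the loop above can be sketched like this. This is only a sketch under that assumption, with zero means and unit marginal variances; note that np.random.normal expects a standard deviation, so the conditional variance 1 - rho**2 needs a square root:

```python
import numpy as np

def gibbs_sketch(rho=0.9, iterations=5000, seed=0):
    """One reading of the video's loop: mean rho * theta, variance 1 - rho**2."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1 - rho**2)  # conditional standard deviation, not variance
    theta_2 = rng.random()    # init: random starting value
    samples = []
    for _ in range(iterations):
        theta_1 = rng.normal(rho * theta_2, sd)  # theta_1 | theta_2
        theta_2 = rng.normal(rho * theta_1, sd)  # theta_2 | theta_1
        samples.append((theta_1, theta_2))
    return np.array(samples)

samples = gibbs_sketch()
# with zero means assumed, both coordinate means should hover near 0
```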

I tried this on my own and ran into an issue:

import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

rv = multivariate_normal(mean=[0.5, -0.2], cov=[[1, 0.9], [0.9, 1]])

rv.mean
>>> 
array([ 0.5, -0.2])

rv.cov
>>>
array([[1. , 0.9],
       [0.9, 1. ]])

import numpy as np
samples = []

curr_t2 = np.random.rand()
def gibbs(iterations=5000):
    theta_1 = np.random.normal(curr_t2, (1-0.9**2), None)
    theta_2 = np.random.normal(theta_1, (1-0.9**2), None)
    samples.append((theta_1,theta_2))
    for i in range(iterations-1):
        theta_1 = np.random.normal(theta_2, (1-0.9**2), None)
        theta_2 = np.random.normal(theta_1, (1-0.9**2), None)
        samples.append((theta_1,theta_2))
gibbs()

sum([a for a,b in samples])/len(samples)
>>>
4.745736136676516

sum([b for a,b in samples])/len(samples)
>>>
4.746816908769834

Now, I see where I messed up: I conditioned theta_1 on theta_2's actual value rather than on its probability, and likewise conditioned theta_2 on theta_1's actual value rather than on its probability.

Where I'm stuck is, how do I evaluate the probability of either theta taking on any given observed value?

Two options come to mind: the probability density (the height of the normal curve at the observed value), or a p-value (integrating from infinity (and/or negative infinity) to the observed value). Neither of these sounds "right."
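For concreteness, here is what each option would compute. This is purely illustrative, using scipy.stats.norm with a hypothetical observed value and the same 1 - 0.9**2 scale I passed above:

```python
from scipy.stats import norm

rho = 0.9
observed = 0.3  # hypothetical observed value of theta_2

# Option 1: probability density -- the height of the normal curve
# at the observed value (a density, not a probability)
density = norm.pdf(observed, loc=0, scale=1 - rho**2)

# Option 2: p-value-style integration -- the area under the curve
# from negative infinity up to the observed value (the CDF)
area = norm.cdf(observed, loc=0, scale=1 - rho**2)
```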

How should I proceed?



