Metropolis Hastings

1 minute read

The Metropolis-Hastings (MH) algorithm is a sampling algorithm that generates samples from a target distribution $\pi(\cdot)$. MH is most effective in the case where $\pi(\cdot)$ is intractable. Let’s express $\pi(\cdot)$ as the following fraction,

$$ \pi(\cdot) = \frac{f(\cdot)}{K} $$

where $f(·)$ is some tractable density function and $K$ is an intractable normalizing constant. MH enables us to sample from $\pi(\cdot)$ without knowing anything about the intractable normalizing constant $K$!

The idea behind MH is to construct a Markov Chain whose stationary distribution is $\pi(\cdot)$. How do we do this? From Markov Chain theory we know that if a Markov Chain is reversible, the detailed balance equation holds,

$$\pi(x)p(x,y) = \pi(y)p(y,x)$$

where $\pi(\cdot)$ is the stationary distribution and $p(x,y)$ is the transition kernel. Hence, if we can find a transition kernel $p(x,y)$ which satisfies this reversibility condition, we can sample from the target distribution $\pi(\cdot)$.

We start by defining some tractable proposal distribution $q(x,y),$ which generates a proposal $y$ given some current state $x$ (this is something you pick). Since $q(x,y)$ is a transition kernel, does it satisfy the reversibility condition? Let’s derive a general solution. Assume $q(x,y)$ does not satisfy the reversibility condition (if it does, then we just have the Metropolis algorithm). We have the following,

$$\pi(x)q(x,y) > \pi(y)q(y,x) \hspace{1cm} (w.l.g)$$

An intuitive interpretation of what this equation is saying is basically we are moving from state $x$ to state $y$ more than we are vice versa; in short, detailed balance is not satisfied. To correct this, we introduce a probability $\alpha(x,y)$ that the move is actually made,

$$\pi(x)q(x,y){\color{ocre}\alpha(x,y)} = \pi(y)q(y,x)$$

Rearranging we obtain,

$$\alpha(x,y) = \frac{\pi(y)q(y,x)}{\pi(x)q(x,y)}$$

Reiterating, we interpret $\alpha(x,y)$ as the probability of moving from state $x$ to state $y.$ Since it is a probability, it cannot exceed 1 so the final definition is,

$$\alpha(x,y) = \min [\frac{\pi(y)q(y,x)}{\pi(x)q(x,y)}, 1]$$

Note that $\alpha(x,y)$ is the transition kernel $p(x,y)$ that we were after! It satisfies the detailed balance equation by definition; however, does it deal with the pesky intractable normalizing constant $K?$ Yes! We have that,

$$ \begin{align*} \frac{\pi(y)q(y,x)}{\pi(x)q(x,y)} &= \frac{\frac{f(y)q(y,x)}{K}}{\frac{f(x)q(x,y)}{K}}\\ &= \frac{f(y)q(y,x)}{f(x)q(x,y)} \end{align*} $$

So we no longer have to worry about $K$ (yay!). Ergo, we can simply use $\alpha(x,y)$ as our transition kernel for our Markov Chain and rest assured that as long as we obtain enough samples, they will be distributed according to the stationary distribution; voila, we have samples from the intractable distribution $\pi(\cdot)$!

Twitter Facebook LinkedIn

Metropolis Hastings

You May Also Enjoy

word2vec

The Gospel according to Matthew

Rejection Sampling

Importance Sampling