The Metropolis-Hastings (MH) algorithm is a sampling algorithm that generates samples from a target distribution
. MH is most effective in the case
where is intractable. Let’s express as the following fraction,
where is some tractable density function and is an intractable normalizing constant. MH enables us to
sample from without knowing anything about the intractable normalizing constant !
The idea behind MH is to construct a Markov Chain whose stationary distribution is . How do we do this? From Markov Chain theory we know that
if a Markov Chain is reversible, the detailed balance equation holds,
where is the stationary distribution and is the transition kernel.
Hence, if we can find a transition kernel which satisfies this reversibility
condition, we can sample from the target distribution .
We start by defining some tractable proposal distribution which generates a proposal given some
current state (this is something you pick). Since is a transition kernel, does it satisfy the reversibility condition? Let’s
derive a general solution. Assume does not satisfy the reversibility condition (if it does, then we just have the Metropolis algorithm). We have the
following,
An intuitive interpretation of what this equation is saying is basically we are
moving from state to state more than we are vice versa; in short, detailed
balance is not satisfied. To correct this, we introduce a probability that
the move is actually made,
Rearranging we obtain,
Reiterating, we interpret as the probability of moving from state to
state Since it is a probability, it cannot exceed 1 so the final definition is,
Note that is the transition kernel that we were after! It satisfies
the detailed balance equation by definition; however, does it deal with the pesky
intractable normalizing constant Yes! We have that,
So we no longer have to worry about (yay!). Ergo, we can simply use
as our transition kernel for our Markov Chain and rest assured that as long as
we obtain enough samples, they will be distributed according to the stationary
distribution; voila, we have samples from the intractable distribution !