The Metropolis-Hastings AlgorithmΒΆ
Markov chain. Considering a -algebra
on
, a Markov chain is a process
such that
An example is the random walk for which
where the steps
are independent and identically distributed.
For all
is measurable;
For all
is a probability distribution on
.
The kernel has density
if
.
denotes the probability distribution of the Markov Chain
;
denotes the probability distribution of
(
);
denotes the mapping defined by
for all
.
Then the notation used here is .
-invariant if
;
-irreducible if,
such that
,
holds.
Markov Chain Monte-Carlo techniques allows one to sample and integrate
according to a distribution which is only known up to a
multiplicative constant. This situation is common in Bayesian statistics
where the βtargetβ distribution, the posterior one
, is proportional
to the product of prior and likelihood: see equation (1).
In particular, given a βtargetβ distribution and a
-irreducible kernel transition
, the
Metropolis-Hastings algorithm produces a Markov chain
of distribution
with the
following properties:
the transition kernel of the Markov chain is
-invariant;
;
the Markov chain satisfies the ergodic theorem: let
be a real-valued function such that
, then, whatever the initial distribution
is:
In that sense, simulating amounts to
sampling according to
and can be used to integrate relatively
to the probability measure
. Let us remark that the ergodic
theorem implies that
almost surely.
By abusing the notation, represents, in the remainder of
this section, a function of
which is proportional to the PDF
of the target distribution
. Given a transition kernel
of density
, the scheme of the Metropolis-Hastings
algorithm is the following (lower case letters are used hereafter for
both random variables and realizations as usual in the Bayesian
literature):
Draw
and set
.
Draw a candidate for
according to the given transition kernel
:
.
Compute the ratio
.
Draw
; if
then set
, otherwise set
.
Set
and go back to 1).
Of course, if is replaced by a different function of
which is proportional to it, the algorithm keeps unchanged, since
only takes part in the latter in the ratio
. Moreover, if
proposes
some candidates in a uniform manner (constant density
), the
candidate
is accepted according to a ratio
which reduces to the previous βnaturalβ ratio
of PDF. The introduction of
in the ratio
prevents from the bias of a
non-uniform proposition of candidates which would favor some areas of
.
The -invariance is ensured by the symmetry of the expression of
(
-reversibility).
In practice, is specified as a random walk
(
such that
) or as a
independent sampling (
such that
), or as a mixture of random walk and
independent sampling.