# The Metropolis-Hastings Algorithm¶

**Markov chain.**Considering a -algebra on , a Markov chain is a process such that

An example is the *random walk* for which
where the steps
are independent and identically distributed.

**Transition kernel.**A transition kernel on is a mapping such that

is measurable;

- is a probability distribution on .

The kernel has density if .

**Some Notations.**Let be a homogeneous Markov Chain of transition on with initial distribution (that is ):

denotes the probability distribution of the Markov Chain ;

denotes the probability distribution of ();

- denotes the mapping defined by for all .

**Total variation convergence.**A Markov Chain of distribution is said to converge in total variation distance towards the distribution if

Then the notation used here is .

**Some interesting properties.**Let be a (target) distribution on , then a transition kernel is said to be:

-invariant if ;

-irreducible if, such that , holds.

Markov Chain Monte-Carlo techniques allows to sample and integrate according to a distribution which is only known up to a multiplicative constant. This situation is common in Bayesian statistics where the “target” distribution, the posterior one , is proportional to the product of prior and likelihood: see equation (1).

In particular, given a “target” distribution and a -irreducible kernel transition , the Metropolis-Hastings algorithm produces a Markov chain of distribution with the following properties:

the transition kernel of the Markov chain is -invariant;

;

the Markov chain satisfies the

*ergodic theorem*: let be a real-valued function such that , then, whatever the initial distribution is:

In that sense, simulating amounts to sampling according to and can be used to integrate relatively to the probability measure . Let us remark that the ergodic theorem implies that almost surely.

By abusing the notation, represents, in the remainder of this section, a function of which is proportional to the PDF of the target distribution . Given a transition kernel of density , the scheme of the Metropolis-Hastings algorithm is the following (lower case letters are used hereafter for both random variables and realizations as usual in the Bayesian literature):

Draw and set .

Draw a candidate for according to the given transition kernel : .

Compute the ratio .

Draw ; if then set , otherwise set .

Set and go back to 1).

Of course, if is replaced by a different function of which is proportional to it, the algorithm keeps unchanged, since only takes part in the latter in the ratio . Moreover, if proposes some candidates in a uniform manner (constant density ), the candidate is accepted according to a ratio which reduces to the previous “natural” ratio of PDF. The introduction of in the ratio prevents from the bias of a non-uniform proposition of candidates which would favor some areas of .

The -invariance is ensured by the symmetry of the expression of (-reversibility).

In practice, is specified as a random walk ( such that ) or as a independent sampling ( such that ), or as a mixture of random walk and independent sampling.

API:

Examples:

References:

Robert, C.P. and Casella, G. (2004).

*Monte Carlo Statistical Methods*(Second Edition), Springer.Meyn, S. and Tweedie R.L. (2009).

*Markov Chains and Stochastic Stability*(Second Edition), Cambridge University Press.