Markov chain Monte Carlo sampling algorithms

Lecture 4

Dr Samuel Livingstone

Last time

Outline for today - Theory of MCMC

Part 1: Introduction to MCMC theory

How to assess MCMC algorithms

Statistical properties of the estimator \(\tilde{f}_N\)

Part 2: Assessing bias

The bias of an MCMC estimator

Quantifying bias

Convergence rates in MCMC

Proposition

Proof

Why is this useful to us?

An example of a geometrically ergodic Markov chain

AR(1) simulation, \(\beta = 0.7\) (regardless of the starting point, the chain reaches the typical set quickly)
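The experiment above can be sketched as follows. An AR(1) chain \(X_{k+1} = \beta X_k + \varepsilon_{k+1}\) with \(|\beta| < 1\) forgets its starting point geometrically fast, since the contribution of \(x_0\) after \(k\) steps is \(\beta^k x_0\). Chain length and seed below are illustrative choices, not taken from the slides.

```python
import numpy as np

def ar1_chain(x0, beta=0.7, n=200, rng=None):
    """Simulate an AR(1) chain X_{k+1} = beta * X_k + eps_{k+1}, eps ~ N(0, 1)."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.empty(n)
    x[0] = x0
    for k in range(1, n):
        x[k] = beta * x[k - 1] + rng.normal()
    return x

# Chains from very different starting points forget x0 geometrically fast:
# the influence of the start after k steps is beta**k * x0, which is
# negligible well before k = 200.
for x0 in (-50.0, 0.0, 50.0):
    print(x0, ar1_chain(x0)[-1])
```

Because the same seed is reused, the three chains share their noise sequence and their final states differ only by the vanishing \(\beta^{k} x_0\) term.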

An example that is not geometrically ergodic

20 simulations of a random walk, \(X_1 = 0\), \(h = 1\), \(N = 50\)
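The random walk \(X_{k+1} = X_k + h Z_k\) has no stationary distribution at all, so it cannot be geometrically ergodic: the spread of the simulated chains keeps growing like \(\sqrt{N}\). A minimal sketch of the experiment (the seed is an illustrative choice):

```python
import numpy as np

def random_walk(n=50, h=1.0, x1=0.0, rng=None):
    """Simulate X_{k+1} = X_k + h * Z_k, Z_k ~ N(0, 1), started at X_1 = x1."""
    rng = np.random.default_rng() if rng is None else rng
    steps = h * rng.normal(size=n - 1)
    return np.concatenate(([x1], x1 + np.cumsum(steps)))

# 20 independent simulations: the end points spread out like sqrt(N),
# so there is no fixed distribution the chains settle into.
rng = np.random.default_rng(1)
ends = np.array([random_walk(rng=rng)[-1] for _ in range(20)])
print(ends.std())
```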

What about the Random Walk Metropolis?

Lack of GE and practical performance (e.g. traceplots)

Example of a sticking chain (not typical of the RWM, but seen in other MH algorithms)

RWM, Cauchy target
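The heavy-tailed Cauchy target is a standard example where the RWM fails to be geometrically ergodic: far out in the tails the log-density is nearly flat, so once the chain wanders there it returns to the centre only very slowly. A minimal sketch (step size \(h\) and chain length are illustrative choices):

```python
import numpy as np

def rwm_cauchy(n=5000, h=1.0, x1=0.0, rng=None):
    """Random Walk Metropolis targeting a standard Cauchy density."""
    rng = np.random.default_rng(0) if rng is None else rng
    log_pi = lambda x: -np.log1p(x * x)  # log Cauchy density, up to a constant
    x = np.empty(n)
    x[0] = x1
    accepted = 0
    for k in range(1, n):
        prop = x[k - 1] + h * rng.normal()
        # Symmetric proposal, so the acceptance ratio is pi(prop) / pi(current).
        if np.log(rng.uniform()) < log_pi(prop) - log_pi(x[k - 1]):
            x[k] = prop
            accepted += 1
        else:
            x[k] = x[k - 1]
    return x, accepted / (n - 1)

chain, acc_rate = rwm_cauchy()
print(acc_rate)
```

Plotting the trace of `chain` shows occasional long excursions into the tails, the behaviour the slide's traceplots illustrate.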

Establishing geometric ergodicity

Example: the AR(1) process
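A standard route (plausibly the one used in the lecture) is a Foster–Lyapunov drift condition: find \(V \ge 1\), \(\lambda < 1\) and \(b < \infty\) such that \(\mathbb{E}[V(X_{k+1}) \mid X_k = x] \le \lambda V(x) + b\) for all \(x\). For the AR(1) process \(X_{k+1} = \beta X_k + \varepsilon_{k+1}\) with \(\varepsilon \sim N(0, \sigma^2)\) and \(|\beta| < 1\), the choice \(V(x) = 1 + x^2\) works directly:

```latex
\begin{align*}
\mathbb{E}\left[V(X_{k+1}) \mid X_k = x\right]
  &= 1 + \mathbb{E}\left[(\beta x + \varepsilon_{k+1})^2\right] \\
  &= 1 + \beta^2 x^2 + \sigma^2 \\
  &= \beta^2 V(x) + \left(1 - \beta^2 + \sigma^2\right).
\end{align*}
```

So the drift condition holds with \(\lambda = \beta^2 < 1\) and \(b = 1 - \beta^2 + \sigma^2\); combined with a minorisation condition on compact level sets of \(V\), this yields geometric ergodicity.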

A functional analytic perspective (1)

A functional analytic perspective (2)

Burn-in: a more practical approach to the bias problem

Burn-in: the main idea

Burn-in in general
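A minimal sketch of the idea (the chain, burn-in length and target below are illustrative): discard the first \(B\) states so that the estimator only averages over states drawn after the chain has approximately reached its typical set.

```python
import numpy as np

def mcmc_estimate(chain, f=lambda x: x, burn_in=0):
    """Estimate E_pi[f(X)] from a chain, discarding the first `burn_in` states."""
    kept = np.asarray(chain)[burn_in:]
    return f(kept).mean()

# An AR(1) chain started far from its typical set: the early transient
# biases the plain ergodic average upwards.
rng = np.random.default_rng(0)
chain = [50.0]
for _ in range(2000):
    chain.append(0.7 * chain[-1] + rng.normal())

print(mcmc_estimate(chain))               # includes the transient
print(mcmc_estimate(chain, burn_in=100))  # closer to the true mean 0
```

Choosing \(B\) is a judgment call in practice; traceplots from several starting points are the usual diagnostic.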

Part 3: Assessing Variance

Assessing variance

Returning to our intuition about when the RWM performs poorly

The asymptotic variance

A Markov chain CLT

Peskun–Tierney orderings

Peskun’s theorem in MCMC

The MCMC effective sample size

The effective sample size (in theory)

\[ N_{\text{eff}} := \frac{N}{1+2\sum_{k=2}^\infty \text{Corr}(f(X_1),f(X_k))}. \]

(This is how many independent samples from \(\pi\) you would need to get an estimator with the same variance as \(\tilde{f}_N\)).
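In practice \(N_{\text{eff}}\) is estimated by plugging empirical autocorrelations into the formula and truncating the infinite sum. The sketch below truncates at the first negative estimated autocorrelation, which is one simple heuristic among several used in practice:

```python
import numpy as np

def effective_sample_size(x):
    """Estimate N_eff = N / (1 + 2 * sum_k rho_k), truncating the empirical
    autocorrelation sum at the first negative term."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    c = x - x.mean()
    acov = np.correlate(c, c, mode="full")[n - 1:] / n  # empirical autocovariances
    rho = acov / acov[0]                                # autocorrelations
    s = 0.0
    for k in range(1, n // 2):
        if rho[k] < 0:
            break
        s += rho[k]
    return n / (1.0 + 2.0 * s)

# AR(1) with beta = 0.7: the integrated autocorrelation time is
# (1 + beta) / (1 - beta) ~ 5.7, so N_eff should be roughly N / 5.7.
rng = np.random.default_rng(0)
chain = np.empty(5000)
chain[0] = 0.0
for k in range(1, 5000):
    chain[k] = 0.7 * chain[k - 1] + rng.normal()
print(effective_sample_size(chain))
```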

Auto-correlation plots are also helpful to compare algorithms/visualise variance

Part 4: Complexity

Computational complexity in MCMC

The cost per iteration of a Metropolis–Hastings algorithm
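One point worth making concrete: if the log-density of the current state is cached, each Metropolis–Hastings iteration costs exactly one new target evaluation plus one proposal draw, whatever the target. A generic sketch (the target and dimension in the usage line are illustrative):

```python
import numpy as np

def metropolis_hastings(log_pi, x0, n, h=1.0, rng=None):
    """Random-walk Metropolis--Hastings. Caching log_pi at the current state
    means each iteration costs exactly one new target evaluation."""
    rng = np.random.default_rng(0) if rng is None else rng
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    x = np.empty((n, x0.size))
    x[0] = x0
    lp, evals = log_pi(x[0]), 1   # evaluate the target once at the start
    for k in range(1, n):
        prop = x[k - 1] + h * rng.normal(size=x0.size)
        lp_prop = log_pi(prop)    # the single target evaluation this iteration
        evals += 1
        if np.log(rng.uniform()) < lp_prop - lp:
            x[k], lp = prop, lp_prop
        else:
            x[k] = x[k - 1]       # rejected: reuse the cached value lp
    return x, evals

# Standard normal target in d = 3: one density evaluation per step.
samples, n_evals = metropolis_hastings(lambda v: -0.5 * v @ v, np.zeros(3), 1000)
print(n_evals)  # → 1000
```

The per-evaluation cost of `log_pi` itself (e.g. a loop over a dataset in a Bayesian posterior) is then the quantity that scales with the problem.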

Convergence rates/mixing times with dimension

End of lecture

Tomorrow we discuss more advanced algorithms!