Markov chain Monte Carlo sampling algorithms

Lecture 3

Dr Samuel Livingstone

Last time

Outline

Part 1: Recap

Markov chains on general state spaces

Limiting distributions

Recap of \(\pi\)-invariance

Part 2: MCMC & the (random walk) Metropolis algorithm

Basic idea of MCMC

The Metropolis–Hastings algorithm: background

The original Metropolis, Rosenbluth, Rosenbluth, Teller & Teller (1953) paper

Idea of the (random walk) Metropolis algorithm

Random Walk Metropolis algorithm

A detour: the Random Walk

Some discussion…

Stepping through an example

Show demo code
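
A minimal sketch of a random walk Metropolis routine targeting \(N(0,1)\), written so that it returns the objects (mc$x_store, mc$a_rate, nits) used on the later slides; the function name and tuning values are illustrative rather than the lecture's exact demo code.

# Minimal random walk Metropolis sketch targeting N(0,1) (illustrative, not the lecture's demo)
rwm <- function(nits, h, x0 = 0) {
  x_store <- numeric(nits)
  x_store[1] <- x0
  n_accept <- 0
  for (i in 1:(nits - 1)) {
    y <- x_store[i] + h * rnorm(1)                        # random walk proposal
    log_alpha <- dnorm(y, log = TRUE) - dnorm(x_store[i], log = TRUE)
    if (log(runif(1)) < log_alpha) {                      # accept with prob min(1, pi(y)/pi(x))
      x_store[i + 1] <- y
      n_accept <- n_accept + 1
    } else {
      x_store[i + 1] <- x_store[i]                        # reject: stay at the current state
    }
  }
  list(x_store = x_store, a_rate = n_accept / (nits - 1))
}

nits <- 10000                  # illustrative settings; the lecture's values may differ
mc <- rwm(nits, h = 2.4)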

Sketch proof of \(\pi\)-invariance
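
For reference, the central step of the argument (for a symmetric proposal \(q\) with acceptance probability \(\alpha(x,y) = \min\{1, \pi(y)/\pi(x)\}\)) is detailed balance:

\[ \pi(x)\,q(x,y)\,\alpha(x,y) = q(x,y)\min\{\pi(x),\pi(y)\} = \pi(y)\,q(y,x)\,\alpha(y,x), \]

and detailed balance with respect to \(\pi\) implies \(\pi\)-invariance.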

RWM example, \(\pi\) is \(N(0,1)\), \(X_0 = 0\)

Constructing an ergodic average

For example:

sum( (-2 < mc$x_store) * (mc$x_store < 2) * 1 ) / nits
## [1] 0.9475

is an approximation for the integral: \[ \int \mathbb{I}_{(-2,2)}(x) \frac{1}{\sqrt{2\pi}}e^{-x^2/2}dx. \]

To approximate \(\mathbb{E}_\pi[X]\) we can compute

mean(mc$x_store)
## [1] 0.009321799

Running average plot for the mean estimate
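
A sketch of how such a plot can be produced from the stored chain (assuming the mc object from the demo above):

running_mean <- cumsum(mc$x_store) / seq_along(mc$x_store)   # estimate of E_pi[X] after each iteration
plot(running_mean, type = "l", xlab = "iteration", ylab = "running mean")
abline(h = 0, lty = 2)                                        # true value for the N(0,1) target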

The (average) acceptance rate

cat("The acceptance rate is: ",mc$a_rate)
## The acceptance rate is:  0.4705

What happens when the step-size is too large?

Choosing \(h\) too large example

## The acceptance rate is:  0.043

What about when it’s too small?

## The acceptance rate is:  0.9835

The Goldilocks principle

Quick questions

Part 3: Metropolis–Hastings

Metropolis–Hastings algorithm: general case (assuming \(\pi\) has a density)

For \(i \in \{1,\dots,n-1\}\), repeat the update sketched below.
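
In the standard form of the update, a proposal \(Y \sim q(X_i, \cdot)\) is drawn and the chain sets \(X_{i+1} = Y\) with probability \[ \alpha(X_i, Y) = \min\left(1, \frac{\pi(Y)\,q(Y, X_i)}{\pi(X_i)\,q(X_i, Y)}\right), \] and \(X_{i+1} = X_i\) otherwise.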

Some discussion…

Theorem: Metropolis–Hastings works!

The following hold for a Markov chain produced using the Metropolis–Hastings algorithm:

  1. The chain is \(\pi\)-invariant.
  2. \(\pi\) is the unique limiting distribution if \(0 < \pi(x) < \infty\) and \(0 < q(x,y) < \infty\) for all \(x, y \in E\).

Proof sketch, part 1

Proof of part 2

Towards a more general algorithm & proof

The Metropolis–Hastings kernel \(P\)
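
One standard way to write the kernel: for \(x \in E\) and measurable \(A \subseteq E\), \[ P(x, A) = \int_A \alpha(x,y)\,q(x,y)\,dy + \mathbb{I}_A(x)\int \big(1 - \alpha(x,y)\big)\,q(x,y)\,dy, \] where the second term is the probability that the proposal is rejected and the chain remains at \(x\).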

Part 4: The Gibbs sampler

The Gibbs sampler
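
In the two-component, systematic-scan form used in the examples below, one iteration draws \[ x_1^{(i+1)} \sim \pi(x_1 \mid x_2^{(i)}), \qquad x_2^{(i+1)} \sim \pi(x_2 \mid x_1^{(i+1)}), \] so that each coordinate is refreshed from its full conditional distribution given the current value of the other.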

Gibbs sampler Example 1
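
The code on the next slide targets a bivariate normal with standard normal marginals and correlation \(\rho\), whose full conditionals are \[ X_1 \mid X_2 = x_2 \sim N(\rho x_2,\, 1 - \rho^2), \qquad X_2 \mid X_1 = x_1 \sim N(\rho x_1,\, 1 - \rho^2). \]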

The Gibbs sampler code

# Systematic-scan Gibbs update (assumes x is a 2 x nits matrix and rho, nits are set earlier)
for (i in 1:(nits-1)) {
  # draw each coordinate from its full conditional given the other
  x[1, i+1] <- rnorm(1, mean = rho * x[2, i],   sd = sqrt(1 - rho^2))
  x[2, i+1] <- rnorm(1, mean = rho * x[1, i+1], sd = sqrt(1 - rho^2))
}

When \(\rho = 0\) the sampler is very good!

When \(\rho = 0.99\) the sampler is not very good!

Gibbs sampler Example 2

Gibbs sampler for Example 2

Remarks on the Gibbs sampler

Part 5: A case study

A case study: Poisson regression

\[ \begin{aligned} Y_i &\sim \text{Poisson}(e^{\sum_{j=1}^d \beta_jx_{ij}}), \\ \beta_j &\stackrel{iid}{\sim} N(0, 1^2). \end{aligned} \]

There are many types of data and phenomena for which this kind of probabilistic model is appropriate, so feel free to imagine whichever application interests you most.

Remarks on the model

Step 1: simulate some data (so we know the truth)
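
A minimal sketch of such a simulation (the seed, sample size and design matrix here are illustrative choices, not necessarily those used in the lecture; d = 3 matches the three coefficients reported later):

set.seed(1)                                   # illustrative seed
n <- 100; d <- 3                              # assumed sample size and dimension
X <- matrix(rnorm(n * d), n, d)               # covariates
beta_true <- rnorm(d)                         # "true" coefficients drawn from the N(0,1) prior
y <- rpois(n, lambda = exp(X %*% beta_true))  # Poisson responses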

The (unnormalised) posterior distribution

\[ \begin{aligned} \pi(\beta|y) &\propto \prod_i \frac{(e^{\sum_j \beta_j x_{ij}})^{y_i} e^{-e^{\sum_j \beta_j x_{ij}}}}{y_i!} \prod_j \frac{1}{\sqrt{2\pi}}e^{ -\beta_j^2/2 } \\ & \propto \exp \left( \sum_{i,j} \beta_j x_{ij}y_i - \sum_i e^{\sum_{j} \beta_j x_{ij} } - \frac{1}{2} \sum_j \beta_j^2 \right). \end{aligned} \]
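
For the sampler it is convenient to evaluate the unnormalised log-posterior directly; a sketch (argument names assumed, not the lecture's exact code):

log_post <- function(beta, X, y) {
  eta <- as.vector(X %*% beta)                   # linear predictors sum_j beta_j x_ij
  sum(y * eta - exp(eta)) - 0.5 * sum(beta^2)    # log-likelihood plus log-prior, up to constants
}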

Random Walk Metropolis output

## The acceptance rate is:  0.2546

Marginal posterior histograms
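
These can be drawn from the stored chain with, for example (a sketch assuming mc2$x_store holds the sampled coefficients, as used below):

par(mfrow = c(1, 3))                            # one panel per coefficient
for (j in 1:3) {
  hist(mc2$x_store[, j], breaks = 30, main = paste0("beta_", j), xlab = "")
}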

Further comments

colMeans(mc2$x_store)   # estimated posterior means of the three regression coefficients
## [1]  0.3944325 -0.2350571  1.1027465

End of lecture