Markov chain Monte Carlo sampling algorithms

Lecture 5

Dr Samuel Livingstone

Last time

Outline

Part 1: Gradient-based algorithms

Motivation for better Metropolis–Hastings

How does the Random Walk Metropolis perform in high dimension?

The mixing time is typically \(O(d)\) (good, but let’s try to do better)

Diffusion processes

Simulating diffusion processes

A simulation of the diffusion with SDE \(dY_t = dW_t\) for \(t\in [0,1]\) (Euler–Maruyama skeleton + interpolation)

set.seed(5367)
delta <- 1/1000                      # time step
dW <- sqrt(delta) * rnorm(1000)      # independent Brownian increments
W <- cumsum(dW)                      # Euler-Maruyama skeleton of W_t
plot((1:1000)/1000, W, type = "l", xlab = "t", ylab = "W_t")

The (overdamped) Langevin diffusion
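One common way to write the overdamped Langevin diffusion for a target \(\pi\) (conventions differ by a constant rescaling of time; some references instead write \(dX_t = \nabla \log \pi(X_t)\,dt + \sqrt{2}\,dW_t\)):

```latex
dX_t = \frac{1}{2} \nabla \log \pi(X_t)\, dt + dW_t
```

Under mild regularity conditions \(\pi\) is the stationary distribution of this diffusion, which is what makes its discretisations natural MCMC proposals.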

Using the Langevin diffusion to construct MCMC proposals
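To make the construction concrete, here is a minimal one-dimensional MALA sketch: an Euler-Maruyama step of the Langevin diffusion used as a Metropolis-Hastings proposal, with the asymmetric proposal density entering the acceptance ratio. The names `mala`, `logpi`, `grad_logpi` and the Gaussian example are illustrative, not from the slides.

```r
mala <- function(logpi, grad_logpi, x0, h, nits) {
  # log-density (up to a constant) of the MALA proposal q(to | from)
  logq <- function(to, from) -(to - from - (h / 2) * grad_logpi(from))^2 / (2 * h)
  x <- x0
  out <- numeric(nits)
  n_acc <- 0
  for (i in 1:nits) {
    # Langevin proposal: one Euler-Maruyama step of the diffusion
    y <- x + (h / 2) * grad_logpi(x) + sqrt(h) * rnorm(1)
    # Metropolis-Hastings correction with asymmetric proposal
    log_alpha <- logpi(y) - logpi(x) + logq(x, y) - logq(y, x)
    if (log(runif(1)) < log_alpha) { x <- y; n_acc <- n_acc + 1 }
    out[i] <- x
  }
  list(x = out, a_rate = n_acc / nits)
}

# Illustrative run on a standard normal target
set.seed(1)
res <- mala(function(x) -x^2 / 2, function(x) -x, x0 = 0, h = 1, nits = 5000)
```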

Observations about MALA

What is known about MALA

Empirical comparison vs Random Walk Metropolis

Show output

Stylised example model class

MALA for the stylised example class

Plots for the stylised example class

Remarks on other gradient-based algorithms

A flavour of Hamiltonian Monte Carlo (HMC) 1

A flavour of Hamiltonian Monte Carlo (HMC) 2

A flavour of Hamiltonian Monte Carlo (HMC) 3

Example implementation
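As a rough sketch of the kind of implementation meant here: one HMC transition with a leapfrog integrator and identity mass matrix. The function names and the Gaussian test target are illustrative assumptions, not the slides' own code.

```r
# One HMC transition: refresh the momentum, approximately integrate
# Hamiltonian dynamics with the leapfrog scheme, then Metropolis-correct.
hmc_step <- function(logpi, grad_logpi, x, eps, L) {
  p <- rnorm(length(x))                        # momentum refresh (identity mass)
  x_new <- x
  p_new <- p + (eps / 2) * grad_logpi(x_new)   # initial momentum half step
  for (l in 1:L) {
    x_new <- x_new + eps * p_new               # full position step
    if (l < L) p_new <- p_new + eps * grad_logpi(x_new)
  }
  p_new <- p_new + (eps / 2) * grad_logpi(x_new)  # final momentum half step
  # accept/reject on the change in H(x, p) = -log pi(x) + |p|^2 / 2
  log_alpha <- (logpi(x_new) - sum(p_new^2) / 2) - (logpi(x) - sum(p^2) / 2)
  if (log(runif(1)) < log_alpha) x_new else x
}

# Illustrative chain on a standard normal target
set.seed(2)
xs <- numeric(2000)
for (i in 2:2000) {
  xs[i] <- hmc_step(function(x) -x^2 / 2, function(x) -x, xs[i - 1],
                    eps = 0.1, L = 20)
}
```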

Show animation

Part 2: Unadjusted sampling algorithms

What is an unadjusted algorithm?

The unadjusted Langevin algorithm
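A minimal one-dimensional ULA sketch, assuming the same drift as the MALA proposal but with the accept/reject correction dropped, so the chain targets only a biased approximation of \(\pi\) (names are illustrative):

```r
# Unadjusted Langevin algorithm: Euler-Maruyama discretisation of the
# Langevin diffusion, with no Metropolis correction.
ula <- function(grad_logpi, x0, h, nits) {
  x <- numeric(nits)
  x[1] <- x0
  for (i in 2:nits) {
    x[i] <- x[i - 1] + (h / 2) * grad_logpi(x[i - 1]) + sqrt(h) * rnorm(1)
  }
  x
}

# Standard normal target: the stationary variance is slightly inflated,
# reflecting the O(h) discretisation bias.
set.seed(3)
xs <- ula(function(x) -x, x0 = 0, h = 0.1, nits = 20000)
```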

Stochastic gradient Langevin dynamics (SGLD)

Remarks on SGLD
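As a toy illustration of SGLD (a normal mean with a flat prior; all names and tuning values are assumptions for the sketch), the full-data gradient in the ULA update is replaced by a rescaled minibatch estimate:

```r
# SGLD for the posterior of a normal mean given y_1, ..., y_n ~ N(x, 1)
# with a flat prior, so grad log posterior = sum_i (y_i - x).
sgld <- function(y, x0, h, nits, batch = 10) {
  n <- length(y)
  x <- numeric(nits)
  x[1] <- x0
  for (i in 2:nits) {
    idx <- sample.int(n, batch)                        # random minibatch
    grad_est <- (n / batch) * sum(y[idx] - x[i - 1])   # unbiased gradient estimate
    x[i] <- x[i - 1] + (h / 2) * grad_est + sqrt(h) * rnorm(1)
  }
  x
}

set.seed(4)
y <- rnorm(100, mean = 2)
xs <- sgld(y, x0 = 0, h = 0.001, nits = 5000)
```

After burn-in the chain fluctuates around the posterior mean, i.e. roughly `mean(y)`, with the minibatch noise adding some extra variance on top of the injected Gaussian noise.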

Part 3: Parallel Tempering + adaptive MCMC

Multi-modal target distributions

Tempering example

Tempering algorithms
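A minimal two-temperature parallel tempering sketch: a cold chain on \(\pi\) and a hot chain on the tempered target \(\pi^\beta\), each updated by random walk Metropolis, with a state swap proposed every iteration. All names and tuning values are illustrative; the usage example matches the two-Gaussian mixture from the slides.

```r
# Two-chain parallel tempering: RWM moves within each temperature,
# then a proposed swap of the two states.
pt <- function(logpi, beta, x0, h, nits) {
  x <- c(x0, x0)           # x[1]: cold chain (beta = 1), x[2]: hot chain
  b <- c(1, beta)
  out <- numeric(nits)
  for (i in 1:nits) {
    for (k in 1:2) {
      y <- x[k] + h[k] * rnorm(1)
      # RWM acceptance for the tempered target pi^b[k]
      if (log(runif(1)) < b[k] * (logpi(y) - logpi(x[k]))) x[k] <- y
    }
    # swap move: accept with the usual tempering ratio
    log_swap <- (b[1] - b[2]) * (logpi(x[2]) - logpi(x[1]))
    if (log(runif(1)) < log_swap) x <- rev(x)
    out[i] <- x[1]          # record the cold chain only
  }
  out
}

# Mixture of two Gaussians with means 8 and -8, as in the example slide
set.seed(5)
logpi_mix <- function(x) log(0.5 * dnorm(x, 8) + 0.5 * dnorm(x, -8))
out <- pt(logpi_mix, beta = 0.05, x0 = 8, h = c(2.5, 10), nits = 20000)
```

The hot chain crosses the low-density region between the modes easily, and successful swaps let the cold chain inherit those mode jumps.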

Example: mixture of 2 Gaussians, means 8 and -8

Algorithm design for tempering - natural questions

Adaptive MCMC - Motivation

Adaptive MCMC - basic idea

A very simple adaptive MCMC algorithm

Code for the simple adaptive MCMC algorithm

RWM_output <- RWM(logpi = logpi_gauss,
                  nits = adapt_iter,
                  h = h[k],
                  x_curr = start_x[k])

av_accept[k] <- RWM_output$a_rate

# adapt h based on the average acceptance rate in this epoch
if (av_accept[k] > 0.5) {
  h[k + 1] <- h[k] * 1.2        # accepting too often: increase step size
} else if (av_accept[k] < 0.4) {
  h[k + 1] <- h[k] * 0.8        # accepting too rarely: decrease step size
} else {
  h[k + 1] <- h[k]              # acceptance rate in target range: keep h
}

Results for the simple adaptive MCMC algorithm

## Epoch:  1 h =  100 Average acceptance rate:  0.03 
## Epoch:  2 h =  80 Average acceptance rate:  0.01 
## Epoch:  3 h =  64 Average acceptance rate:  0.02 
## Epoch:  4 h =  51.2 Average acceptance rate:  0.02 
## Epoch:  5 h =  40.96 Average acceptance rate:  0.03 
## Epoch:  6 h =  32.768 Average acceptance rate:  0.01 
## Epoch:  7 h =  26.2144 Average acceptance rate:  0.04 
## Epoch:  8 h =  20.97152 Average acceptance rate:  0.05 
## Epoch:  9 h =  16.77722 Average acceptance rate:  0.06 
## Epoch:  10 h =  13.42177 Average acceptance rate:  0.1 
## Epoch:  11 h =  10.73742 Average acceptance rate:  0.15 
## Epoch:  12 h =  8.589935 Average acceptance rate:  0.12 
## Epoch:  13 h =  6.871948 Average acceptance rate:  0.18 
## Epoch:  14 h =  5.497558 Average acceptance rate:  0.23 
## Epoch:  15 h =  4.398047 Average acceptance rate:  0.25 
## Epoch:  16 h =  3.518437 Average acceptance rate:  0.35 
## Epoch:  17 h =  2.81475 Average acceptance rate:  0.32 
## Epoch:  18 h =  2.2518 Average acceptance rate:  0.48 
## Epoch:  19 h =  2.2518 Average acceptance rate:  0.44 
## Epoch:  20 h =  2.2518 Average acceptance rate:  0.37

A word of caution about adaptation

Some further MCMC algorithms that I didn’t mention

End of module - Do zobaczenia (see you)!

I hope you enjoyed the course and am happy to answer any further questions over e-mail!

Appendix: Case studies

Case Study 1: Bayesian Variable Selection

The Variable selection problem

Solution 1: Choose the best model

Example: genomic data

Bayesian variable selection via MCMC

Choice of prior distribution

Example prior specification

MCMC over \(\gamma\) space

Add-Delete-Swap sampler

At each iteration, given current state \(\gamma \in \{0,1\}^p\), do the following:

  1. Decide whether to ADD/DELETE w.p. \(\lambda\) or SWAP w.p. \(1-\lambda\)

  2. First set proposal \(\gamma^* = \gamma\). Then do the following:

     1. If ADD/DELETE, draw \(j \sim \mathcal{U}\{1,...,p\}\) and set \(\gamma^*_j = 1-\gamma_j\)
     2. If SWAP, choose a \(j\) for which \(\gamma^*_j = 1\) and a \(k\) for which \(\gamma^*_k = 0\) and swap them (both choices uniform)

  3. Set the next state to be \(\gamma^*\) with probability \(\alpha(\gamma, \gamma^*)\), otherwise set it to be \(\gamma\), where \[ \alpha(\gamma,\gamma^*) = \min \left( 1, \frac{\pi(\gamma^*|y,X)}{\pi(\gamma|y,X)} \right). \]
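The proposal step can be sketched in R as follows (the function name and the default \(\lambda\) are illustrative; `gamma` is the 0/1 inclusion vector from the slides):

```r
# One Add-Delete-Swap proposal for Bayesian variable selection.
ads_propose <- function(gamma, lambda = 0.5) {
  g <- gamma
  if (runif(1) < lambda) {
    # ADD/DELETE: flip one coordinate chosen uniformly
    j <- sample.int(length(g), 1)
    g[j] <- 1 - g[j]
  } else {
    ones <- which(g == 1)
    zeros <- which(g == 0)
    if (length(ones) > 0 && length(zeros) > 0) {
      # SWAP: move one included variable to an excluded slot, both uniform
      # (sample.int avoids R's sample() pitfall when a set has length 1)
      g[ones[sample.int(length(ones), 1)]] <- 0
      g[zeros[sample.int(length(zeros), 1)]] <- 1
    }
  }
  g
}
```

An ADD/DELETE move changes exactly one coordinate and a SWAP changes exactly two, so each proposal differs from the current state in one or two entries.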

Some remarks

One useful posterior summary

Examples

Case study 2: GLSMs for radioactivity estimation

Generalised Linear Spatial Models (GLSMs)

Radioactivity at Rongelap island

The Marshall Islands (from Encyclopaedia Britannica)

The dataset (Diggle, Tawn & Moyeed, 1997)

The Posterior

MCMC strategy

The mean posterior intensity field (Diggle, Tawn & Moyeed, 1997)

Areas with high probability of high contamination (Diggle, Tawn & Moyeed, 1997)