Markov chain Monte Carlo sampling algorithms

Lecture 1

Dr Samuel Livingstone

Welcome & Logistics

Module Outline

Part 1: Preliminaries & motivation

Notation

Notation: example 1

Notation: example 2

Notation: final example

Motivation

Part 2: Monte Carlo

Example: Inference in latent variable models

Example (Classical significance testing)

Example (Bayesian inference)

What is an intractable integral?

A simple example

Intractable integrals can look very innocent!

Analytical intractability \(\implies\) numerical methods

Trapezium rule for 1D Gaussian density
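A minimal R sketch of the idea, assuming we approximate the integral of the \(N(0,1)\) density over a truncated grid (the range and grid size here are illustrative choices, not from the slides):

grid_points <- seq(-10, 10, length.out = 1001)  # truncate the infinite range (illustrative choice)
h <- grid_points[2] - grid_points[1]             # grid spacing
dens <- dnorm(grid_points)                       # N(0,1) density values
trap_estimate <- h * (sum(dens) - 0.5 * (dens[1] + dens[length(dens)]))
trap_estimate                                    # should be very close to 1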

The Monte Carlo Principle

If an integral can be written as an expectation with respect to some probability distribution, we can approximate it by sampling from the distribution and computing the empirical average.
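As a minimal R sketch (the integrand and sample size are illustrative choices): estimating \(E[X^2]\) for \(X \sim N(0,1)\), whose true value is 1.

N <- 10000
x <- rnorm(N)              # draw N samples from N(0,1)
mc_estimate <- mean(x^2)   # empirical average of f(X) = X^2
mc_estimate                # should be close to E[X^2] = 1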

The Monte Carlo method

Why does Monte Carlo work?

Monte Carlo works because of two fundamental results from probability theory: the law of large numbers (the empirical average converges to the true expectation) and the central limit theorem (the error shrinks at rate \(1/\sqrt{N}\) when \(f(X)\) has finite variance).

A more statistical assessment of Monte Carlo

Some natural questions

Part 3: Sampling from a probability distribution

The big picture

Direct sampling via transformations

Bernoulli sampling
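A minimal R sketch of the standard transformation, with \(p = 0.3\) as an illustrative choice: draw \(U \sim \text{Unif}(0,1)\) and set \(X = 1\) if \(U \leq p\), and \(X = 0\) otherwise.

p <- 0.3
u <- runif(1000)                          # U ~ Uniform(0,1)
bernoulli_samples <- as.integer(u <= p)   # X = 1 if U <= p, else 0
mean(bernoulli_samples)                   # should be close to p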

A more complex example

Graphical representation of the discrete case

Continuous case: Inverse CDF method

Example (Exponential distribution)

Example (Exponential distribution)
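For reference, the inversion behind the code below: the Exp(\(\lambda\)) CDF is \(F(x) = 1 - e^{-\lambda x}\), so \[ F^{-1}(u) = -\frac{\log(1-u)}{\lambda}, \] and with \(\lambda = 1\) the transformation of \(U \sim \text{Unif}(0,1)\) is \(-\log(1-U)\).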

# Inverse CDF method for Exp(1): if U ~ Uniform(0,1) then -log(1 - U) ~ Exp(1)
uniform_samples <- runif(1000)
transformed_samples <- -log(1 - uniform_samples)
hist(transformed_samples, xlab = "", ylab = "")

Limitations

Indirect Sampling

Rejection sampling: simple example

How could we modify the procedure to draw samples from this distribution?

We can write the target and candidate distributions mathematically…

\[ \begin{aligned} \pi &= (1/2,1/4,1/4,0,0,0) \\ q &= (1/6,1/6,1/6,1/6,1/6,1/6) \end{aligned} \]

Question: How would the problem change if \(q = (1/3,1/3,1/3)\) instead?
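A minimal R sketch of rejection sampling for the six-outcome example above, anticipating the algorithm introduced on the following slides; the bound \(M = \max_i \pi_i/q_i = 3\) follows from the values given.

pi_target <- c(1/2, 1/4, 1/4, 0, 0, 0)   # target distribution pi
q_cand <- rep(1/6, 6)                    # candidate: fair six-sided die
M <- max(pi_target / q_cand)             # M = 3
samples <- integer(0)
while (length(samples) < 1000) {
  x <- sample(1:6, 1)                    # propose from q
  if (runif(1) < pi_target[x] / (M * q_cand[x])) {
    samples <- c(samples, x)             # accept with probability pi(x) / (M q(x))
  }
}
table(samples) / length(samples)         # frequencies of 1, 2, 3 approx (1/2, 1/4, 1/4)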

Rejection sampling: main idea

Rejection Sampling Algorithm

The constant \(M\)

Modified example worked calculation

A continuous example

Example in the case \(\sigma = 3\)

Proof that the rejection sampling algorithm produces samples from \(\pi\) (in the case where \(\pi\) and \(q\) have densities)

What proportion of samples will be accepted?
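For reference, when \(\pi\) and \(q\) are both normalised, the standard answer is that proposals are accepted with probability \(1/M\) on average: \[ \Pr(\text{accept}) = \int q(x)\,\frac{\pi(x)}{M q(x)}\,dx = \frac{1}{M}. \]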

The tails of \(q\) should be at least as heavy as those of \(\pi\) for \(M\) to be finite (here \(\pi\) is \(N(0,1)\), \(q\) is \(N(0,1/2)\))

Towards a more general algorithm

Importance sampling

(Naive) Importance Sampling Algorithm

  1. Draw \(X_i \sim q\) for \(1\leq i\leq N\)
  2. For each \(i\) compute the importance weights \[ w(X_i):= \frac{d\pi}{dq}(X_i) \]
  3. Compute \[ \hat{f}^{IS-n}_N := \frac{1}{N}\sum_{i=1}^N w(X_i)f(X_i) \]
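A minimal R sketch of the naive estimator, with illustrative choices not taken from the slides: target \(\pi = N(0,1)\), proposal \(q = N(0, 2^2)\), and \(f(x) = x^2\), so the true value is 1.

N <- 10000
x <- rnorm(N, mean = 0, sd = 2)              # step 1: draw X_i ~ q
w <- dnorm(x) / dnorm(x, mean = 0, sd = 2)   # step 2: weights w(X_i) = pi(X_i) / q(X_i)
is_estimate <- mean(w * x^2)                 # step 3: (1/N) * sum of w(X_i) f(X_i)
is_estimate                                  # should be close to E_pi[X^2] = 1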

The bias & variance of the naive IS estimator

Self-normalised importance sampling
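For reference, a standard way of writing the self-normalised estimator, which only requires the weights up to a common normalising constant (the superscript is my labelling, by analogy with the naive case): \[ \hat{f}^{SN}_N := \frac{\sum_{i=1}^N w(X_i) f(X_i)}{\sum_{i=1}^N w(X_i)}. \]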

The ‘effective’ sample size
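A commonly used definition, stated here for reference in terms of the importance weights: \[ \mathrm{ESS} := \frac{\left(\sum_{i=1}^N w(X_i)\right)^2}{\sum_{i=1}^N w(X_i)^2}. \]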

Effective sample size: intuition

End of lecture