Dr Samuel Livingstone
Recap on Bayesian inference \[ \pi(\theta|y) \propto \pi_0(\theta)f(y|\theta) \]
General state space Markov chains and ergodic averages \[ \begin{aligned} X_{i+1}|X_i \sim &P(X_i,\cdot), ~~~ \lim_{n\to\infty}\| P^n(x,\cdot) - \pi \|_{TV} = 0,\\ &\frac{1}{n}\sum_{i=1}^nf(X_i) \approx \mathbb{E}_\pi[f]. \end{aligned} \]
If a Markov chain \(P\) is \(\pi\)-irreducible, aperiodic and \(\pi\)-invariant, then \(\pi\) is the unique limiting distribution.
\(\pi\)-irreducible: any event \(A\) s.t. \(\pi(A) > 0\) has a chance of being visited, whatever the starting point
Aperiodic means the chain doesn’t cycle through disjoint regions periodically
\(\pi\)-invariant if \(X_i \sim \pi \implies X_{i+1} \sim \pi\)
Key fact: If \(X_n \sim \mu\), then the (marginal) distribution of \(X_{n+1}\) is given by \[ \mathbb{P}(X_{n+1} \in A) = \int P(x,A)\mu(dx) \]
\(\pi\) is invariant for \(P\) if for any \(A \in \mathcal{E}\) \[ \pi(A) = \int P(x,A)\pi(dx) \]
If \(P\) is \(\pi\)-reversible, then it is \(\pi\)-invariant \[ \pi(dx)P(x,dy) = \pi(dy)P(y,dx) \]
Note, however, that Markov chains can be \(\pi\)-invariant without being \(\pi\)-reversible
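To see why reversibility implies invariance, integrate both sides of the detailed balance condition over \(x\) (using the key fact above): \[ \int P(x,A)\pi(dx) = \int\int \mathbb{I}_A(y)\,\pi(dx)P(x,dy) = \int\int \mathbb{I}_A(y)\,\pi(dy)P(y,dx) = \int \mathbb{I}_A(y)\,\pi(dy) = \pi(A), \] since \(\int P(y,dx) = 1\) for every \(y\).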
(Assume \(\pi\) has a density (or mass function in the discrete case))
For \(i \in \{0,...,n-1\}\)

Draw \(Y = X_i + \xi_{i+1}\), where \(\xi_{i+1} \sim N(0,h)\)

Draw \(U \sim \mathcal{U}[0,1]\), compute \[ \alpha(X_i,Y) = \min\left(1, \frac{\pi(Y)}{\pi(X_i)}\right). \]

Set \[ X_{i+1} = \begin{cases} Y \qquad U \leq \alpha(X_i,Y) \\ X_i \qquad \text{otherwise} \end{cases} \]

Output the Markov chain \(\{X_1,...,X_n\}\).
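A minimal R implementation of this algorithm, working on the log scale for numerical stability (the function name `rwm` and its interface are choices made here, not part of the slides):

```r
# Random walk Metropolis. log_pi: log target density (up to an additive
# constant); x0: starting point; n: number of iterations; h: proposal variance.
rwm <- function(log_pi, x0, n, h) {
  d <- length(x0)
  chain <- matrix(NA, nrow = n, ncol = d)
  x <- x0
  accepted <- 0
  for (i in 1:n) {
    y <- x + rnorm(d, mean = 0, sd = sqrt(h))      # propose Y ~ N(X_i, h)
    if (log(runif(1)) <= log_pi(y) - log_pi(x)) {  # accept w.p. min(1, pi(Y)/pi(X_i))
      x <- y
      accepted <- accepted + 1
    }
    chain[i, ] <- x
  }
  list(chain = chain, acc_rate = accepted / n)
}
```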
Question. Why does this work when only \(c\pi\) is known?
A random walk is a very simple Markov chain \[ X_{i+1} = X_i + \xi_{i+1}, \] \(\xi_{i+1} \sim N(0,\sigma^2)\) (for example).
It is well-studied and easy to simulate
The transition density is given by \[ q(x,y) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{ -\frac{1}{2\sigma^2}(y-x)^2\right\} \]
Note that \(q(x,y) = q(y,x)\)
Questions. 1. How is this Markov chain used in the random walk Metropolis algorithm? 2. What measure is the random walk reversible with respect to?
The acceptance rate has an intuitive interpretation \[ \pi(Y) \geq \pi(X_i) \implies \text{Accept the move!} \]
Uphill moves are always accepted, downhill moves are sometimes rejected
This means that the Markov chain spends more time in regions of higher probability under \(\pi\)
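As a demo, here is the `rwm` sketch above applied to the standard normal target (the seed, step size \(h\) and chain length are illustrative assumptions, so the numbers it prints need not match the outputs quoted later):

```r
set.seed(1)                                    # illustrative seed
log_pi <- function(x) -x^2 / 2                 # standard normal target, up to a constant
out <- rwm(log_pi, x0 = 0, n = 10000, h = 6)   # h is an illustrative choice
cat("The acceptance rate is:", out$acc_rate, "\n")
plot(out$chain[, 1], type = "l")               # trace plot of the chain
```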
Suppose that \(y \neq x\) is proposed and then accepted.
The density associated with this is \[ q(x,y)\min\left( 1, \frac{\pi(y)}{\pi(x)}\right) \] where \(q(x,y)\) is the random walk transition density
The detailed balance equations are \[ \pi(x)q(x,y)\min\left(1,\frac{\pi(y)}{\pi(x)}\right) = \pi(y)q(y,x)\min\left(1,\frac{\pi(x)}{\pi(y)}\right) \]
Here this reduces to \[ \pi(x)\min\left(1,\frac{\pi(y)}{\pi(x)}\right) = \pi(y)\min\left(1, \frac{\pi(x)}{\pi(y)}\right) \]
Both sides are the same and equal to \(\min(\pi(y),\pi(x))\), so detailed balance holds
For example:
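(A line like the following computes it from the demo chain; the object name `out` carries over from the sketch above, and the quoted output below is from the original demo.)

```r
mean(out$chain > -2 & out$chain < 2)  # proportion of chain values in (-2, 2)
```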
## [1] 0.9475
is an approximation for the integral: \[ \int \mathbb{I}_{(-2,2)}(x) \frac{1}{\sqrt{2\pi}}e^{-x^2/2}dx. \]
To approximate \(\mathbb{E}_\pi[X]\) we can compute
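(Again using the chain from the earlier sketch; the quoted output is from the original demo.)

```r
mean(out$chain)  # ergodic average approximating E_pi[X]
```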
## [1] 0.009321799
## The acceptance rate is: 0.4705
Random walk proposals are blind in some sense (they don't use any information about \(\pi\) to construct the candidate move)
So if \(h\) is too large, then it is likely that \(\pi(Y) \ll \pi(X_i)\), meaning \[ \alpha(X_i,Y) = \min\left(1, \frac{\pi(Y)}{\pi(X_i)} \right) \approx 0. \]
This will mean very few proposals are accepted and the chain won’t move much.
## The acceptance rate is: 0.043
If instead \(h\) is too small, then \(\pi(Y) \approx \pi(X_i)\) and \(\alpha(X_i,Y) \approx 1\), so almost every proposal is accepted, but each move is tiny and the chain still explores \(\pi\) slowly.

## The acceptance rate is: 0.9835
What is the difference between the following expressions (\(U\), \(X_i\) and \(Y\) defined as above)? \[ \begin{aligned} (1)& ~ U \leq \min\left( 1, \frac{\pi(Y)}{\pi(X_i)}\right) \\ (2)& ~ U \leq \pi(Y)/\pi(X_i) \\ (3)& ~ \log(U) \leq \log\pi(Y) - \log\pi(X_i) \end{aligned} \]
What is the probability of any of the above expressions (\(U \sim \mathcal{U}[0,1]\), fixed \(X_i\) and \(Y\))? How would you prove it?
Which would we prefer to implement on a computer?
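In exact arithmetic all three events have the same probability, but on a computer \(\pi(Y)\) and \(\pi(X_i)\) can underflow to zero, so form (3) is the safe one to implement. A self-contained sketch (the log density and states here are example values):

```r
log_pi <- function(x) -x^2 / 2                    # example: standard normal log density
x <- 0.5; y <- 1.2                                # example current and proposed states
accept <- log(runif(1)) <= log_pi(y) - log_pi(x)  # form (3): stable under underflow
```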
For \(i \in \{0,...,n-1\}\)
Draw \(Y \sim Q(X_i,\cdot)\)
Draw \(U \sim \mathcal{U}[0,1]\), compute \[ \alpha(X_i,Y) = \min\left(1, \frac{\pi(Y)q(Y,X_i)}{\pi(X_i)q(X_i,Y)}\right). \]
Set \[ X_{i+1} = \begin{cases} Y \qquad U \leq \alpha(X_i,Y) \\ X_i \qquad \text{otherwise} \end{cases} \]
Output the Markov chain \(\{X_1,...,X_n\}\).
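A sketch of the general algorithm in R, mirroring the earlier `rwm` function (the interface, passing a proposal sampler `rq` and log proposal density `log_q`, is an assumption made here):

```r
# Metropolis-Hastings with a general proposal. log_pi: log target density;
# rq(x): draws Y ~ Q(x, .); log_q(x, y): evaluates log q(x, y).
mh <- function(log_pi, rq, log_q, x0, n) {
  chain <- matrix(NA, nrow = n, ncol = length(x0))
  x <- x0
  accepted <- 0
  for (i in 1:n) {
    y <- rq(x)
    log_alpha <- log_pi(y) + log_q(y, x) - log_pi(x) - log_q(x, y)
    if (log(runif(1)) <= log_alpha) {  # accept w.p. alpha(X_i, Y)
      x <- y
      accepted <- accepted + 1
    }
    chain[i, ] <- x
  }
  list(chain = chain, acc_rate = accepted / n)
}
```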
Question. How are \(Q\) and \(q\) related?
In the random walk case the acceptance rate has an intuitive interpretation \[ \pi(Y) \geq \pi(X_i) \implies \text{Accept the move!} \]
This is because the proposal is symmetric, meaning \(q(y,x) = q(x,y)\)
In the general case this intuition is lost, but nonetheless choosing a different form of proposal can be advantageous
The following hold for a Markov chain produced using the Metropolis-Hastings algorithm: it is \(\pi\)-reversible (and therefore \(\pi\)-invariant), and provided \(q(x,y) > 0\) for all \(x,y \in E\), it is also \(\pi\)-irreducible and aperiodic.
Suppose that \(y \neq x\) is proposed and then accepted. The density associated with this is \[ q(x,y)\alpha(x,y). \]
Using the above we can show \(\pi\)-reversibility: \[ \begin{aligned} \pi(x)q(x,y)\alpha(x,y) &= \pi(x)q(x,y) \min \left(1, \frac{\pi(y)q(y,x)}{\pi(x)q(x,y)}\right) \\ &= \min\left( \pi(x)q(x,y), \pi(y)q(y,x) \right) \\ &= \pi(y)q(y,x) \min \left(\frac{\pi(x)q(x,y)}{\pi(y)q(y,x)}, 1 \right) \\ &= \pi(y)q(y,x)\alpha(y,x). \end{aligned} \]
Note that the conditions imply \[ q(x,y)\alpha(x,y) > 0 \] for all \(x,y \in E\).
This means that for any \(A \in \mathcal{E}\) \[ \int_A \pi(y)dy > 0 \implies \int_A q(x,y)\alpha(x,y)dy > 0, \] and hence the chain is \(\pi\)-irreducible.
It also implies aperiodicity: taking \(A\) to be any set containing the current state \(x\) shows the chain can remain in the region it currently occupies, so it cannot cycle through disjoint regions periodically
Remark. Conditions are sufficient but not necessary.
\[ \begin{aligned} Y_i &\sim \text{Poisson}(e^{\sum_{j=1}^d \beta_jx_{ij}}), \\ \beta_j &\stackrel{iid}{\sim} N(0, 1^2). \end{aligned} \]
There are many types of data and phenomena for which this kind of probabilistic model is appropriate, so feel free to have whichever example interests you most in mind!
We have imposed \(N(0,1^2)\) independent priors for each \(\beta_j\)
This will shrink the \(\beta_j\)’s towards zero
But is it the right amount of shrinkage?
To interpret the parameters, write \(\mu_i = e^{\sum_j \beta_j x_{ij}}\) for the Poisson mean. If \(x_{ij,\text{new}} = x_{ij,\text{old}}+1\), then \[ \mu_{i,\text{new}} = e^{\beta_j} \times \mu_{i,\text{old}} \]
The prior therefore implies that plausible multiplicative changes lie roughly between \(e^{-2} \approx 0.14\) and \(e^{2} \approx 7.4\) (within two prior standard deviations of zero for \(\beta_j\)).
Whether or not this is sensible of course depends on your particular setting
Common alternative: \(\beta_j \stackrel{iid}{\sim} N(0,\sigma^2)\) with hyperprior on \(\sigma^2\)
\[ \begin{aligned} \pi(\beta|y) &\propto \prod_i \frac{(e^{\sum_j \beta_j x_{ij}})^{y_i} e^{-e^{\sum_j \beta_j x_{ij}}}}{y_i!} \prod_j \frac{1}{\sqrt{2\pi}}e^{ -\beta_j^2/2 } \\ & \propto \exp \left( \sum_{i,j} \beta_j x_{ij}y_i - \sum_i e^{\sum_{j} \beta_j x_{ij} } - \frac{1}{2} \sum_j \beta_j^2 \right). \end{aligned} \]
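This posterior can be sampled by plugging its log density into the `rwm` sketch from earlier. Below is one way this might look; the simulated design matrix `X`, counts `y`, step size `h` and chain length are all placeholders (the data behind the output quoted below are not given in the slides, so these numbers will not reproduce it):

```r
# Log posterior (up to a constant) for Poisson regression with N(0,1) priors.
log_post <- function(beta, X, y) {
  eta <- as.vector(X %*% beta)                 # eta_i = sum_j beta_j x_ij
  sum(y * eta - exp(eta)) - sum(beta^2) / 2    # log-likelihood + log-prior
}

# Placeholder data, for illustration only.
set.seed(2)
X <- cbind(1, rnorm(100), rnorm(100))
y <- rpois(100, exp(as.vector(X %*% c(0.5, -0.2, 1))))

out <- rwm(function(b) log_post(b, X, y), x0 = rep(0, 3), n = 10000, h = 0.05)
cat("The acceptance rate is:", out$acc_rate, "\n")
colMeans(out$chain)  # posterior mean estimates of each beta_j
```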
## The acceptance rate is: 0.2546
## [1] 0.3944325 -0.2350571 1.1027465