Learning objectives
In this material, we will go over several concepts related to convergence that I think are very important for Bayesian statistics:
- Auto-correlation function (ACF) and effective sample size
- How the Gibbs sampler explores parameter space
- The importance of centering covariates
These concepts are quite general, but they will be easier to illustrate using a customized Gibbs sampler with a Metropolis-Hastings algorithm embedded within it (i.e., Metropolis-within-Gibbs).
Data and statistical model
We will rely on an example that we have seen before. In this example, we are interested in modeling the number of species \(S_i\) on island \(i\) as a function of island area \(A_i\). We will use the data in “islands.csv”. Here is what these data look like:
setwd('U:\\uf\\courses\\bayesian course\\group activities\\7 example Poisson_Logistic reg')
dat=read.csv('islands.csv',as.is=T,fileEncoding="latin1")
dat1=dat[,c('Island','Native.bird.species','island.area')]
head(dat1)
## Island Native.bird.species island.area
## 1 Alegranza 21 13.4
## 2 Annobon 10 17
## 3 Ascension 12 97
## 4 Bermuda 14 39.3
## 5 Boavista 18 634.1
## 6 Branco 9 .
dat1$island.area=as.numeric(dat1$island.area)
## Warning: NAs introduced by coercion
plot(log(dat1$island.area),dat1$Native.bird.species)
Our goal here is to fit the following model:
\[S_i \sim \text{Poisson}(\exp(\beta_0+\beta_1 \log(A_i)))\] where we assume that the priors are given by:
\[\beta_0 \sim N(0,10)\] \[\beta_1 \sim N(0,10)\]
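Although we will simply read in stored results below rather than re-run the sampler, it may help to see what such a Metropolis-within-Gibbs sampler could look like. The sketch below is illustrative rather than the exact code that generated the results file: the random-walk proposal standard deviations (0.1), the starting values, the number of iterations (10,000), and the reading of \(N(0,10)\) as mean 0 and variance 10 are all assumptions on my part.
#illustrative Metropolis-within-Gibbs sampler (not the exact code used for the
#stored results); proposal sd's of 0.1 and a prior variance of 10 are assumptions
log.post=function(b0,b1,S,logA){
  sum(dpois(S,exp(b0+b1*logA),log=T))+
  dnorm(b0,mean=0,sd=sqrt(10),log=T)+
  dnorm(b1,mean=0,sd=sqrt(10),log=T)
}
S=dat1$Native.bird.species
logA=log(dat1$island.area)
cond=!is.na(logA) & !is.na(S); S=S[cond]; logA=logA[cond] #drop islands with missing data
ngibbs=10000
res=matrix(NA,ngibbs,2); colnames(res)=c('b0','b1')
b0=0; b1=0
for (i in 1:ngibbs){
  #MH step for b0, holding b1 fixed at its current value
  b0.new=rnorm(1,mean=b0,sd=0.1)
  p=log.post(b0.new,b1,S,logA)-log.post(b0,b1,S,logA)
  if (log(runif(1))<p) b0=b0.new
  #MH step for b1, holding b0 fixed at its current value
  b1.new=rnorm(1,mean=b1,sd=0.1)
  p=log.post(b0,b1.new,S,logA)-log.post(b0,b1,S,logA)
  if (log(runif(1))<p) b1=b1.new
  res[i,]=c(b0,b1)
}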
Assessing convergence
After fitting this model with our Gibbs sampler (using the MH algorithm to sample from the full conditional distributions, FCDs), we store the results in the file “poisreg results no centering.csv”.
The first thing we will do is determine if and when the algorithm has converged. Judging from the trace plots below, the algorithm seems to converge after approximately 500 iterations.
#get data
setwd('U:\\uf\\courses\\bayesian course\\rmarkdown\\improving MCMC')
param=read.csv('poisreg results no centering.csv',as.is=T)
ngibbs=nrow(param)
nburn=500
par(mfrow=c(2,2))
plot(param$b0,type='l',main='Beta0 all iterations',xlab='',ylab='')
plot(param$b1,type='l',main='Beta1 all iterations',xlab='',ylab='')
plot(param$b0[nburn:ngibbs],type='l',main='Beta0 after convergence',xlab='',ylab='')
plot(param$b1[nburn:ngibbs],type='l',main='Beta1 after convergence',xlab='',ylab='')
b0=param$b0[nburn:ngibbs]
b1=param$b1[nburn:ngibbs]
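Besides visually inspecting trace plots, we can run a more formal check. One option (an addition to the analysis above) is the Geweke diagnostic in the coda package, which compares the mean of the first part of the chain (by default the first 10%) to the mean of the last part (the last 50%); z-scores roughly within \(\pm 2\) are consistent with convergence.
library(coda)
#Geweke diagnostic: z-scores comparing the mean of the first 10% of the chain
#with the mean of the last 50%; values roughly within +/-2 suggest convergence
geweke.diag(mcmc(cbind(b0=b0,b1=b1)))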
Auto-correlation function (ACF) and effective sample size
Recall that our MCMC algorithm generates correlated random samples from the posterior distribution. One indicator that our algorithm is not working as well as it could is that our samples are highly auto-correlated.
What do I mean by auto-correlation? This is the correlation between samples from iteration \(t\) and iteration \(t+\Delta\). The auto-correlation function (ACF) calculates these correlations for different lag values \(\Delta\). In the example below, the ACF reveals that the correlation decays very slowly for both \(\beta_0\) and \(\beta_1\). For example, for both of these parameters, correlation is >0.4 even if we just retain every 20th sample and discard everything in-between.
Notice that we are not talking about correlation between different parameters (e.g., \(\beta_0\) and \(\beta_1\)). Rather, we are talking about correlation among samples from the same parameter.
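To make this distinction concrete, compare the correlation between the two parameters with the lag-1 autocorrelation within a single parameter:
#correlation BETWEEN parameters (not what the ACF measures)
cor(b0,b1)
#lag-1 autocorrelation WITHIN the samples of a single parameter
cor(b0[-length(b0)],b0[-1])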
seq1=seq(from=0,to=1,by=0.2)
par(mfrow=c(2,1),mar=c(4,4,4,1))
acf(b0)
abline(h=seq1,col='grey',lty=3)
acf(b1)
abline(h=seq1,col='grey',lty=3)
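We can also verify the lag-20 statement numerically. Note that acf() stores lag 0 in the first position of the vector it returns, so lag 20 sits in position 21:
#autocorrelation at lag 20 for each parameter (lag 0 is in position 1)
acf(b0,lag.max=20,plot=F)$acf[21]
acf(b1,lag.max=20,plot=F)$acf[21]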
A related concept, and a very useful metric for judging the effectiveness of our algorithm, is the effective sample size (ESS). The ESS is the number of independent samples that would contain the same amount of information as our correlated samples from the posterior distribution. For example, you can see below that, despite having 9501 samples from the posterior distribution of \(\beta_0\), the very high autocorrelation means these samples are worth just 183 independent samples.
library(coda)
effectiveSize(b0); length(b0)
## var1
## 183.1844
## [1] 9501
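The ESS is computed from the same autocorrelations we just examined. A common expression is \[ESS=\frac{n}{1+2\sum_{k=1}^{\infty}\rho_k}\] where \(n\) is the number of samples and \(\rho_k\) is the autocorrelation at lag \(k\). The rough approximation below truncates the sum at lag 100; it will not match coda exactly, because effectiveSize estimates the spectral density at frequency zero instead.
#rough approximation of the ESS, truncating the sum of autocorrelations at lag 100
#(effectiveSize in coda uses a spectral density estimate, so numbers differ somewhat)
rho=acf(b0,lag.max=100,plot=F)$acf[-1] #autocorrelations for lags 1 to 100
length(b0)/(1+2*sum(rho))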
By the way, the effective sample size is precisely what “n.eff” means in the standard output of JAGS.
Comments?
Send me an email at