Learning objectives
In this material, we will go over several concepts related to convergence that I think are very important for Bayesian statistics:
- Auto-correlation function (ACF) and effective sample size
- How the Gibbs sampler explores parameter space
- The importance of centering covariates
These concepts are quite general, but they will be easier to illustrate using a customized Gibbs sampler with a Metropolis-Hastings algorithm embedded within it (i.e., Metropolis-within-Gibbs).
Data and statistical model
We will rely on an example that we have seen before. In this example, we are interested in modeling the number of species \(S_i\) on island \(i\) as a function of island area \(A_i\). We will use the data in “islands.csv”. Here is what these data look like:
setwd('U:\\uf\\courses\\bayesian course\\group activities\\7 example Poisson_Logistic reg')
dat=read.csv('islands.csv',as.is=T,fileEncoding="latin1")
dat1=dat[,c('Island','Native.bird.species','island.area')]
head(dat1)
## Island Native.bird.species island.area
## 1 Alegranza 21 13.4
## 2 Annobon 10 17
## 3 Ascension 12 97
## 4 Bermuda 14 39.3
## 5 Boavista 18 634.1
## 6 Branco 9 .
dat1$island.area=as.numeric(dat1$island.area)
## Warning: NAs introduced by coercion
plot(log(dat1$island.area),dat1$Native.bird.species)
Our goal here is to fit the following model:
\[S_i \sim \text{Poisson}(\exp(\beta_0+\beta_1 \log(A_i)))\] where we assume that the priors are given by:
\[\beta_0 \sim N(0,10)\] \[\beta_1 \sim N(0,10)\]
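Although we will simply read in stored results below rather than re-run the sampler, it may help to see what such a Metropolis-within-Gibbs sampler could look like. The sketch below is illustrative rather than the exact code that generated the results file: the random-walk proposal standard deviations (0.1), the starting values, the number of iterations (10,000), and the reading of \(N(0,10)\) as mean 0 and variance 10 are all assumptions on my part.
#illustrative Metropolis-within-Gibbs sampler (not the exact code used for the
#stored results); proposal sd's of 0.1 and a prior variance of 10 are assumptions
log.post=function(b0,b1,S,logA){
  sum(dpois(S,exp(b0+b1*logA),log=T))+
  dnorm(b0,mean=0,sd=sqrt(10),log=T)+
  dnorm(b1,mean=0,sd=sqrt(10),log=T)
}
S=dat1$Native.bird.species
logA=log(dat1$island.area)
cond=!is.na(logA) & !is.na(S); S=S[cond]; logA=logA[cond] #drop islands with missing data
ngibbs=10000
res=matrix(NA,ngibbs,2); colnames(res)=c('b0','b1')
b0=0; b1=0
for (i in 1:ngibbs){
  #MH step for b0, holding b1 fixed at its current value
  b0.new=rnorm(1,mean=b0,sd=0.1)
  p=log.post(b0.new,b1,S,logA)-log.post(b0,b1,S,logA)
  if (log(runif(1))<p) b0=b0.new
  #MH step for b1, holding b0 fixed at its current value
  b1.new=rnorm(1,mean=b1,sd=0.1)
  p=log.post(b0,b1.new,S,logA)-log.post(b0,b1,S,logA)
  if (log(runif(1))<p) b1=b1.new
  res[i,]=c(b0,b1)
}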
Assessing convergence
After fitting this model with our Gibbs sampler (using the MH algorithm to sample from the full conditional distributions, FCDs), we store the results in the file “poisreg results no centering.csv”.
The first thing we will do is determine if and when the algorithm has converged. Judging from the trace plots below, the algorithm seems to converge after approximately 500 iterations.
#get data
setwd('U:\\uf\\courses\\bayesian course\\rmarkdown\\improving MCMC')
param=read.csv('poisreg results no centering.csv',as.is=T)
ngibbs=nrow(param)
nburn=500
par(mfrow=c(2,2))
plot(param$b0,type='l',main='Beta0 all iterations',xlab='',ylab='')
plot(param$b1,type='l',main='Beta1 all iterations',xlab='',ylab='')
plot(param$b0[nburn:ngibbs],type='l',main='Beta0 after convergence',xlab='',ylab='')
plot(param$b1[nburn:ngibbs],type='l',main='Beta1 after convergence',xlab='',ylab='')
b0=param$b0[nburn:ngibbs]
b1=param$b1[nburn:ngibbs]
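Besides visually inspecting trace plots, we can run a more formal check. One option (an addition to the analysis above) is the Geweke diagnostic in the coda package, which compares the mean of the first part of the chain (by default the first 10%) to the mean of the last part (the last 50%); z-scores roughly within \(\pm 2\) are consistent with convergence.
library(coda)
#Geweke diagnostic: z-scores comparing the mean of the first 10% of the chain
#with the mean of the last 50%; values roughly within +/-2 suggest convergence
geweke.diag(mcmc(cbind(b0=b0,b1=b1)))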
Auto-correlation function (ACF) and effective sample size
Recall that our MCMC algorithm generates correlated random samples from the posterior distribution. One indicator that our algorithm is not working as well as it could is that our samples are highly auto-correlated.
What do I mean by auto-correlation? This is the correlation between samples from iteration \(t\) and iteration \(t+\Delta\). The auto-correlation function (ACF) calculates these correlations for different lag values \(\Delta\). In the example below, the ACF reveals that the correlation decays very slowly for both \(\beta_0\) and \(\beta_1\). For example, for both of these parameters, correlation is >0.4 even if we just retain every 20th sample and discard everything in-between.
Notice that we are not talking about correlation between different parameters (e.g., \(\beta_0\) and \(\beta_1\)). Rather, we are talking about correlation among samples from the same parameter.
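To make this distinction concrete, compare the correlation between the two parameters with the lag-1 autocorrelation within a single parameter:
#correlation BETWEEN parameters (not what the ACF measures)
cor(b0,b1)
#lag-1 autocorrelation WITHIN the samples of a single parameter
cor(b0[-length(b0)],b0[-1])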
seq1=seq(from=0,to=1,by=0.2)
par(mfrow=c(2,1),mar=c(4,4,4,1))
acf(b0)
abline(h=seq1,col='grey',lty=3)
acf(b1)
abline(h=seq1,col='grey',lty=3)
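We can also verify the lag-20 statement numerically. Note that acf() stores lag 0 in the first position of the vector it returns, so lag 20 sits in position 21:
#autocorrelation at lag 20 for each parameter (lag 0 is in position 1)
acf(b0,lag.max=20,plot=F)$acf[21]
acf(b1,lag.max=20,plot=F)$acf[21]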
A related concept, and a very useful metric for judging the effectiveness of our algorithm, is the effective sample size (ESS). The ESS is the number of independent samples that would contain the same amount of information as our correlated samples from the posterior distribution. For example, you can see below that, despite having 9501 samples from the posterior distribution of \(\beta_0\), the very high autocorrelation means these samples are worth just 183 independent samples.
library(coda)
effectiveSize(b0); length(b0)
## var1
## 183.1844
## [1] 9501
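The ESS is computed from the same autocorrelations we just examined. A common expression is \[ESS=\frac{n}{1+2\sum_{k=1}^{\infty}\rho_k}\] where \(n\) is the number of samples and \(\rho_k\) is the autocorrelation at lag \(k\). The rough approximation below truncates the sum at lag 100; it will not match coda exactly, because effectiveSize estimates the spectral density at frequency zero instead.
#rough approximation of the ESS, truncating the sum of autocorrelations at lag 100
#(effectiveSize in coda uses a spectral density estimate, so numbers differ somewhat)
rho=acf(b0,lag.max=100,plot=F)$acf[-1] #autocorrelations for lags 1 to 100
length(b0)/(1+2*sum(rho))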
By the way, the effective sample size is precisely what “n.eff” means in the standard output of JAGS.
Comments?
Send me an email at