How do I tell if my MCMC has converged?

Let’s talk about the convergence of an MCMC algorithm. The output of your MCMC can be characterized by three stages:

  1. Initial parameter values: to initialize the algorithm, we need to specify values for each of the parameters being estimated.

  2. Burn-in stage: Because MCMC algorithms generate correlated parameter estimates, the values you pick to initialize the algorithm can influence the first few hundred (or even thousand) samples from your model. As a consequence, users typically discard the samples generated during this stage. Furthermore, some sampling algorithms (e.g., the Metropolis algorithm) can be modified (i.e., “adapted”) early on to improve how samples are generated. However, samples generated while these modifications are taking place are not valid samples from the joint posterior distribution, so they should be discarded as well. This initial phase is called the burn-in stage; a minimal R sketch of discarding burn-in samples appears right after this list.

  3. Converged stage: The whole idea is that, eventually, the algorithm will start generating output that represents random draws (though typically not independent) from the joint posterior distribution, having forgotten the initial values used to start the algorithm. Furthermore, because we are sampling from the joint posterior distribution, we do not expect significant changes in the characteristics of this output, such as its mean, variance, etc.
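To make the “discard the burn-in” step concrete, here is a minimal R sketch. The matrix samples and the parameter names (beta0, beta1) are made-up stand-ins for the raw output of your own sampler, and the burn-in length of 2,000 iterations is simply assumed for illustration.

```r
## Minimal sketch: `samples` stands in for the raw MCMC output
## (rows = iterations, columns = parameters); here it is just simulated.
set.seed(1)
n.iter  <- 10000
samples <- cbind(beta0 = rnorm(n.iter, mean = 1, sd = 0.3),
                 beta1 = rnorm(n.iter, mean = -2, sd = 0.5))

n.burn <- 2000                    # assumed burn-in length
post   <- samples[-(1:n.burn), ]  # discard the first 2,000 iterations

## Posterior summaries are then based only on the retained draws
colMeans(post)
apply(post, 2, quantile, probs = c(0.025, 0.975))
```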

How can we tell that we are done with the burn-in stage, that the algorithm has converged, and that we have valid samples from the joint posterior distribution? There is no foolproof method, but here are some approaches, with corresponding cartoon figures:

  1. Visually examining trace plots: trace plots graph the sampled values of a particular parameter against the iteration number. If the algorithm has indeed converged, there should be no major trends in this plot. I tend to rely on this as a quick check. For instance, the figure below suggests that after 2,000 iterations we are effectively sampling from a distribution that is not changing and which, hopefully, represents the posterior distribution (a lowess curve is depicted in red).

  2. Statistically comparing different portions of the sampler output: if there are no large differences between these portions (e.g., based on a statistical test comparing their means, as in the Geweke convergence diagnostic; see the R sketch after this list), it is assumed that the algorithm has converged. In the figure above, if we compare the mean of the samples from iterations 1-2,000 to the mean from iterations 2,000-4,000, we would expect to see a statistically significant difference, because the first block still contains the burn-in. On the other hand, a comparison of means using iterations 2,000-4,000 versus 8,000-10,000 is unlikely to show a statistically significant difference, suggesting that the algorithm has converged after iteration 2,000.

  3. Running multiple chains: some approaches rely on running the algorithm multiple times, each time with a different set of starting values, and comparing the resulting outputs. Each set of outputs, generated by running the algorithm from a particular set of initial parameter values, is called a “chain”. If the outputs from different chains are similar, this suggests that the algorithm has forgotten its initial parameter values and that all chains are exploring the same parameter space, leading us to believe that it has converged. For instance, the figure below illustrates the intuition behind running multiple chains, where each chain is depicted with a different color. In this figure, it seems that the algorithm has converged after iteration 500. The Rhat statistic (the potential scale reduction factor proposed by Gelman and Rubin) compares the within-chain and between-chain variances; values close to 1 (e.g., smaller than 1.1) suggest convergence. Many of these diagnostic statistics and corresponding plots are available in the R package “coda” (e.g., the functions gelman.diag() and geweke.diag(); see the sketch below). JAGS also automatically reports Rhat for each of the parameters.
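To tie these diagnostics together, here is a hedged R sketch based on the coda package mentioned above. The two chains are simulated stand-ins for output from your own sampler (the fake.chain() helper is invented purely for illustration), so the diagnostics here will trivially look fine; the point is just to show the workflow.

```r
## Minimal sketch using the "coda" package: two simulated chains,
## started (in principle) from different initial values.
library(coda)

set.seed(1)
n.iter <- 10000
fake.chain <- function() {
  cbind(beta0 = rnorm(n.iter, mean = 1, sd = 0.3),
        beta1 = rnorm(n.iter, mean = -2, sd = 0.5))
}
chains <- mcmc.list(mcmc(fake.chain()), mcmc(fake.chain()))

## 1. Trace plots: look for the absence of trends
traceplot(chains)

## 2. Geweke diagnostic: z-scores comparing early and late portions of
##    each chain; values roughly within +/- 2 are reassuring
geweke.diag(chains)

## 3. Gelman-Rubin diagnostic (Rhat): compares within- and between-chain
##    variances; point estimates close to 1 (e.g., below 1.1) suggest convergence
gelman.diag(chains)
```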


