How does the Gibbs sampler work?
Our goal is to generate samples from the posterior distribution. Once we have these samples, we can summarize what we know about the model parameters given the data. To generate these samples, we will rely on an algorithm called the Gibbs sampler, which is a type of Markov chain Monte Carlo (MCMC) algorithm.
Say that we are interested in parameters \(a,b,c,d\) in the following model:
\[p(a,b,c,d|X)\propto L(X|a,b,c,d)p(a)p(b)p(c)p(d)\]
The key steps for a Gibbs sampler are:
1. Initialize the algorithm with starting values for all the parameters \(a\), \(b\), \(c\), and \(d\).
2. Generate samples from each of the Full Conditional Distributions (FCDs):
\[a^* \sim p(a^*|X,b,c,d)\] \[b^* \sim p(b^*|X,a^*,c,d)\] \[c^* \sim p(c^*|X,a^*,b^*,d)\] \[d^* \sim p(d^*|X,a^*,b^*,c^*)\]
where the asterisk (\(*\)) denotes the updated value of a parameter. The exact order of the FCDs does not matter, but whenever we update a parameter, we must use its new value in all subsequent FCDs within the iteration. By the end of one iteration, we have obtained new values for all parameters, \(a^*,b^*,c^*,d^*\).
3. Repeat step (2). Each pass through the FCDs produces one new draw of \(a,b,c,d\), and the collected draws form our MCMC samples (see the sketch below).
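To make these steps concrete, here is a minimal sketch in Python of a Gibbs sampler for a simple two-parameter model: data \(X\) from a Normal distribution with unknown mean \(\mu\) and precision \(\tau\), with conjugate Normal and Gamma priors so that both FCDs have closed forms. The simulated data, prior settings, and iteration counts are illustrative assumptions, not part of the model above.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated data for illustration: X ~ N(mu = 5, sd = 2)
X = rng.normal(5.0, 2.0, size=100)
n, xbar = len(X), X.mean()

# Assumed priors: mu ~ N(mu0, 1/tau0), tau ~ Gamma(a0, rate = b0)
mu0, tau0 = 0.0, 1e-3   # vague Normal prior on the mean
a0, b0 = 0.01, 0.01     # vague Gamma prior on the precision

n_iter = 5000
mu_draws = np.empty(n_iter)
tau_draws = np.empty(n_iter)

# Step 1: starting values
mu, tau = 0.0, 1.0

# Steps 2 and 3: cycle through the FCDs, repeatedly
for i in range(n_iter):
    # FCD of mu given tau and X is Normal (conjugacy)
    prec = tau0 + n * tau
    mean = (tau0 * mu0 + tau * n * xbar) / prec
    mu = rng.normal(mean, np.sqrt(1.0 / prec))

    # FCD of tau given the *updated* mu and X is Gamma
    shape = a0 + n / 2.0
    rate = b0 + 0.5 * np.sum((X - mu) ** 2)
    tau = rng.gamma(shape, 1.0 / rate)   # NumPy parameterizes by scale = 1/rate

    mu_draws[i], tau_draws[i] = mu, tau

# Discard burn-in draws, then summarize the posterior samples
burn = 1000
print("posterior mean of mu:", mu_draws[burn:].mean())
print("posterior mean of sd:", (1.0 / np.sqrt(tau_draws[burn:])).mean())
```

Note that the early draws still reflect the arbitrary starting values, which is why a burn-in portion is discarded before summarizing.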
We will not go into the theory of MCMC to prove that the Gibbs sampling algorithm works, but here are some key terms and concepts that you can look up if you are interested in reading more. The Gibbs sampler generates an ergodic (i.e., irreducible and aperiodic) Markov chain. An ergodic Markov chain has a unique stationary (invariant) distribution \(\Pi(x)\), and the process converges in the sense that any realization (i.e., from any starting value) has \(\Pi(x)\) as its limiting distribution. In Bayesian analysis, the stationary distribution is the target posterior distribution of interest.
In other words, theory tells us that if we sample the parameters \(a,b,c,d\) in this particular way, the algorithm will eventually (i.e., after it has converged) generate samples from the posterior distribution. While this is the basic setup, there are variations in which subsets of the parameters are sampled jointly as a block. For example:
\[a^*,b^* \sim p(a^*,b^*|X,c,d)\] \[c^* \sim p(c^*|X,a^*,b^*,d)\] \[d^* \sim p(d^*|X,a^*,b^*,c^*)\]
This blocking is often done to make the algorithm more efficient, particularly when the blocked parameters are strongly correlated in the posterior. However, for the time being, we will keep it simple by sampling the individual parameters one at a time; the sketch below shows what a blocked update can look like.
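For the curious, here is a sketch of a blocked update for a simple linear regression \(y_i = a + b x_i + \epsilon_i\) with noise precision \(\tau\): the coefficients \((a^*,b^*)\) are drawn jointly from their bivariate Normal FCD, and \(\tau^*\) is then drawn given the updated coefficients, mirroring the scheme above. Again, the data, priors, and settings are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data for illustration: y = 1 + 2x + noise
n = 200
x = rng.uniform(-1.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)
D = np.column_stack([np.ones(n), x])   # design matrix: intercept and slope

# Assumed priors: (a, b) ~ N(0, 100*I), tau ~ Gamma(a0, rate = b0)
P0 = np.eye(2) / 100.0                  # prior precision of (a, b)
a0, b0 = 0.01, 0.01

theta = np.zeros(2)                     # starting values for (a, b)
tau = 1.0
draws = np.empty((5000, 3))

for i in range(len(draws)):
    # Block update: sample (a*, b*) jointly from their bivariate Normal FCD
    Pn = P0 + tau * D.T @ D
    Vn = np.linalg.inv(Pn)
    mn = Vn @ (tau * D.T @ y)           # prior mean is zero
    theta = rng.multivariate_normal(mn, Vn)

    # Then sample tau* given the *updated* coefficients (a*, b*)
    resid = y - D @ theta
    tau = rng.gamma(a0 + n / 2.0, 1.0 / (b0 + 0.5 * resid @ resid))

    draws[i] = (*theta, tau)

print("posterior means of (a, b):", draws[1000:, :2].mean(axis=0))
```

Drawing \(a\) and \(b\) as a single block avoids the slow, zig-zag exploration that one-at-a-time updates can exhibit when the two coefficients are highly correlated in the posterior.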