Why do we need to know about Full Conditional Distributions?

To generate samples from joint posterior distributions efficiently, we need an algorithm called the Gibbs sampler. Without it, we are stuck. Until the 1990s, Bayesian statistics was an interesting concept with few practical applications because it was hard (except for cherry-picked cases) to make inferences on the joint posterior distribution; this distribution is often intractable once we have multiple parameters, which is the case for most problems. However, with the advent of the Gibbs sampler in the 1990s, there was a revolution because this algorithm allowed statisticians to obtain samples from the joint posterior distribution. As we have seen, if we have samples from a distribution, we can use them to make all sorts of inferences. In summary, the Gibbs sampler is the main algorithm used in Bayesian statistics, underlying WinBUGS, JAGS, and the other software that perform Bayesian black magic. Importantly, a key concept for Gibbs samplers is that of Full Conditional Distributions (abbreviated here as FCD’s).

What are Full Conditional Distributions?

FCD’s are the distributions of each parameter given all the other parameters and the data.

For instance, say we have four parameters {a,b,c,d}, data X and independent priors, and that the posterior distribution is given by:

\[p(a,b,c,d|X) \propto L(X|a,b,c,d)p(a)p(b)p(c)p(d)\]

Then, the FCD’s would be:

\[p(a|X,b,c,d) \propto L(X|a,b,c,d)p(a)\] \[p(b|X,a,c,d) \propto L(X|a,b,c,d)p(b)\] \[p(c|X,a,b,d) \propto L(X|a,b,c,d)p(c)\] \[p(d|X,a,b,c) \propto L(X|a,b,c,d)p(d)\]

Notice that we retain only the terms that depend on the target parameter. For instance, \(p(b)\), \(p(c)\), and \(p(d)\) do not change for different values of the parameter \(a\), while \(L(X|a,b,c,d)\) and \(p(a)\) do. Therefore, the FCD for \(a\) does not include \(p(b)\), \(p(c)\), or \(p(d)\).

Here is a more convoluted example. Say our model was given by:

\[p(a,b,c,d|X) \propto L(X|a,b,c)p(a|c,d)p(b)p(c)p(d)\]

The corresponding FCD’s would be:

\[p(a|X,b,c,d) \propto L(X|a,b,c)p(a|c,d)\] \[p(b|X,a,c,d) \propto L(X|a,b,c)p(b)\] \[p(c|X,a,b,d) \propto L(X|a,b,c)p(a|c,d)p(c)\] \[p(d|X,a,b,c) \propto p(a|c,d)p(d)\]

At first glance, \(p(d|X,a,b,c)\) might seem odd because it suggests that the parameter \(d\) does not depend on the data, given that the likelihood has disappeared. However, note that \(d\) is linked to \(a\) and \(c\) through the prior \(p(a|c,d)\), and that these two parameters are directly influenced by the data. Therefore, \(d\) is indirectly influenced by the data through \(a\) and \(c\).

An example: estimating \(\mu\) and \(\sigma^2\)

Until now, we have used one-parameter examples to illustrate how we can obtain the posterior distribution when we have a conjugate likelihood-prior pair. For instance, we wanted to estimate how good a basketball player the student is (i.e., we wanted to estimate the probability \(\pi\)), and for that we used the binomial-beta conjugate pair to obtain a beta posterior distribution. The other example involved summarizing information from different climate proxies (i.e., we wanted to estimate \(\mu\)); for that we assumed a normal prior and a normal likelihood. In that problem, we assumed that the variance \(\sigma^2\) was known, which is of course rarely the case. We adopted this assumption to simplify the problem into one with a single parameter to be estimated. Ideally, however, we would like to make inferences on both \(\mu\) and \(\sigma^2\). In other words, we want their joint posterior distribution:

\[p(\mu,\sigma^2|X) \propto L(X|\mu,\sigma^2)p(\mu,\sigma^2)\] Assuming independent priors for these parameters, we have: \[p(\mu,\sigma^2|X) \propto L(X|\mu,\sigma^2)p(\mu)p(\sigma^2)\] The specific model we had in mind was:

\[p(\mu,\sigma^2|X) \propto [\prod_{i=1}^n N(x_i|\mu,\sigma^2)] N(\mu|\mu_0,\sigma_0^2) Gamma(\frac{1}{\sigma^2}|a,b)\]
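To make this model concrete, here is a minimal prior-predictive simulation: draw the precision \(\frac{1}{\sigma^2}\) from its gamma prior, draw \(\mu\) from its normal prior, then generate data from the normal likelihood. The hyperparameter values and sample size below are hypothetical choices for illustration only, not values from the text.

```python
import numpy as np

# Hypothetical hyperparameters, chosen only for illustration
mu0, sigma02 = 0.0, 1.0   # prior mean and variance for mu
a, b = 2.0, 2.0           # shape and rate of the gamma prior on 1/sigma^2
n = 5                     # number of observations to simulate

rng = np.random.default_rng(1)

precision = rng.gamma(a, 1.0 / b)            # 1/sigma^2 ~ Gamma(a, b); numpy takes a scale, so scale = 1/rate
sigma2 = 1.0 / precision
mu = rng.normal(mu0, np.sqrt(sigma02))       # mu ~ N(mu0, sigma0^2)
x = rng.normal(mu, np.sqrt(sigma2), size=n)  # x_i ~ N(mu, sigma^2), i = 1..n
```

Note that `numpy` parameterizes the gamma distribution by shape and scale, so the rate \(b\) enters as `1.0 / b`.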

Although it is hard to derive the joint posterior distribution \(p(\mu,\sigma^2|x_1,\dots,x_n)\), it is easy to derive the FCD’s, particularly if we select the priors for \(\mu\) and \(\sigma^2\) wisely.

FCD for \(\mu\)

The FCD for \(\mu\) is given by:

\[p(\mu|X,\sigma^2) \propto [\prod_{i=1}^n N(x_i|\mu,\sigma^2)] N(\mu|\mu_0,\sigma_0^2)\] This expression is easy to derive since it is identical to the problem we previously had, in which we assumed \(\sigma^2\) was known. We find that:

\[p(\mu|X,\sigma^2) =N\left(\frac{\frac{n}{\sigma^2}\bar{x}+\frac{1}{\sigma_0^2}\mu_0}{\frac{n}{\sigma^2}+\frac{1}{\sigma_0^2}},\left(\frac{n}{\sigma^2}+\frac{1}{\sigma_0^2}\right)^{-1}\right)\] If you recall, the mean of this FCD is a weighted average of the empirical average \(\bar{x}\) and the prior mean \(\mu_0\), with weights given by the data precision \(\frac{n}{\sigma^2}\) and the prior precision \(\frac{1}{\sigma_0^2}\).

FCD for \(\frac{1}{\sigma^2}\)

Again, we have already derived this in one of our homework exercises:

\[p\left( \frac{1}{\sigma^2} \Big|x_1,\dots,x_n,\mu\right)\propto L(x_1,\dots,x_n|\mu,\sigma^2 )p\left(\frac{1}{\sigma^2}\right)\] \[\propto \left[\prod_{i=1}^n N(x_i|\mu,\sigma^2)\right]Gamma\left(\frac{1}{\sigma^2}\Big|a,b\right)\]

\[\propto \left[\prod_{i=1}^n \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)\right]\left(\frac{1}{\sigma^2}\right)^{a-1}\exp\left(-b\frac{1}{\sigma^2}\right)\] \[\propto \left(\frac{1}{\sigma^2}\right)^{(a+\frac{n}{2})-1}\exp\left(-\frac{1}{\sigma^2}\left(\frac{\sum_i (x_i-\mu)^2}{2}+b\right)\right)\] It should be clear by now that the full conditional for the precision \(\frac{1}{\sigma^2}\) is also a gamma distribution: \[p\left( \frac{1}{\sigma^2} \Big|X,\mu\right)=Gamma\left(a+\frac{n}{2},\frac{\sum_i(x_i-\mu)^2}{2}+b\right)\]
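With both FCD’s in hand, the Gibbs sampler simply alternates between them: draw \(\mu\) given the current \(\sigma^2\), then draw \(\frac{1}{\sigma^2}\) given the current \(\mu\), and repeat. Here is a minimal sketch; the function name, the default hyperparameter values, and the starting values are my own illustrative choices, not from the text.

```python
import numpy as np

def gibbs_normal(x, mu0=0.0, sigma02=100.0, a=0.01, b=0.01,
                 n_iter=2000, seed=0):
    """Gibbs sampler for (mu, sigma^2) under a normal likelihood,
    a N(mu0, sigma0^2) prior on mu, and a Gamma(a, b) prior on 1/sigma^2.
    Default hyperparameters are vague illustrative choices.
    """
    rng = np.random.default_rng(seed)
    n, xbar = len(x), np.mean(x)
    mu, sigma2 = xbar, np.var(x)          # reasonable starting values
    mus = np.empty(n_iter)
    sigma2s = np.empty(n_iter)
    for t in range(n_iter):
        # FCD for mu: normal with precision-weighted mean
        prec = n / sigma2 + 1.0 / sigma02
        m = (n / sigma2 * xbar + mu0 / sigma02) / prec
        mu = rng.normal(m, np.sqrt(1.0 / prec))
        # FCD for 1/sigma^2: Gamma(a + n/2, b + sum_i (x_i - mu)^2 / 2);
        # numpy's gamma takes a scale parameter, so scale = 1/rate
        rate = b + 0.5 * np.sum((x - mu) ** 2)
        sigma2 = 1.0 / rng.gamma(a + n / 2, 1.0 / rate)
        mus[t] = mu
        sigma2s[t] = sigma2
    return mus, sigma2s
```

After discarding an initial burn-in, the retained pairs \((\mu, \sigma^2)\) are (dependent) samples from the joint posterior, and their histograms approximate the marginal posteriors.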


