Description of statistical model
Say we are interested in the following statistical model:
\[p(\mu,\sigma^2|x_1,...,x_n) \propto [\prod_{i=1}^n N(x_i|\mu , \sigma^2)]N(\mu | m_0,v_0^2) Gamma(\frac{1}{\sigma^2}|a,b)\]
The problem here is that we do not know what the posterior \(p(\mu,\sigma^2|x_1,...,x_n)\) looks like. Nevertheless, we do know what the individual FCD’s are:
\[p(\mu|X,\sigma^2) \propto [\prod_{i=1}^n N(x_i|\mu , \sigma^2)]N(\mu | m_0,v_0^2) \] \[= N(\frac{(\frac{n}{\sigma^2}\bar{x}+\frac{1}{v_0^2}m_0)}{(\frac{n}{\sigma^2}+\frac{1}{v_0^2})},(\frac{n}{\sigma^2}+\frac{1}{v_0^2})^{-1})\]
and
\[p( \frac{1}{\sigma^2} |X,\mu)\propto [\prod_{i=1}^n N(x_i|\mu , \sigma^2)]Gamma(\frac{1}{\sigma^2}|a,b)\]
\[=Gamma(a+\frac{n}{2},\frac{\sum_i(x_i-\mu)^2}{2}+b)\]
Because we know the FCD’s, we can run our Gibbs sampler.
Estimating \(\mu\) and \(\sigma^2\)
Our Gibbs sampler will work in the following way (subscripts here denote iteration):
- Specify an initial value for \(\sigma_1^2\)
- Now iterate through the following steps:
Generate a sample from \(p(\mu_2|x_1,.,x_n,\sigma_1^2)\)
Using the result from (a), generate a sample from \(p(\frac{1}{\sigma_2^2} |x_1,.,x_n,\mu_2)\)
Using the result from (b), generate a sample from \(p(\mu_3|x_1,.,x_n,\sigma_2^2)\)
Using the result from (c), generate a sample from \(p(\frac{1}{\sigma_3^2} |x_1,.,x_n,\mu_3)\)
And on and on.
Notice the changes in the subscript. Here is a picture that summarizes this process:
The term MCMC, which stands for Markov Chain Monte Carlo, becomes clear with this example. The “chain” refers to the fact that we are constructing a sequence of parameter estimates that are interlinked and “markov” is a term that specifies that the algorithm at any given point just depends on the immediate past of the chain (i.e., the previous iteration) rather than on parameter values several iterations ago. For instance, \(\mu_4 |\sigma_3^2\) is independent of \(\sigma_1^2,\sigma_2^2,\mu_1,\mu_2\), and \(\mu_3\). Finally “Monte Carlo” just states that this is a stochastic algorithm. In other words, had we started it over again, our chain would contain different values.
To fit this model using our own Gibbs sampler, we will have to assume a value for the the prior parameters \(m_0\), \(v_0^2\), \(a\), and \(b\). Notice that people often use small a and b parameters for the prior gamma distribution. Why is that?
Comments?
Send me an email at