12A_normal_normal_example.utf8

Learning objectives

Recall the definition of a conjugate likelihood-prior pair. The idea is that the posterior and the prior are from the same distribution, albeit with different parameters. There are multiple types of conjugate pairs. For instance, previously we saw how the binomial likelihood combined with a beta prior leads to a beta posterior.

The goal of this page is to go over the normal-normal conjugate pair, illustrating it with a climate change example.

1. Climate change example

We will illustrate this conjugate pair with a climate change example. Because we do not have direct meteorological measurements of temperature that goes way back into the planet’s deep past, paleoclimatologists have relied extensively on proxy measures (e.g., sub-fossil pollen, corals, ocean sediments, tree rings, ice cores). Our dataset “climate change data edited.csv” consists of historical measurements of climate change based on multiple types of proxies aind includes the following variables:

deltaT: difference in temperature, calculated as past temperature - current temperature
sdev: a measure of uncertainty associated with each observation. This is important because, as we are dealing with proxies, these measurements may reflect temperature more accurately or less accurately
T.M: if the measurement was collected from a terrestrial (T.M=0) or marine (T.M=1) environment
latitude: latitude in which measurements were taken
proxy1: the type of proxy

We want to summarize the climate change information based on all these data.

2. Defining the model

2.1. Likelihood

Ignoring for the moment being the fact that each observation has a different level of uncertainty associated with it (i.e., we will assume a fixed \(\sigma^2\) for all observations), let’s assume that each observation \(x_i\) comes from a normal distribution:

\[x_i\sim N(\mu,\sigma^2)\]

Assuming independence between these observations given \(\mu\) and \(\sigma^2\) implies that we can write the likelihood as:

\[\prod_{i=1}^n N(x_i|\mu ,\sigma^2)\]

2.2. Prior

Say that we can represent our prior beliefs regarding average climate change using a normal distribution:

\[p(\mu)=N(m_0,v_0^2)\]

2.3. Posterior

Now, we have all the ingredients that we need to get the posterior distribution: \[p(\mu|x_1,...,x_n,\sigma^2)\propto [\prod_i^n N(x_i|\mu,\sigma^2)]N(\mu|m_0,v_0^2) \] Notice that we are assuming that we know \(\sigma^2\) and therefore the only parameter that we need to estimate is \(\mu\). At a later moment, we will learn how to get the posterior distribution of both \(\sigma^2\) and \(\mu\).

Because this is a conjugate pair, what is \(p(\mu|x_1,...,x_n,\sigma^2)\)? It is normal. Say that the parameters for this normal distribution are \(m_1,v_1^2\). This means that, in the end of our derivation, we are after something of the form:

\[p(\mu|x_1,...,x_n,\sigma^2)=N(\mu|m_1,v_1^2) \propto exp(-\frac{1}{2v^2_1}(\mu-m_1)^2)\]

Because our derivation is a bit messy and thus can quickly become confusing, remember that:

We are interested in the posterior distribution of \(\mu\), denoted by \(p(\mu|x_1,...,x_n,\sigma^2)\)
\(m_0,v_0^2\) are parameters that describe our prior beliefs regarding \(\mu\)
\(m_1,v_1^2\) are parameters that describe our updated beliefs (i.e., after seeing the data) regarding \(\mu\)

The detailed derivation can be found in the pdf document available here. Here I will just highlight some interesting facts about this conjugate pair. It turns out that:

\[m_1= \frac{\frac{n}{\sigma^2}\bar{x}+\frac{1}{v_0^2}m_0}{\frac{n}{\sigma^2}+\frac{1}{v_0^2}}\] This expression is quite revealing. It shows that the posterior mean for the parameter \(\mu\), given by \(m_1\), is a weighted average between the sample mean \(\bar{x}\) and the prior mean \(m_0\). The weights here are equal to \(\frac{n}{\sigma^2}\) and \(\frac{1}{v_0^2}\).

Notice that:

if we have very little prior information regarding the parameter \(\mu\) and therefore we use a very large prior variance (i.e., \(v_0^2\rightarrow \infty\)), then \(m_1\) will be strongly influenced by the data (i.e., \(m_1\rightarrow\bar{x}\));
if n is big, then our data will overwhelm the prior and \(m_1\rightarrow\bar{x}\) as \(n\rightarrow \infty\);
if we have very little data (e.g., n=1) or we have a lot of prior knowledge regarding \(\mu\) (expressed by having a very small \(v_0^2\)), then \(m_1\) will be strongly influenced by \(m_0\);
if we have a very low precision for our likelihood (i.e., \(\sigma^2\) is big), then \(m_1\) will also be strongly influenced by our prior beliefs. This is akin to measuring something with a very imprecise instrument (e.g., measuring people’s height with a 1-meter pole). In this case, we cannot trust very much our observations.

This expression reveals the common phenomenon of Bayesian analysis trying to strike a balance between what the data are telling us and our prior beliefs. It is useful to know this expression. For instance, if we were trying to avoid our prior from influencing the analysis, we could set \(v_0^2\) to be large.

3.1. Data analysis with uninformative (vague) prior

Let’s see what this means in concrete terms. In this model, \(\mu\) is a parameter that describes the overall amount of climate change across all proxies and \(m_1\) is the posterior mean of this parameter. If \(m_1\) is very negative, then this implies that there is substantial evidence that current temperature is higher on average than past temperature. On the other hand, if \(m_1\approx 0\) then there is not much evidence of climate change across all these proxies. Here is some code in which we assume a weak prior for \(\mu\):

rm(list=ls(all=T))
setwd('U:\\uf\\courses\\bayesian course\\2018\\activities\\1 climate change')
dat=read.csv('climate change data edited.csv',as.is=T)
xbar=mean(dat$deltaT)
n=nrow(dat)

sig2=mean(dat$sdev^2) #we assume that this parameter is known (i.e., it is not being estimated)
m0=0 #prior assumption of no climate change

#assume a vague prior
v0sq=10
prec=(n/sig2)+(1/v0sq) #the derivation for this can be found in the pdf document mentioned above
v1sq_weak=1/prec #variance of the posterior distribution for mu
m1_weak=v1sq_weak*((n/sig2)*xbar+(1/v0sq)*m0) #mean of the posterior distribution for mu

#compare m1 and xbar
m1_weak; xbar

## [1] -3.106077

## [1] -3.111111

#plot 
x=seq(from=-10,to=10,length=1000)
plot(x,dnorm(x,mean=m0,sd=sqrt(v0sq)),type='l',col='grey',ylim=c(0,3.5),ylab='Density',xlab=expression(mu)) #plotting the prior
lines(x,dnorm(x,mean=m1_weak,sd=sqrt(v1sq_weak)),col='black') #plotting the posterior
abline(v=xbar,col='blue')
legend(x=0,y=3.5,c('Weak prior','Posterior','Sample average'),lty=1,col=c('grey','black','blue'))

This example shows that the data contain a lot of information on \(\mu\). If we use a relatively uninformative prior (i.e., \(v_0^2=10\)), we find that \(m_1\) and \(\bar{x}\) are very similar and that the posterior distribution is much more peaked than the prior distribution. Ultimately, our results suggest that climate change has on average been equal to -3.1.

3.2. Data analysis with informative (strong) prior

On the other hand, as illustrated below, if we assume a strong prior for \(\mu\) (i.e., \(v_0^2=0.1\)), then our results suggest that climate change has been less dramatic, equal to -2.23. Notice that the posterior distribution has substantially shifted towards the prior because we have substantially increased the weight of the prior mean \(m_0\).

#what happens with strong prior
v0sq=0.1
prec=(n/sig2)+(1/v0sq) #the derivation for this can be found in the pdf document mentioned above
v1sq_strong=1/prec #variance of the posterior distribution for mu
m1_strong=v1sq_strong*((n/sig2)*xbar+(1/v0sq)*m0) #mean of the posterior distribution for mu

#compare m1 and xbar
m1_strong; xbar

## [1] -2.677204

## [1] -3.111111

#plot 
plot(x,dnorm(x,mean=m0,sd=sqrt(v0sq)),type='l',col='grey',ylim=c(0,3.5),ylab='Density',xlab=expression(mu)) #plotting the prior
abline(v=xbar,col='blue')
lines(x,dnorm(x,mean=m1_weak,sd=sqrt(v1sq_weak)),col='black') #plotting the posterior
lines(x,dnorm(x,mean=m1_strong,sd=sqrt(v1sq_strong)),col='black',lty=3) #plotting the posterior under a strong prior
legend(x=0,y=3.5,c('Strong prior','Posterior with weak prior','Posterior with strong prior','Sample average'),lty=c(1,1,3,1),col=c('grey','black','black','blue'))

Additional examples and derivations regarding one parameter conjugate pairs can be found in pages 33-54 in (Gelman et al. 2004) and pages 1-7 in (Hoff 2009).

Back to main menu

Comments?

Send me an email at

References

Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin. 2004. Bayesian Data Analisys. Chapman & Hall/CRC.

Hoff, P. D. 2009. A First Course in Bayesian Statistical Methods. Springer.