Learning objectives
The goal of this section is to show how simulation-based inference can be very useful for calculating derived quantities without having to work through complicated integrals.
Tree volume example
Sometimes we want to calculate integrals over quantities that are non-linearly related to the parameters we are estimating. Here is an example. Say that we have a statistical model that predicts the future diameter of a tree 10 years from now (\(D_{10}\)). Assume that this model tells us that
\[D_{10} \sim N(\mu,\sigma^2)\]
However, foresters are typically interested in timber volume because this is what is effectively sold. Therefore, we might ask what is the expected volume of these trees 10 years from now (\(V_{10}\)). Say our allometric equation (the equation that relates diameter to volume) is given by \(V=0.1 \times D^3\). Most people would assume that:
\[E[V_{10}]=0.1 \times (E[D_{10} ])^3=0.1 \times \mu^3\]
which turns out to be a very rough approximation to the true answer. To get the correct answer, we would have to calculate one of the following integrals:
\[E[V_{10}]=\int [0.1 \times D_{10}^3] N(D_{10}|\mu,\sigma^2) dD_{10}\]
or
\[E[V_{10}]=\int x f_{V10}(x) dx\]
where \(f_{V10}\) is the pdf for future volume \(V_{10}\). There are a couple of problems with these expressions. For the first one, perhaps we do not know how to calculate this integral. For the second one, we might not even know what the distribution for \(V_{10}\) (i.e., \(f_{V10}\)) is.
One way of solving this problem is to use simulation-based inference. More specifically, we can generate multiple samples from \(D_{10}\) by sampling from \(N(\mu,\sigma^2)\) and, for each sample, calculate \(V_{10}\). This gives us the distribution of \(V_{10}\) through simulation, not the analytical expressions shown above. As a result, we can easily make inference on \(V_{10}\). For instance, if we want to calculate \(E[V_{10}]\), then we just use our Monte Carlo integration method illustrated previously. Here are some figures to illustrate this process assuming that \(\mu=60\) and \(\sigma^2=225\).
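In symbols, the Monte Carlo estimate of \(E[V_{10}]\) is just the average of the transformed draws:
\[E[V_{10}] \approx \frac{1}{n}\sum_{s=1}^{n} 0.1 \times (D_{10}^{(s)})^3\]
where \(D_{10}^{(s)}\) is the \(s\)-th of \(n\) draws from \(N(\mu,\sigma^2)\).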
set.seed(1)
#generate random draws of d10
mu1=60
sd1=15
k=0.1
n=10000
d10=rnorm(n,mean=mu1,sd=sd1)
par(mfrow=c(1,2),mar=c(2,2,3,1))
#plot the distribution of d10
plot(density(d10),lwd=3,xlab='',ylab='',main='D10')
abline(v=mu1)
#calculate v10 based on the random draws of d10
v10=0.1*(d10^3)
z=density(v10)

#here are the answers
mc.answer=mean(v10)
naive.answer=0.1*(mu1^3)
vol=function(x){
  0.1*(x^3)*dnorm(x,mu1,sd1)
}
determ.answer=integrate(vol,lower=-Inf,upper=Inf)
determ.answer
## 25650 with absolute error < 2.5
mc.answer
## [1] 25629.56
naive.answer
## [1] 21600
#plot the distribution of v10
plot(z,lwd=3,xlab='',ylab='',main='V10')
This figure illustrates that, despite the fact that the distribution of \(D_{10}\) is normal (left panel), the distribution of \(V_{10}\) is not normal (right panel). Furthermore, the answer using deterministic integration (determ.answer) was equal to 25,650 while the Monte Carlo answer (mc.answer) was 25,630. In contrast, the naive answer obtained by plugging in \(\mu\) was way off, at 21,600. This is due to a phenomenon called Jensen’s inequality. You can read more about this in (Ruel and Ayres 1999) and (Valle 2011).
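For reference, Jensen’s inequality states that, for a convex function \(g\),
\[E[g(X)] \geq g(E[X])\]
and \(0.1 \times D^3\) is a convex function of \(D\) over the positive values that essentially all draws take here, which is why the naive plug-in answer underestimates the true expectation. In this particular example the expectation also happens to have a closed form, since \(E[D_{10}^3]=\mu^3+3\mu\sigma^2\) for a normal distribution, which gives \(0.1 \times (60^3+3 \times 60 \times 225)\) = 25,650, in agreement with the deterministic and Monte Carlo answers. In general, however, no such closed form is available, which is exactly when simulation shines.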
Logistic regression example
The example above relied on figuring out the distribution of a transformation of a single parameter. Another problem arises when we want to calculate a function of two or more parameters. For example, in a logistic regression model, we are interested in modeling \(\pi_i\), the probability that a particular event happens (e.g., a tree of a particular species is present or a person survives). Recall that a logistic regression model with a single covariate is given by:
\[y_i \sim Bernoulli(\pi_i)\] \[\pi_i=\frac{exp(\beta_0+\beta_1 x_i)}{1+exp(\beta_0+\beta_1 x_i)}\]
We want to show how the success probability \(\pi_i\) varies as a function of our covariate \(x_i\) and, at the same time, show the 95% interval for this relationship. Let \(D\) stand for the dataset used to fit the model. Even if we knew that the posterior distribution for the regression parameters were \(p(\beta_0|D)=N(\beta_0|-0.5,0.25)\) and \(p(\beta_1|D)=N(\beta_1|0.5,0.25)\), it would still be hard to figure out analytically what the distribution for \(\pi_i=\frac{exp(\beta_0+\beta_1 x_i)}{1+exp(\beta_0+\beta_1 x_i)}\) ought to be. Nevertheless, we can do this through Monte Carlo integration.
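As an aside, readers without access to the param1.csv file used below can mimic the exercise with simulated posterior draws. The sketch below is only a rough stand-in, not the actual posterior samples: it assumes the two posteriors are independent (real posterior draws are typically correlated) and reads the 0.25 above as the posterior variance.
#rough stand-in for param1.csv (assumes independent posteriors and 0.25 = variance)
set.seed(1)
nsim=10000
p1=data.frame(b0=rnorm(nsim,mean=-0.5,sd=sqrt(0.25)),
              b1=rnorm(nsim,mean=0.5,sd=sqrt(0.25)))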
Say that we have samples of \(\beta_0\) and \(\beta_1\) from the posterior distribution and we want to estimate \(\pi_i\) when our covariate is equal to 5 (i.e., \(x_i=5\)). Here is what we find using the file param1.csv:
setwd('U:/uf/courses/bayesian course/2016/chapter4/transf1')
p1=read.csv('param1.csv',as.is=T)
apply(p1,2,mean)
## b0 b1
## -0.5 0.5
par(mfrow=c(1,3),mar=c(4,4,4,1))
plot(density(p1$b0),type='l',main='b0',xlab='b0')
plot(density(p1$b1),type='l',main='b1',xlab='b1')

tmp=exp(p1$b0+p1$b1*5)
fim=tmp/(1+tmp)
plot(density(fim,from=0,to=1),type='l',ylim=c(0,2.8),xlab='pi',
     main='Implied distrib. for pi')

#based on average parameter values
tmp=exp(-0.5+0.5*5)
naive.res=tmp/(1+tmp)
sim.res=mean(fim)
abline(v=naive.res,col='red',lty=2)
abline(v=sim.res,lty=2)
This figure shows that the distribution of the success probability \(\pi_i\) when \(x_i=5\) is really weird (right panel), nothing like the symmetric distribution of \(\beta_0\) and \(\beta_1\) (left and middle panels). Furthermore, it also shows that \(E[\pi_i|x_i=5,y_1,...,y_n ]\approx 0.74\) (dashed black line), which is very different from our naive estimate \(\frac{exp(E[\beta_0]+E[\beta_1]x_i)}{1+exp(E[\beta_0]+E[\beta_1]x_i)}=0.88\) (dashed red line)!!!
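Because the simulation gives us the full set of draws for \(\pi_i\) (stored in fim above), we can also report, for instance, a 95% credible interval for \(\pi_i\) when \(x_i=5\) directly from these draws; this is the same quantile computation used below over a grid of \(x_i\) values:
quantile(fim,c(0.025,0.975))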
Furthermore, we often want to report the estimated relationship between \(\pi_i\) and our covariate \(x_i\). This can be done by performing similar calculations for different values of \(x_i\):
#based on average parameter values
media=apply(p1,2,mean)
x=seq(from=0,to=5,length.out=1000)
tmp=exp(media[1]+media[2]*x)

par(mfrow=c(1,1),mar=c(4,4,1,1))
plot(x,tmp/(1+tmp),type='l',ylim=c(0,1),col='red',ylab=expression(pi[i]),
     xlab=expression(x[i]))

res1=matrix(NA,length(x),3)
for (i in 1:length(x)){
  tmp=exp(p1$b0+p1$b1*x[i])
  tmp1=tmp/(1+tmp)
  qtt=quantile(tmp1,c(0.025,0.975))
  res1[i,]=c(mean(tmp1),qtt)
}
lines(x,res1[,1])
lines(x,res1[,2],lty=3)
lines(x,res1[,3],lty=3)
Notice how our naive idea of the relationship between \(x_i\) and \(\pi_i\) (red line) can be very different from the actual relationship when we take into account parameter uncertainty (black line). Furthermore, using the random draws of \(\beta_0\) and \(\beta_1\), we can also depict 95% point-wise credible envelopes (dashed black lines).
To summarize, a key advantage of making inference through simulations is that inference is easy to do on transformations / functions of one or multiple parameters.
Comments?
Send me an email at