Theory suggests that it is important to account for temporal correlation so that we are not over-confident about our parameter estimates. Let’s see what that means in practice by comparing the results from the AR-1 model to those from a simple linear regression.
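As a quick standalone illustration of the issue (a self-contained sketch with made-up parameter values, separate from the simulated data used below): when the residuals follow an AR-1 process, the nominal 95% CI from lm covers the true slope far less often than 95% of the time.

#standalone sketch: coverage of lm's nominal 95% CI when residuals are AR-1
set.seed(1)
nsim=1000; n=100; true.slope=0.5
covered=rep(NA,nsim)
for (s in 1:nsim){
  x=rnorm(n)
  e=as.numeric(arima.sim(model=list(ar=0.8),n=n))
  y=1+true.slope*x+e
  ci=confint(lm(y~x))['x',]
  covered[s]=ci[1]<true.slope & ci[2]>true.slope
}
mean(covered) #well below the nominal 0.95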
To do this, I start by re-formatting the simulated data so that I can fit a simple linear regression:
rm(list=ls(all=TRUE))
setwd('U:\\uf\\courses\\bayesian course\\rmarkdown\\AR1 model')
#fit regular lm model
dat=read.csv('fake data.csv',as.is=T)[,-1]
cov1=read.csv('fake covariates.csv',as.is=T)
colnames(cov1)=paste('x',1:ncol(cov1),sep='')
#reformat these data
dat1=numeric()
for (i in 1:ncol(dat)){
  tmp=cbind(dat[,i],cov1)
  dat1=rbind(dat1,tmp)
}
colnames(dat1)[1]='y'
mod=lm(y~.,data=dat1)
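As an optional sanity check on the re-formatting (not strictly needed), the stacked data frame should have one row per observation, i.e., nrow(dat)*ncol(dat) rows and ncol(cov1)+1 columns:

#quick check of the stacked data frame
dim(dat1)
head(dat1)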
Next, I summarize the results from the JAGS model and combine them with those from the lm function:
#get true parameter values
setwd('U:\\uf\\courses\\bayesian course\\rmarkdown\\AR1 model')
true.betas=read.csv('fake betas.csv',as.is=T)
#get jags parameter estimates
jags=read.csv('jags ar1 results.csv',as.is=T)
colnames(jags)=paste(colnames(jags),'.jags',sep='')
jags1=apply(jags,2,quantile,c(0.025,0.5,0.975))
jags2=t(jags1)
colnames(jags2)=paste(c('lo','med','hi'),'.jags',sep='')
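#optional aside (assumes the 'coda' package, common in JAGS workflows, is installed):
#highest posterior density intervals could be used instead of equal-tailed quantiles
library(coda)
hpd.jags=HPDinterval(as.mcmc(as.matrix(jags)),prob=0.95)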
#get parameter estimates and corresponding 95% CI
mod1=summary(mod)
lm.betas=cbind(mod1$coefficients[,'Estimate'],
               mod1$coefficients[,'Estimate']+1.96*mod1$coefficients[,'Std. Error'],
               mod1$coefficients[,'Estimate']-1.96*mod1$coefficients[,'Std. Error'])
colnames(lm.betas)=c('med.lm','hi.lm','lo.lm')
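#side note: the +/- 1.96*SE above is a normal approximation; confint(mod) gives the
#exact t-based 95% CI and should be very similar when the sample size is large
ci.lm=confint(mod)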
#combine everything
fim=cbind(true.betas,jags2,lm.betas)
colnames(fim)[1]='true.betas'
Now let’s compare the widths of the corresponding 95% confidence/credible intervals. I would expect the model that does not account for temporal correlation to have narrower intervals, which is exactly what we see:
#compare length of 95% CI: much higher uncertainty in Bayes model parameter estimates
cis=cbind(fim[,'hi.jags']-fim[,'lo.jags'],fim[,'hi.lm']-fim[,'lo.lm'])
boxplot(cis,names=c('AR-1','lm'),ylab='Width of 95% CI')
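For a numeric version of the same comparison (a small added check), the average interval width per model can be computed directly:

colMeans(cis) #first column: AR-1, second column: lm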
Having a narrower confidence interval is not bad in itself. However, it can be bad if the interval fails to encompass the true parameter and leads to incorrect inference (e.g., spuriously identifying a significant relationship). The calculations below show that the 95% CIs of the AR-1 model encompass all the true regression parameters, whereas those from the simple linear regression model only do so 60% of the time. Furthermore, the simple linear regression model incorrectly detects a significant negative association between covariate 2 and the response variable.
#compare empirical coverage
cond.jags=fim[,'lo.jags'] < fim[,'true.betas'] & fim[,'hi.jags'] > fim[,'true.betas']
mean(cond.jags)
## [1] 1
cond.lm=fim[,'lo.lm'] < fim[,'true.betas'] & fim[,'hi.lm'] > fim[,'true.betas']
mean(cond.lm)
## [1] 0.6
#sometimes we might detect significant relationships that don't really exist
#(look at slope of second covariate)
fim[,c('true.betas','lo.lm','hi.lm')]
## true.betas lo.lm hi.lm
## betas1.jags -0.2384678 -0.4441840 -0.255364613
## betas2.jags 0.0000000 -0.1982882 -0.028017940
## betas3.jags -1.0213280 -1.1112345 -0.875414624
## betas4.jags 0.0000000 -0.1783890 0.002790004
## betas5.jags -2.3721143 -2.5496210 -2.361670124
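To make that last point explicit, here is a small added sketch that flags coefficients whose true value is zero but whose lm 95% CI excludes zero:

#true zeros that the lm results would declare 'significant' (95% CI excludes zero)
spurious=fim[,'true.betas']==0 & (fim[,'hi.lm']<0 | fim[,'lo.lm']>0)
rownames(fim)[spurious]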
Comments?
Send me an email at