Theory suggests that it is important to account for temporal correlation so that we are not over-confident about our parameter estimates. Let’s see what that means in practice by comparing the results from the AR-1 model to those from a simple linear regression.
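As a quick standalone illustration of the issue (a self-contained sketch with made-up parameter values, separate from the simulated data used below): when the residuals follow an AR-1 process, the nominal 95% CI from lm covers the true slope far less often than 95% of the time.

#standalone sketch: coverage of lm's nominal 95% CI when residuals are AR-1
set.seed(1)
nsim=1000; n=100; true.slope=0.5
covered=rep(NA,nsim)
for (s in 1:nsim){
  x=rnorm(n)
  e=as.numeric(arima.sim(model=list(ar=0.8),n=n))
  y=1+true.slope*x+e
  ci=confint(lm(y~x))['x',]
  covered[s]=ci[1]<true.slope & ci[2]>true.slope
}
mean(covered) #well below the nominal 0.95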
To do this, I start by re-formatting the simulated data so that I can fit a simple linear regression:
rm(list=ls(all=TRUE))
setwd('U:\\uf\\courses\\bayesian course\\rmarkdown\\AR1 model')
#fit regular lm model
dat=read.csv('fake data.csv',as.is=T)[,-1]
cov1=read.csv('fake covariates.csv',as.is=T)
colnames(cov1)=paste('x',1:ncol(cov1),sep='')
#reformat these data
dat1=numeric()
for (i in 1:ncol(dat)){
  tmp=cbind(dat[,i],cov1)
  dat1=rbind(dat1,tmp)
}
colnames(dat1)[1]='y'
mod=lm(y~.,data=dat1)
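As an optional sanity check on the re-formatting (not strictly needed), the stacked data frame should have one row per observation, i.e., nrow(dat)*ncol(dat) rows and ncol(cov1)+1 columns:

#quick check of the stacked data frame
dim(dat1)
head(dat1)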
Next, I summarize the results from the JAGS model and combine them with those from the lm function:
#get true parameter values
setwd('U:\\uf\\courses\\bayesian course\\rmarkdown\\AR1 model')
true.betas=read.csv('fake betas.csv',as.is=T)
#get jags parameter estimates
jags=read.csv('jags ar1 results.csv',as.is=T)
colnames(jags)=paste(colnames(jags),'.jags',sep='')
jags1=apply(jags,2,quantile,c(0.025,0.5,0.975))
jags2=t(jags1)
colnames(jags2)=paste(c('lo','med','hi'),'.jags',sep='')
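#optional aside (assumes the 'coda' package, common in JAGS workflows, is installed):
#highest posterior density intervals could be used instead of equal-tailed quantiles
library(coda)
hpd.jags=HPDinterval(as.mcmc(as.matrix(jags)),prob=0.95)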
#get parameter estimates and corresponding 95% CI
mod1=summary(mod)
lm.betas=cbind(mod1$coefficients[,'Estimate'],
               mod1$coefficients[,'Estimate']+1.96*mod1$coefficients[,'Std. Error'],
               mod1$coefficients[,'Estimate']-1.96*mod1$coefficients[,'Std. Error'])
colnames(lm.betas)=c('med.lm','hi.lm','lo.lm')
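#side note: the +/- 1.96*SE above is a normal approximation; confint(mod) gives the
#exact t-based 95% CI and should be very similar when the sample size is large
ci.lm=confint(mod)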
#combine everything
fim=cbind(true.betas,jags2,lm.betas)
colnames(fim)[1]='true.betas'
Now let’s compare the widths of the corresponding 95% confidence/credible intervals. I would expect the model that does not account for temporal correlation to have narrower intervals, which is exactly what we see:
#compare length of 95% CI: much higher uncertainty in Bayes model parameter estimates
cis=cbind(fim[,'hi.jags']-fim[,'lo.jags'],fim[,'hi.lm']-fim[,'lo.lm'])
boxplot(cis,names=c('AR-1','lm'),ylab='Width of 95% CI')
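For a numeric version of the same comparison (a small added check), the average interval width per model can be computed directly:

colMeans(cis) #first column: AR-1, second column: lm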
Having a narrower confidence interval is not bad in itself. However, it can be bad if the interval fails to encompass the true parameter and leads to incorrect inference (e.g., spuriously identifying a significant relationship). The calculations below show that the 95% CIs of the AR-1 model encompass all the true regression parameters, whereas those from the simple linear regression model only do so 60% of the time. Furthermore, the simple linear regression model incorrectly detects a significant negative association between covariate 2 and the response variable.
#compare empirical coverage
cond.jags=fim[,'lo.jags'] < fim[,'true.betas'] & fim[,'hi.jags'] > fim[,'true.betas']
mean(cond.jags)
## [1] 1
cond.lm=fim[,'lo.lm'] < fim[,'true.betas'] & fim[,'hi.lm'] > fim[,'true.betas']
mean(cond.lm)
## [1] 0.6
#sometimes we might detect significant relationships that don't really exist
#(look at slope of second covariate)
fim[,c('true.betas','lo.lm','hi.lm')]
## true.betas lo.lm hi.lm
## betas1.jags -0.2384678 -0.4441840 -0.255364613
## betas2.jags 0.0000000 -0.1982882 -0.028017940
## betas3.jags -1.0213280 -1.1112345 -0.875414624
## betas4.jags 0.0000000 -0.1783890 0.002790004
## betas5.jags -2.3721143 -2.5496210 -2.361670124
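To make that last point explicit, here is a small added sketch that flags coefficients whose true value is zero but whose lm 95% CI excludes zero:

#true zeros that the lm results would declare 'significant' (95% CI excludes zero)
spurious=fim[,'true.betas']==0 & (fim[,'hi.lm']<0 | fim[,'lo.lm']>0)
rownames(fim)[spurious]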
Comments?
Send me an email at