Checking model fit with the coverage of the 95% predictive credible interval
There are multiple ways of assessing model fit. For example, we could assess the coverage of our 95% predictive credible interval. This interval takes into account both parameter and sampling uncertainty and should therefore encompass most (ideally 95%) of our original data. Indeed, our earlier figure reveals that only 7% of our observations (3/42) were outside this interval, suggesting that our model performs well under this criterion.
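To make this concrete, here is a minimal sketch of how such a coverage check could be computed. It assumes that pred is a matrix of posterior predictive draws (one row per posterior draw, one column per observation) and that y holds the observed nitrate concentrations; these object names are illustrative and are not part of the original code.
#lower and upper bounds of the 95% predictive credible interval for each observation
lower=apply(pred,2,quantile,0.025)
upper=apply(pred,2,quantile,0.975)

#proportion of observations falling outside their interval (ideally close to 0.05)
mean(y<lower | y>upper)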
Assessing model fit using the predictive distribution
Another approach to assessing model fit relies on the predictive distribution. This approach allows us to compare features of the original dataset with the same features computed from datasets that would be generated if our model were right. If the feature based on the original dataset is very different from the features based on the simulated datasets, this suggests that our model does not represent the data-generating mechanism well and therefore does not fit the data well.
These comparisons can focus on many different aspects of the data. In some cases, a model might adequately represent a particular feature of the original dataset but fail to represent another feature well. It is up to the modeler to judge whether the identified shortcomings are severe enough to require modifying the original assumptions.
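As a generic illustration, here is a minimal sketch of how one could compare a single feature (here, the minimum value) between the observed data and the simulated datasets. It assumes that dat.sim is a matrix of simulated datasets (one row per posterior draw, one column per observation) and that y holds the observed data; again, these names are illustrative.
#feature computed from the original dataset
feat.obs=min(y)

#same feature computed from each simulated dataset
feat.sim=apply(dat.sim,1,min)

#proportion of simulated features smaller than or equal to the observed feature
#values close to 0 or 1 suggest the model does not capture this feature well
mean(feat.sim<=feat.obs)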
To illustrate this idea, notice that the approach described above fails to detect an important shortcoming of this model: it predicts a large number of negative observations! To confirm that this is truly a shortcoming, I will create new datasets and, for each one, calculate the proportion of observations that are negative. Given that the real dataset has no negative nitrate concentrations (thankfully!), we want our model to rarely make negative predictions. Unfortunately, the proportion of negative nitrate concentrations in the simulated datasets varies from 0.1 to 0.4, confirming that this is an important flaw in our model.
#generate "new datasets", one for each draw of the posterior distribution
ntot=nrow(dat)
res=rep(NA,nrow(param))
for (i in 1:nrow(param)){
  param1=as.numeric(param[i,])
  dat.new=rnorm(ntot,mean=param1[1]+param1[2]*dat$cropland,sd=sqrt(param1[3]))

  #calculate the proportion of observations in the new dataset that are negative
  res[i]=mean(dat.new<0)
}
#plot results
hist(res,main='Proportion of negative observations',xlab='')
abline(v=0,col='red')
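To see the numbers behind the statement that this proportion varies from 0.1 to 0.4, we can also print its range (this line is an addition to the original code):
#range of the proportion of negative observations across the simulated datasets
range(res)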
Summary
The take-home message here is that it is important to compare the predictive distribution (i.e., the datasets that would be generated if your model were the true underlying data-generating model) to the observed data and check for discrepancies. Some measures of discrepancy might be better than others at detecting particular types of differences. Ultimately, it is up to the modeler to judge whether the detected discrepancies are acceptable or how the model can be improved to reduce these problems.
On a side note, we could clearly have plotted residuals to try to detect some of the problems mentioned above. This would probably have been simpler and would not have required us to generate replicated datasets and all that fancy stuff. However, there are many situations (e.g., when modeling binary data) in which residuals are not all that useful. In summary, the predictive distribution approach is a very general method for evaluating model fit, applicable to many types of models.
You can read more on checking model assumptions through the use of the predictive distribution on pages 158-165 of Gelman and Hill (2007). Additional ideas on how to check model fit for Bayesian models can be found in Conn et al. (2018).