Data
Recall that we are interested in understanding how nitrate concentration (an important water pollutant) in Mississippi river basins varies as a function of percent row crops (Goolsby et al. (1999), NOAA Coastal Ocean Program Decision Analysis Series # 1). The data are in “river data.csv”, and the first few rows look like this:
setwd('U:\\uf\\courses\\bayesian course\\rmarkdown')
dat=read.csv('river data.csv',as.is=T)
head(dat)
## basin.id cropland nitrate
## 1 1 2.5 0.647
## 2 2 1.3 1.062
## 3 3 14.3 1.432
## 4 4 0.5 0.579
## 5 5 45.6 3.561
## 6 6 46.6 3.938
plot(nitrate~cropland,data=dat)
Developing better models
We have shown that our original model, given by:
\[y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)\]
generally performed well in terms of its 95% credible intervals but, unfortunately, predicted negative nitrate concentrations for a large proportion of rivers.
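To see why a normal likelihood can produce negative predictions, note that under this model the probability of a negative value at cropland \(x\) is \(\Phi(-(\beta_0 + \beta_1 x)/\sigma)\), which is never exactly zero. The short R sketch below evaluates this probability at a few cropland values. The values of b0, b1, and sigma are purely hypothetical and chosen only for illustration; they are not estimates from these data.

```r
# Hypothetical parameter values, for illustration only (not fitted estimates)
b0 <- 0.5
b1 <- 0.06
sigma <- 0.8

# Cropland percentages at which to evaluate the model
x <- c(0, 10, 20, 50)

# Mean of the normal model at each cropland value
mu <- b0 + b1 * x

# Probability that a new observation falls below zero under N(mu, sigma^2)
p.neg <- pnorm(0, mean = mu, sd = sigma)
round(p.neg, 3)
```

With a positive slope, this probability is largest at low cropland, where the predicted mean is closest to zero relative to sigma.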
1. Several alternative model formulations could avoid the problem of predicting negative nitrate concentrations. Please propose one alternative formulation that is likely to avoid this problem.
2. Fit the model proposed in (1) using JAGS.
3. Using the predictive distribution, determine how well the new model fits these data. Is there any aspect of the data that your new model still fails to represent well? For example, does it capture the variability in the data both when cropland is less than 20% and when it is greater than 20%?
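One possible formulation, offered here only as an illustration and not as the expected answer, is a log-normal likelihood, \(\log(y_i) \sim N(\beta_0 + \beta_1 x_i, \sigma^2)\), which places zero probability on negative nitrate values. The sketch below shows how such a model might be fit with the rjags package and how predictive interval coverage could be compared below and above 20% cropland. The priors, the MCMC settings, and the names model.string, fit.lognormal, and coverage.by.group are all assumptions made for this example. It also assumes that every nitrate value is strictly positive.

```r
# Log-normal regression in the JAGS dialect of BUGS.
# Priors are vague, illustrative choices (assumptions, not from the handout).
model.string <- "
model {
  for (i in 1:n) {
    mu[i] <- b0 + b1 * x[i]
    y[i] ~ dlnorm(mu[i], tau)       # JAGS uses the precision tau = 1/sigma^2
    y.pred[i] ~ dlnorm(mu[i], tau)  # posterior predictive draw for basin i
  }
  b0 ~ dnorm(0, 0.001)
  b1 ~ dnorm(0, 0.001)
  sigma ~ dunif(0, 10)
  tau <- 1 / (sigma * sigma)
}"

# Fit the model and return posterior predictive draws.
# Requires JAGS and the rjags package to be installed.
fit.lognormal <- function(dat, n.iter = 5000, n.burn = 1000) {
  jags.data <- list(y = dat$nitrate, x = dat$cropland, n = nrow(dat))
  m <- rjags::jags.model(textConnection(model.string),
                         data = jags.data, n.chains = 3)
  update(m, n.burn)  # burn-in iterations
  samp <- rjags::coda.samples(m, variable.names = "y.pred", n.iter = n.iter)
  as.matrix(samp)    # rows = MCMC draws, columns = y.pred[1], ..., y.pred[n]
}

# Proportion of observations inside the 95% predictive interval,
# computed separately for basins below and above the cropland cutoff.
coverage.by.group <- function(y, x, y.pred, cutoff = 20) {
  lo <- apply(y.pred, 2, quantile, probs = 0.025)
  hi <- apply(y.pred, 2, quantile, probs = 0.975)
  inside <- y >= lo & y <= hi
  tapply(inside, ifelse(x < cutoff, "below", "above"), mean)
}
```

After fitting, a call such as coverage.by.group(dat$nitrate, dat$cropland, fit.lognormal(dat)) would return the coverage in each group. Coverage that differs markedly between the two groups would suggest that the model misrepresents how the variability changes with cropland.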