Steps for calculating Bayesian p-values

Bayesian p-values (BP) can be useful as an absolute metric of model fit, in the sense that we do not need alternative models for comparison. To calculate a BP, we have to perform the following steps:

  1. We have to come up with a test statistic. Oftentimes people think of the mean squared error (i.e., \(\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2\)), but this is a good metric only for Gaussian regression models. A metric that works for a wider range of models is the sum of the log-likelihood over all observations.

  2. For each iteration of our MCMC, we calculate the test statistic for the observed data and for the synthetic/simulated data.

  3. We calculate how often the test statistic for the synthetic/simulated data is greater than the corresponding quantity for the observed data.

Values close to 0 or to 1 indicate that the test statistic for the synthetic/simulated data is systematically different from the test statistic for the observed data, suggesting that the model did not fit the data well. In essence, this is a predictive model check (as described in an earlier module) but with a very particular test statistic.

Notice that, although it is called a p-value, we have to be careful not to interpret it in the same way we would a frequentist p-value. More specifically, values close to 0 AND values close to 1 indicate lack of fit.
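The three steps above can be sketched in R. The matrices below are hypothetical stand-ins for MCMC output (random numbers, not a real model fit), used only to illustrate the calculation:

```r
#toy sketch of steps 1-3 using hypothetical stand-in matrices:
#rows = retained MCMC iterations, columns = observations;
#entries are log-likelihoods (here random numbers, NOT real MCMC output)
set.seed(1)
niter=300
nobs=1000
Tobs=matrix(rnorm(niter*nobs,mean=-2),niter,nobs)
Tsim=matrix(rnorm(niter*nobs,mean=-2),niter,nobs)

#steps 1-2: test statistic = sum of the log-likelihood for each iteration
TobsSum=rowSums(Tobs)
TsimSum=rowSums(Tsim)

#step 3: proportion of iterations in which the simulated-data statistic
#exceeds the observed-data statistic
BP=mean(TsimSum>TobsSum)
BP
```

Because Tobs and Tsim were drawn from the same distribution here, BP should land near 0.5; systematic lack of fit would instead push it toward 0 or 1.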

Testing Bayesian p-values

Simulating data

To demonstrate the utility of Bayesian p-values, I will start by simulating data from a Poisson regression model and from a Negative Binomial regression model. Subsequently, I will fit a Poisson regression model to each one of these datasets.

I expect that I will get a Bayesian p-value that is far from 0 and 1 for the data simulated using a Poisson regression model because the model that we are fitting is the same as the model used to generate the data. On the other hand, I expect that the Bayesian p-value will be close to 0 or 1 for the data simulated using a Negative Binomial regression model because the model that is being fitted is different from the model used to generate the data.

rm(list=ls(all=TRUE)) 
set.seed(1)

#basic settings
nobs=1000
b0=1
b1=0.1
logA=runif(nobs,-5,5)
mu=exp(b0+b1*logA)

#simulate data following Poisson regression
S=rpois(nobs,mu)
dat=data.frame(S=S,logA=logA)

#plot(S~logA,data=dat)

#export Poisson data
setwd('U:\\uf\\courses\\bayesian course\\rmarkdown\\waic and Bayes pval')
write.csv(dat,'sim data Poisson.csv',row.names=F)

#simulate data following NB regression
r=1
p=r/(r+mu)
S=rnbinom(nobs,size=r,prob=p)
dat=data.frame(S=S,logA=logA)
#plot(S~logA,data=dat)

#export NB data
setwd('U:\\uf\\courses\\bayesian course\\rmarkdown\\waic and Bayes pval')
write.csv(dat,'sim data NB.csv',row.names=F)

JAGS code

The key changes in my JAGS code are that:

  1. I will be calculating the log-likelihood of each observation in my dataset.

  2. I will be simulating a new dataset and, for each one of these simulated observations, I will calculate the corresponding log likelihood.

My JAGS code is provided below.

model{
  #likelihood
  for (i in 1:nobs){
    #mean of Poisson regression
    mu[i]=exp(b0+b1*logA[i])
    S[i]~dpois(mu[i])
    
    #calculate log-likelihood for observed data
    Tobs[i]=logdensity.pois(S[i],mu[i])
    
    #simulate data
    Ssim[i]~dpois(mu[i])
    
    #calculate log-likelihood for simulated data
    Tsim[i]=logdensity.pois(Ssim[i],mu[i])
  }

  #priors
  b0~dnorm(0,0.1)
  b1~dnorm(0,0.1)
}

Fitting my model for the data generated using a Poisson regression model

The code to fit the JAGS model is pretty standard. The only novelty is that I will be monitoring the parameters of my regression model AND the calculated log likelihood for the observed and simulated datasets.

rm(list=ls(all=TRUE)) 
library(jagsUI)
set.seed(1)

#get data
setwd('U:\\uf\\courses\\bayesian course\\rmarkdown\\waic and Bayes pval')
dat=read.csv('sim data Poisson.csv')
#dat=read.csv('sim data NB.csv')
nobs=nrow(dat)
#plot(S~logA,data=dat)

# MCMC settings 
ni <- 1500 #number of iterations
nt <- 10    #interval to thin 
nb <- 500  #number of iterations to discard as burn-in
nc <- 3     #number of chains

#set parameters to monitor
params=c("b0","b1",'Tsim','Tobs')

#create function that sets the initial values
inits=function(){
  list(b0=rnorm(1,-1,1), 
       b1=rnorm(1,-1,1))
}

#prepare data for JAGS
dat1=list(S=dat$S,logA=dat$logA,
          nobs=nobs)

#run Poisson model
setwd('U:\\uf\\courses\\bayesian course\\rmarkdown\\waic and Bayes pval')
mod=jags(model.file="Poisson regression.R", inits=inits,
         parameters.to.save=params,
         data=dat1,n.chains=nc,
         n.burnin=nb,n.iter=ni,n.thin=nt,DIC=TRUE)

Calculating the Bayesian p-value

Notice that JAGS will output the matrices Tobs and Tsim. These matrices contain the log-likelihood for each observation and for each retained iteration of our MCMC algorithm. As a result, the number of rows is the number of retained iterations and the number of columns is the number of observations in our dataset.

We calculate the sum of the log-likelihood by summing across the rows of the Tsim and Tobs matrices. This yields, for each matrix, a vector whose length is equal to the number of retained iterations. Finally, using these vectors (calculated both for Tsim and Tobs), I can calculate my Bayesian p-value.

#check the dimension of the Tobs and Tsim matrices
dim(mod$sims.list$Tobs)
## [1]  300 1000
dim(mod$sims.list$Tsim)
## [1]  300 1000
#calculate test statistic (i.e., sum of loglikelihood)
TobsSum=rowSums(mod$sims.list$Tobs)
TsimSum=rowSums(mod$sims.list$Tsim)

#calculate Bayesian p-value
mean(TsimSum>TobsSum)
## [1] 0.8633333

Ultimately, I find a Bayesian p-value of 0.86, which is not close enough to 0 or 1 for me to be worried. In contrast, if I modify the code above so that the input data is “sim data NB.csv”, the Bayesian p-value is equal to 1, clearly indicating that something is wrong with our model. Try it out!



Comments?

Send me an email at

References