Why NIMBLE?
So far, we have mostly relied on JAGS to specify and fit our Bayesian models. However, there are other tools out there that can often outperform JAGS without requiring users to code their own customized samplers from scratch. A prominent one is called NIMBLE, which stands for “Numerical Inference for statistical Models for Bayesian and Likelihood Estimation”.
Similar to JAGS, NIMBLE allows users to specify their Bayesian models using the BUGS language. The main benefit of NIMBLE relative to JAGS is that it lets users choose which samplers to use for each parameter or set of parameters. While this might not seem that useful at first, this additional flexibility can be a lifesaver, depending on the complexity of the model you are trying to fit.
To install NIMBLE for the first time, you will have to install “Rtools.exe” (on Windows), as detailed here (https://r-nimble.org/download). Then, you can install NIMBLE as a regular R package using
install.packages('nimble')
Simulated data
To illustrate the benefit of using NIMBLE, I will rely on a simulated dataset, “sim data corr covs.csv”, in which the two covariates \(x_1\) and \(x_2\) that I would like to use are strongly correlated. This is the type of situation in which MCMC algorithms can become very sticky if they sample one parameter at a time.
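For context, here is a minimal sketch of how covariates with this correlation structure and a Poisson response could be simulated; the sample size and coefficient values below are hypothetical, not the actual values behind “sim data corr covs.csv”.

#Hypothetical simulation sketch (not the actual code behind the csv file)
set.seed(1)
n  <- 1000                                   #arbitrary sample size
x1 <- rnorm(n)
x2 <- 0.9 * x1 + sqrt(1 - 0.9^2) * rnorm(n)  #cor(x1, x2) approximately 0.9
y  <- rpois(n, lambda = exp(0.5 + 0.3 * x1 + 0.3 * x2))
#write.csv(data.frame(y, x1, x2), 'sim data corr covs.csv', row.names = FALSE)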
rm(list=ls())
library('nimble')
library('MCMCvis')
library('coda')
setwd('U:\\uf\\courses\\bayesian course\\NIMBLE')
dat=read.csv('sim data corr covs.csv')
head(dat)
## y x1 x2
## 1 2 -0.5145146 -1.2998346
## 2 9 1.6535542 1.4760946
## 3 3 0.4437921 0.3108793
## 4 4 0.5837254 0.5370379
## 5 2 0.5194115 0.5770463
## 6 4 1.0695620 0.6657509
cor(dat[,c('x1','x2')])
## x1 x2
## x1 1.0000000 0.9001235
## x2 0.9001235 1.0000000
Statistical model
We will assume that the response variable \(y_i\) comes from a Poisson distribution and that the log of its mean, \(\log(\lambda_i)\), depends linearly on the covariates \(x_{1i}\) and \(x_{2i}\), as shown below:
\[y_i \sim Poisson(exp(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}))\]
We also assume vague Gaussian priors for each regression parameter:
\[\beta_0,\beta_1,\beta_2 \sim N(0,100)\]
Here is how we specify this model in NIMBLE. Notice that this looks very similar to what we would do in JAGS, with one subtlety: by default, dnorm in NIMBLE is parameterized by the mean and the precision, just as in JAGS, but NIMBLE also accepts named alternatives such as var= and sd=. Below, we pass the variance explicitly so that the priors match the \(N(0,100)\) specification above.
#define model
model <- nimbleCode({
  for (i in 1:nobs){
    y[i] ~ dpois(exp(b0+b1*x1[i]+b2*x2[i]))
  }
  b0 ~ dnorm(0, var = 100)
  b1 ~ dnorm(0, var = 100)
  b2 ~ dnorm(0, var = 100)
})
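For reference, the three statements below all encode the same \(N(0,100)\) prior (mean 0, variance 100) in NIMBLE:

#Equivalent ways of writing a N(0, variance = 100) prior in NIMBLE:
#b0 ~ dnorm(0, 0.01)       # default parameterization: mean and precision (1/100)
#b0 ~ dnorm(0, var = 100)  # named variance argument
#b0 ~ dnorm(0, sd = 10)    # named standard deviation argument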
Initial setup
As with JAGS, we need to pass our data to NIMBLE. However, we will make a distinction between data (variables that result from stochastic processes, such as \(y_i\)) and constants (all covariates and the overall number of observations). We will also specify how the initial values are to be generated, the number of iterations, the number of burn-in iterations, and the number of chains.
#nimble stuff
my.constants <- list(x1 = dat$x1, x2 = dat$x2, nobs = nrow(dat))
my.data <- list(y = dat$y)
parameters.to.save <- c('b0', 'b1', 'b2')
#initial values
initial.values <- function(){
  list(b0 = rnorm(1, sd = 1),
       b1 = rnorm(1, sd = 1),
       b2 = rnorm(1, sd = 1))
}
n.iter <- 5000
n.burnin <- 1000
n.chains <- 3
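As an aside, for models where the default samplers suffice, NIMBLE also offers a one-call wrapper, nimbleMCMC(), which bundles all of the steps shown in the next section. A minimal sketch is below; we will not use it here because it does not let us customize the samplers.

#One-call alternative (sketch): nimbleMCMC() wraps model building, compilation,
#and sampling, but offers no control over which samplers are assigned
samples.all <- nimbleMCMC(code = model, constants = my.constants,
                          data = my.data, inits = initial.values,
                          monitors = parameters.to.save, niter = n.iter,
                          nburnin = n.burnin, nchains = n.chains)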
NIMBLE with univariate random walk Metropolis-Hastings samplers
To run NIMBLE with the step-by-step workflow, we first create an uncompiled model. Then, we compile this model and get the MCMC configuration. This is where we can change the samplers that are used. After that, we create an MCMC function, compile it, and run the resulting MCMC algorithm. This sounds complicated, but it is not too bad.
The default for NIMBLE is to use a univariate random walk Metropolis-Hastings sampler for each parameter in the model. So let’s see how well this works.
#Standard RW approach
#Step 1) Create uncompiled model
mod1 <- nimbleModel(code = model,
                    constants = my.constants,
                    data = my.data)
#Step 2) compile model
mod2 <- compileNimble(mod1)
#Step 3) MCMC configuration
mod3 <- configureMCMC(mod2,
                      monitors = parameters.to.save)
## ===== Monitors =====
## thin = 1: b0, b1, b2
## ===== Samplers =====
## RW sampler (3)
## - b0
## - b1
## - b2
#Step 4) Create MCMC function and compile it
mod4 <- buildMCMC(mod3)
mod5 <- compileNimble(mod4)
#Step 5) Run MCMC
samples <- runMCMC(mcmc = mod5,
                   inits = initial.values,
                   niter = n.iter,
                   nburnin = n.burnin,
                   nchains = n.chains)
#Step 6) Examine output
MCMCtrace(object = samples,
pdf = FALSE, # no export to PDF
ind = TRUE, # separate density lines per chain
params = c('b0','b1','b2'))
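Besides the trace plots, we may want numeric summaries. This is not part of the original workflow, but the MCMCvis package we loaded also provides MCMCsummary(), which reports posterior summaries along with Rhat and effective sample sizes:

#Optional (sketch): tabular posterior summaries via MCMCvis
MCMCsummary(object = samples, params = c('b0','b1','b2'), round = 2)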
#Calculate effective sample size
b0 <- c(samples$chain1[,1], samples$chain2[,1], samples$chain3[,1])
effectiveSize(b0); length(b0)
## var1
## 1376.785
## [1] 12000
b1 <- c(samples$chain1[,2], samples$chain2[,2], samples$chain3[,2])
effectiveSize(b1)
## var1
## 214.7053
b2 <- c(samples$chain1[,3], samples$chain2[,3], samples$chain3[,3])
effectiveSize(b2)
## var1
## 209.3439
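As an aside, instead of concatenating the chains by hand, we could convert the runMCMC() output to a coda mcmc.list; effectiveSize() then handles all parameters and chains at once. A sketch:

#Alternative (sketch): convert the list of chain matrices to a coda mcmc.list
#and compute effective sample sizes for all parameters in one call
samples.list <- as.mcmc.list(lapply(samples, as.mcmc))
effectiveSize(samples.list)
#(runMCMC also accepts samplesAsCodaMCMC = TRUE to return this format directly)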
Despite creating 12,000 posterior samples for \(\beta_1\) and \(\beta_2\), the effective sample sizes were only around 210-215. These results suggest that this particular algorithm is not performing well: the trace plots show that the individual chains are mixing poorly and the resulting density estimates are relatively bumpy.
NIMBLE with univariate slice samplers
JAGS typically samples each parameter in a model like this using univariate slice samplers. Because NIMBLE gives us the flexibility to choose which sampler we want, let's see if we can improve on the dismal performance above by changing from the univariate random walk MH sampler to a slice sampler. To do this, we first remove the current samplers associated with these parameters and then assign slice samplers to them, as shown after the short refresher below.
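As a refresher, slice sampling draws an auxiliary uniform height under the density and then samples uniformly from the horizontal “slice” at that height. Below is a minimal plain-R sketch of a univariate slice update (the stepping-out and shrinkage procedures of Neal 2003); this is purely illustrative and is not NIMBLE's internal implementation.

#Minimal sketch of a univariate slice sampling update (Neal 2003).
#'logpost' is any function returning an unnormalized log-density.
slice.update <- function(x0, logpost, w = 1, max.steps = 100) {
  logy <- logpost(x0) - rexp(1)   #log of a uniform height under the density curve
  L <- x0 - runif(1) * w          #randomly positioned initial interval of width w
  R <- L + w
  k <- max.steps
  while (k > 0 && logpost(L) > logy) { L <- L - w; k <- k - 1 }  #step out left
  k <- max.steps
  while (k > 0 && logpost(R) > logy) { R <- R + w; k <- k - 1 }  #step out right
  repeat {                        #sample uniformly, shrinking the interval on rejection
    x1 <- runif(1, L, R)
    if (logpost(x1) > logy) return(x1)
    if (x1 < x0) L <- x1 else R <- x1
  }
}
#e.g., 1000 draws from a standard normal:
#x <- numeric(1000)
#for (t in 2:1000) x[t] <- slice.update(x[t-1], function(x) -x^2/2)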
#Slice sampler approach
#Step 3) MCMC configuration
mod3 <- configureMCMC(mod2,
                      monitors = parameters.to.save)
## ===== Monitors =====
## thin = 1: b0, b1, b2
## ===== Samplers =====
## RW sampler (3)
## - b0
## - b1
## - b2
mod3$removeSamplers(c('b0', 'b1', 'b2'))
mod3$printSamplers()
mod3$addSampler(target = c('b0', 'b1', 'b2'),
                type = 'slice',
                expandTarget = TRUE)
#Step 4) Create MCMC function and compile it
mod4 <- buildMCMC(mod3)
mod5 <- compileNimble(mod4)
#Step 5) Run MCMC
samples <- runMCMC(mcmc = mod5,
                   inits = initial.values,
                   niter = n.iter,
                   nburnin = n.burnin,
                   nchains = n.chains)
#Step 6) Examine output
MCMCtrace(object = samples,
pdf = FALSE, # no export to PDF
ind = TRUE, # separate density lines per chain
params = c('b0','b1','b2'))
#Calculate effective sample size
b0 <- c(samples$chain1[,1], samples$chain2[,1], samples$chain3[,1])
effectiveSize(b0)
## var1
## 5002.012
b1 <- c(samples$chain1[,2], samples$chain2[,2], samples$chain3[,2])
effectiveSize(b1)
## var1
## 903.2521
b2 <- c(samples$chain1[,3], samples$chain2[,3], samples$chain3[,3])
effectiveSize(b2)
## var1
## 876.7463
You can see that each chain takes somewhat longer to run with the slice sampler. However, the effective sample sizes for \(\beta_1\) and \(\beta_2\) were more than four times larger. Not bad!
If you are curious to see which samplers are currently available within NIMBLE, you can type the code below.
?samplers
NIMBLE with the Automated Factor (AF) slice sampler
It turns out that we can improve the performance of our model even more if we sample our parameters jointly, instead of one at a time. The intuition is that, because \(x_1\) and \(x_2\) are collinear, the posterior distribution of \(\beta_1\) and \(\beta_2\) is itself strongly correlated, so univariate updates can only take small steps, whereas joint updates can move along the correlated direction. There are multiple ways of doing this, but I was blown away when I saw the performance of the Automated Factor (AF) slice sampler.
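We can check this intuition directly using the pooled draws of b1 and b2 that we created above:

#Sketch: posterior correlation of b1 and b2 from the previous fit; with
#collinear covariates, we expect a strong (negative) posterior correlation
cor(b1, b2)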
#Step 3) MCMC configuration
mod3 <- configureMCMC(mod2,
                      monitors = parameters.to.save)
## ===== Monitors =====
## thin = 1: b0, b1, b2
## ===== Samplers =====
## RW sampler (3)
## - b0
## - b1
## - b2
mod3$removeSamplers(c('b0', 'b1', 'b2'))
mod3$printSamplers()
mod3$addSampler(target = c('b0', 'b1', 'b2'),
                type = 'sampler_AF_slice')
#Step 4) Create MCMC function and compile it
mod4 <- buildMCMC(mod3)
mod5 <- compileNimble(mod4)
#Step 5) Run MCMC
samples <- runMCMC(mcmc = mod5,
                   inits = initial.values,
                   niter = n.iter,
                   nburnin = n.burnin,
                   nchains = n.chains)
#Step 6) Examine output
MCMCtrace(object = samples,
pdf = FALSE, # no export to PDF
ind = TRUE, # separate density lines per chain
params = c('b0','b1','b2'))
#Calculate effective sample size
b0 <- c(samples$chain1[,1], samples$chain2[,1], samples$chain3[,1])
effectiveSize(b0)
## var1
## 12000
b1 <- c(samples$chain1[,2], samples$chain2[,2], samples$chain3[,2])
effectiveSize(b1)
## var1
## 11666.14
b2 <- c(samples$chain1[,3], samples$chain2[,3], samples$chain3[,3])
effectiveSize(b2)
## var1
## 11677.83
Although the run time seems similar to that of the regular slice sampler, the AF slice sampler works brilliantly in this example: the effective sample sizes are essentially equal to the total number of posterior samples.
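As an aside, I have been describing run times loosely; a more rigorous way to compare samplers is effective samples per second. Here is a sketch (not part of the original workflow) that wraps the existing runMCMC() call in system.time():

#Sketch: measure wall-clock time and compute sampling efficiency (ESS per second)
run.time <- system.time(
  samples <- runMCMC(mcmc = mod5, inits = initial.values, niter = n.iter,
                     nburnin = n.burnin, nchains = n.chains)
)['elapsed']
b1 <- c(samples$chain1[,2], samples$chain2[,2], samples$chain3[,2])
effectiveSize(b1) / run.time   #effective samples per second for b1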
Comments?
Send me an email at