Why NIMBLE?
So far, we have mostly relied on JAGS to specify and fit our Bayesian models. However, there are other tools out there that can often outperform JAGS without requiring users to code their own customized samplers from scratch. A prominent one is called NIMBLE, which stands for “Numerical Inference for statistical Models for Bayesian and Likelihood Estimation”.
Similar to JAGS, NIMBLE allows users to specify their Bayesian models using the BUGS language. The main benefit of NIMBLE relative to JAGS is that it lets users choose which samplers to use for each parameter or set of parameters. While this might not seem that useful at first, this additional flexibility can be a lifesaver, depending on the complexity of the model you are trying to fit.
To install NIMBLE for the first time, you will have to install “Rtools.exe” (on Windows), as detailed here (https://r-nimble.org/download). Then, you can install NIMBLE as a regular R package using
install.packages('nimble')
Simulated data
To illustrate the benefit of using NIMBLE, I will rely on a simulated dataset, “sim data corr covs.csv”, in which the two covariates \(x_1\) and \(x_2\) that I would like to use are strongly correlated. This is the type of situation in which MCMC algorithms can become very sticky if they sample one parameter at a time.
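For context, here is a minimal sketch of how covariates with this correlation structure and a Poisson response could be simulated; the sample size and coefficient values below are hypothetical, not the actual values behind “sim data corr covs.csv”.

#Hypothetical simulation sketch (not the actual code behind the csv file)
set.seed(1)
n  <- 1000                                   #arbitrary sample size
x1 <- rnorm(n)
x2 <- 0.9 * x1 + sqrt(1 - 0.9^2) * rnorm(n)  #cor(x1, x2) approximately 0.9
y  <- rpois(n, lambda = exp(0.5 + 0.3 * x1 + 0.3 * x2))
#write.csv(data.frame(y, x1, x2), 'sim data corr covs.csv', row.names = FALSE)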
rm(list=ls())
library('nimble')
library('MCMCvis')
library('coda')
setwd('U:\\uf\\courses\\bayesian course\\NIMBLE')
dat=read.csv('sim data corr covs.csv')
head(dat)
## y x1 x2
## 1 2 -0.5145146 -1.2998346
## 2 9 1.6535542 1.4760946
## 3 3 0.4437921 0.3108793
## 4 4 0.5837254 0.5370379
## 5 2 0.5194115 0.5770463
## 6 4 1.0695620 0.6657509
cor(dat[,c('x1','x2')])
## x1 x2
## x1 1.0000000 0.9001235
## x2 0.9001235 1.0000000
Statistical model
We will assume that the response variable \(y_i\) comes from a Poisson distribution and that the log of its mean, \(\log(\lambda_i)\), depends linearly on the covariates \(x_{1i}\) and \(x_{2i}\), as shown below:
\[y_i \sim Poisson(exp(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}))\]
We also assume vague Gaussian priors for each regression parameter:
\[\beta_0,\beta_1,\beta_2 \sim N(0,100)\]
Here is how we specify this model in NIMBLE. Notice that this looks very similar to what we would do in JAGS, with one subtlety: by default, dnorm in NIMBLE is parameterized by the mean and the precision, just as in JAGS, but NIMBLE also accepts named alternatives such as var= and sd=. Below, we pass the variance explicitly so that the priors match the \(N(0,100)\) specification above.
#define model
model <- nimbleCode({
  for (i in 1:nobs){
    y[i] ~ dpois(exp(b0+b1*x1[i]+b2*x2[i]))
  }
  b0 ~ dnorm(0, var = 100)
  b1 ~ dnorm(0, var = 100)
  b2 ~ dnorm(0, var = 100)
})
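For reference, the three statements below all encode the same \(N(0,100)\) prior (mean 0, variance 100) in NIMBLE:

#Equivalent ways of writing a N(0, variance = 100) prior in NIMBLE:
#b0 ~ dnorm(0, 0.01)       # default parameterization: mean and precision (1/100)
#b0 ~ dnorm(0, var = 100)  # named variance argument
#b0 ~ dnorm(0, sd = 10)    # named standard deviation argument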
Initial setup
As with JAGS, we need to pass our data to NIMBLE. However, we will make a distinction between data (variables that result from stochastic processes, such as \(y_i\)) and constants (all covariates and the overall number of observations). We will also specify how the initial values are to be generated, the number of iterations, the number of burn-in iterations, and the number of chains.
#nimble stuff
my.constants <- list(x1 = dat$x1, x2 = dat$x2, nobs = nrow(dat))
my.data <- list(y = dat$y)
parameters.to.save <- c('b0', 'b1', 'b2')
#initial values
initial.values <- function(){
  list(b0 = rnorm(1, sd = 1),
       b1 = rnorm(1, sd = 1),
       b2 = rnorm(1, sd = 1))
}
n.iter <- 5000
n.burnin <- 1000
n.chains <- 3
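As an aside, for models where the default samplers suffice, NIMBLE also offers a one-call wrapper, nimbleMCMC(), which bundles all of the steps shown in the next section. A minimal sketch is below; we will not use it here because it does not let us customize the samplers.

#One-call alternative (sketch): nimbleMCMC() wraps model building, compilation,
#and sampling, but offers no control over which samplers are assigned
samples.all <- nimbleMCMC(code = model, constants = my.constants,
                          data = my.data, inits = initial.values,
                          monitors = parameters.to.save, niter = n.iter,
                          nburnin = n.burnin, nchains = n.chains)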
NIMBLE with univariate random walk Metropolis-Hastings samplers
To run NIMBLE with the step-by-step workflow, we first create an uncompiled model. Then, we compile this model and get the MCMC configuration. This is where we can change the samplers that are used. After that, we create an MCMC function, compile it, and run the resulting MCMC algorithm. This sounds complicated, but it is not too bad.
The default for NIMBLE is to use a univariate random walk Metropolis-Hastings sampler for each parameter in the model. So let’s see how well this works.
#Standard RW approach
#Step 1) Create uncompiled model
mod1 <- nimbleModel(code = model,
                    constants = my.constants,
                    data = my.data)
#Step 2) compile model
mod2 <- compileNimble(mod1)
#Step 3) MCMC configuration
mod3 <- configureMCMC(mod2,
                      monitors = parameters.to.save)
## ===== Monitors =====
## thin = 1: b0, b1, b2
## ===== Samplers =====
## RW sampler (3)
## - b0
## - b1
## - b2
#Step 4) Create MCMC function and compile it
mod4 <- buildMCMC(mod3)
mod5 <- compileNimble(mod4)
#Step 5) Run MCMC
samples <- runMCMC(mcmc = mod5,
                   inits = initial.values,
                   niter = n.iter,
                   nburnin = n.burnin,
                   nchains = n.chains)
#Step 6) Examine output
MCMCtrace(object = samples,
pdf = FALSE, # no export to PDF
ind = TRUE, # separate density lines per chain
params = c('b0','b1','b2'))
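Besides the trace plots, we may want numeric summaries. This is not part of the original workflow, but the MCMCvis package we loaded also provides MCMCsummary(), which reports posterior summaries along with Rhat and effective sample sizes:

#Optional (sketch): tabular posterior summaries via MCMCvis
MCMCsummary(object = samples, params = c('b0','b1','b2'), round = 2)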
#Calculate effective sample size
b0 <- c(samples$chain1[,1], samples$chain2[,1], samples$chain3[,1])
effectiveSize(b0); length(b0)
## var1
## 1376.785
## [1] 12000
b1 <- c(samples$chain1[,2], samples$chain2[,2], samples$chain3[,2])
effectiveSize(b1)
## var1
## 214.7053
b2 <- c(samples$chain1[,3], samples$chain2[,3], samples$chain3[,3])
effectiveSize(b2)
## var1
## 209.3439
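As an aside, instead of concatenating the chains by hand, we could convert the runMCMC() output to a coda mcmc.list; effectiveSize() then handles all parameters and chains at once. A sketch:

#Alternative (sketch): convert the list of chain matrices to a coda mcmc.list
#and compute effective sample sizes for all parameters in one call
samples.list <- as.mcmc.list(lapply(samples, as.mcmc))
effectiveSize(samples.list)
#(runMCMC also accepts samplesAsCodaMCMC = TRUE to return this format directly)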
Despite creating 12,000 posterior samples for \(\beta_1\) and \(\beta_2\), the effective sample sizes were only around 210-215. These results suggest that this particular algorithm is not performing well: the trace plots show that the individual chains are mixing poorly and the resulting density estimates are relatively bumpy.
NIMBLE with univariate slice samplers
JAGS typically samples each parameter in a model like this using univariate slice samplers. Because NIMBLE gives us the flexibility to choose which sampler we want, let's see if we can improve on the dismal performance above by changing from the univariate random walk MH sampler to a slice sampler. To do this, we first remove the current samplers associated with these parameters and then assign slice samplers to them, as shown after the short refresher below.
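As a refresher, slice sampling draws an auxiliary uniform height under the density and then samples uniformly from the horizontal “slice” at that height. Below is a minimal plain-R sketch of a univariate slice update (the stepping-out and shrinkage procedures of Neal 2003); this is purely illustrative and is not NIMBLE's internal implementation.

#Minimal sketch of a univariate slice sampling update (Neal 2003).
#'logpost' is any function returning an unnormalized log-density.
slice.update <- function(x0, logpost, w = 1, max.steps = 100) {
  logy <- logpost(x0) - rexp(1)   #log of a uniform height under the density curve
  L <- x0 - runif(1) * w          #randomly positioned initial interval of width w
  R <- L + w
  k <- max.steps
  while (k > 0 && logpost(L) > logy) { L <- L - w; k <- k - 1 }  #step out left
  k <- max.steps
  while (k > 0 && logpost(R) > logy) { R <- R + w; k <- k - 1 }  #step out right
  repeat {                        #sample uniformly, shrinking the interval on rejection
    x1 <- runif(1, L, R)
    if (logpost(x1) > logy) return(x1)
    if (x1 < x0) L <- x1 else R <- x1
  }
}
#e.g., 1000 draws from a standard normal:
#x <- numeric(1000)
#for (t in 2:1000) x[t] <- slice.update(x[t-1], function(x) -x^2/2)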
#Slice sampler approach
#Step 3) MCMC configuration
mod3 <- configureMCMC(mod2,
                      monitors = parameters.to.save)
## ===== Monitors =====
## thin = 1: b0, b1, b2
## ===== Samplers =====
## RW sampler (3)
## - b0
## - b1
## - b2
mod3$removeSamplers(c('b0', 'b1', 'b2'))
mod3$printSamplers()
mod3$addSampler(target = c('b0', 'b1', 'b2'),
                type = 'slice',
                expandTarget = TRUE)
#Step 4) Create MCMC function and compile it
mod4 <- buildMCMC(mod3)
mod5 <- compileNimble(mod4)
#Step 5) Run MCMC
samples <- runMCMC(mcmc = mod5,
                   inits = initial.values,
                   niter = n.iter,
                   nburnin = n.burnin,
                   nchains = n.chains)
#Step 6) Examine output
MCMCtrace(object = samples,
pdf = FALSE, # no export to PDF
ind = TRUE, # separate density lines per chain
params = c('b0','b1','b2'))
#Calculate effective sample size
b0 <- c(samples$chain1[,1], samples$chain2[,1], samples$chain3[,1])
effectiveSize(b0)
## var1
## 5002.012
b1 <- c(samples$chain1[,2], samples$chain2[,2], samples$chain3[,2])
effectiveSize(b1)
## var1
## 903.2521
b2 <- c(samples$chain1[,3], samples$chain2[,3], samples$chain3[,3])
effectiveSize(b2)
## var1
## 876.7463
You can see that each chain takes somewhat longer to run with the slice sampler. However, the effective sample sizes for \(\beta_1\) and \(\beta_2\) were more than four times larger. Not bad!
If you are curious to see which samplers are currently available within NIMBLE, you can type the code below.
?samplers
NIMBLE with the Automated Factor (AF) slice sampler
It turns out that we can improve the performance of our model even more if we sample our parameters jointly, instead of one at a time. The intuition is that, because \(x_1\) and \(x_2\) are collinear, the posterior distribution of \(\beta_1\) and \(\beta_2\) is itself strongly correlated, so univariate updates can only take small steps, whereas joint updates can move along the correlated direction. There are multiple ways of doing this, but I was blown away when I saw the performance of the Automated Factor (AF) slice sampler.
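We can check this intuition directly using the pooled draws of b1 and b2 that we created above:

#Sketch: posterior correlation of b1 and b2 from the previous fit; with
#collinear covariates, we expect a strong (negative) posterior correlation
cor(b1, b2)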
#Step 3) MCMC configuration
mod3 <- configureMCMC(mod2,
                      monitors = parameters.to.save)
## ===== Monitors =====
## thin = 1: b0, b1, b2
## ===== Samplers =====
## RW sampler (3)
## - b0
## - b1
## - b2
mod3$removeSamplers(c('b0', 'b1', 'b2'))
mod3$printSamplers()
mod3$addSampler(target = c('b0', 'b1', 'b2'),
                type = 'sampler_AF_slice')
#Step 4) Create MCMC function and compile it
mod4 <- buildMCMC(mod3)
mod5 <- compileNimble(mod4)
#Step 5) Run MCMC
samples <- runMCMC(mcmc = mod5,
                   inits = initial.values,
                   niter = n.iter,
                   nburnin = n.burnin,
                   nchains = n.chains)
#Step 6) Examine output
MCMCtrace(object = samples,
pdf = FALSE, # no export to PDF
ind = TRUE, # separate density lines per chain
params = c('b0','b1','b2'))
#Calculate effective sample size
b0 <- c(samples$chain1[,1], samples$chain2[,1], samples$chain3[,1])
effectiveSize(b0)
## var1
## 12000
b1 <- c(samples$chain1[,2], samples$chain2[,2], samples$chain3[,2])
effectiveSize(b1)
## var1
## 11666.14
b2 <- c(samples$chain1[,3], samples$chain2[,3], samples$chain3[,3])
effectiveSize(b2)
## var1
## 11677.83
Although the run time seems similar to that of the regular slice sampler, the AF slice sampler works brilliantly in this example: the effective sample sizes are essentially equal to the total number of posterior samples.
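As an aside, I have been describing run times loosely; a more rigorous way to compare samplers is effective samples per second. Here is a sketch (not part of the original workflow) that wraps the existing runMCMC() call in system.time():

#Sketch: measure wall-clock time and compute sampling efficiency (ESS per second)
run.time <- system.time(
  samples <- runMCMC(mcmc = mod5, inits = initial.values, niter = n.iter,
                     nburnin = n.burnin, nchains = n.chains)
)['elapsed']
b1 <- c(samples$chain1[,2], samples$chain2[,2], samples$chain3[,2])
effectiveSize(b1) / run.time   #effective samples per second for b1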
Comments?
Send me an email at