censoring.knit

What is the likelihood in this model?

For this model, we will need to define the likelihood separately for the regular observations and the truncated/censored observations.

For the regular observations, we will use the following likelihood:

\[y_i \sim N(\beta_0 + \beta_1 x_i ,\sigma^2)\]

This is equivalent to stating that

\[p(y_i|\beta_0,\beta_1,\sigma^2,x_i )=N(y_i|\beta_0 + \beta_1 x_i,\sigma^2)\]

For the censored observations, we will have to define the likelihood by determining what is \(p(y_i=dl)\). This probability is given by:

\[p(y_i=dl|\beta_0,\beta_1,\sigma^2,x_i )=p(z_i<dl|\beta_0,\beta_1,\sigma^2,x_i )=\int_{-\infty}^{dl} N(z_i|\beta_0 + \beta_1 x_i ,\sigma^2) dz_i \] What does this integral represent?

If \(dl=1\), \(\beta_0 + \beta_1 x_i=2\), and \(\sigma^2=1.5\), then this integral is the area under the curve between \(-\infty\) and \(dl\) in the figure below:

Ultimately, the likelihood for the entire dataset will be comprised of two pieces: one for the regular observations and one for the censored observations. Let \(Uncensored\) and \(Censored\) denote the sets containing these observations (respectively). Then, the likelihood is given by:

\[=[\prod_{i \in Uncensored} N(y_i|\beta_0 +\beta_1 x_i ,\sigma^2)]\times[\prod_{i \in Censored} \int_{-\infty}^{dl}N(z_i|\beta_0 +\beta_1 x_i ,\sigma^2)dz_i ]\]

How will we get JAGS to calculate this integral for us?

You can evaluate this integral in R using the function “pnorm”:

dl=1
mean1=2
var1=1.5
pnorm(dl,mean1,sqrt(var1))

## [1] 0.2071081

In JAGS, you can also evaluate this integral using the function \(pnorm\).

How will we specify this likelihood in JAGS?

Until now, we have specified the likelihood in JAGS by using the dXXX functions. For example:

y[i]~ dnorm(mean1,1/sig2)

which is interpreted within JAGS as:

\[p(y_i|\mu_1,\sigma^2)=N(y_i|\mu_1,\sigma^2)\]

However, how do we implement a non-standard likelihood such as the one given by our integral? Ideally we would use the code below but JAGS will not understand this:

y[i] ~ pnorm(dl,mean1,1/sig2)

To implement this, we will rely on the so-called “ones” trick. The key idea is that we create a vector of ones (hence the name) and treat this vector as our “response” variable. For example, say \(k_i=1\) for the censored observations. We model this variable as:

\[k_i\sim Bern(\delta_i)\] where

\[\delta_i = p(z_i<dl)\]

Why does the “ones” trick work?

We can write the likelihood of this model as:

\[=[\prod_{i \in Uncensored} N(y_i|\beta_0 +\beta_1 x_i ,\sigma^2)]\times[\prod_{i \in Censored} Bern(k_i | \delta_i )]\] \[=[\prod_{i \in Uncensored} N(y_i|\beta_0 +\beta_1 x_i ,\sigma^2)]\times[\prod_{i \in Censored} \delta_i ^ {k_i} (1-\delta_i)^{1-k_i}]\] Because all \(k_i\) are equal to 1, this simplifies to:

\[=[\prod_{i \in Uncensored} N(y_i|\beta_0 +\beta_1 x_i ,\sigma^2)]\times[\prod_{i \in Censored} \delta_i]\] Because

\[\delta_i = p(z_i<dl)=\int_{-\infty}^{dl}N(z_i|\beta_0 +\beta_1 x_i ,\sigma^2)dz_i\],

our final expression for the likelihood is identical to the one we had originally specified:

\[=[\prod_{i \in Uncensored} N(y_i|\beta_0 +\beta_1 x_i ,\sigma^2)]\times[\prod_{i \in Censored} \int_{-\infty}^{dl}N(z_i|\beta_0 +\beta_1 x_i ,\sigma^2)dz_i ]\]

Analysis steps

Start by defining your likelihood in JAGS just for the uncensored observations
Include in your model from (1) an extra likelihood for the censored observations
Looking at the parameter estimates, is this model able to recover the true relationship between \(x\) and \(y\)?

Comments?

Send me an email at