Models with latent continuous responses for ordinal data
Ordinal data are relatively common, but surprisingly few people are used to analyzing this type of data properly. Examples of ordered categorical variables include Likert-scale data (e.g., strongly disagree, disagree, indifferent, agree, strongly agree), letter grades (A, B, C, D, and E), and severity of pain (e.g., very severe, severe, moderate, and mild).
Often, people simply code these data as integers (e.g., -2=strongly disagree, -1=disagree, 0=indifferent, 1=agree, 2=strongly agree) and use models such as ANOVA or normal linear regression, which assume that the response variable is a continuous random variable. The problems that arise from doing this are well documented in the literature (Liddell and Kruschke 2018).
One way to think about ordinal data is that they arise from grouping an underlying continuous random variable. In some cases, this might be literally true. For example, letter grades often arise by categorizing a continuous measure of performance (A=86-100, B=76-86, C=66-76,…). Similarly, we can think of pain as a continuous variable; because we are forced to choose one of the discrete severity categories, we essentially discretize how we feel about the pain we are experiencing according to a set of cutoffs on that continuous scale.
Problem
Say we have a medication that is very effective for a particular disease but can have adverse effects on fetuses, depending on its dosage. The higher the dosage, the more effective the medication is, but also the more likely it is that we will see adverse effects. To determine the optimal dosage, we conduct a toxicology experiment with pregnant mice. Say we obtain the following summarized data (reproduced from Agresti 2002):
Concentration | Nonlive | Malformation | Normal |
---|---|---|---|
0 | 15 | 1 | 281 |
62.5 | 17 | 0 | 225 |
125 | 22 | 7 | 283 |
250 | 38 | 59 | 202 |
500 | 144 | 132 | 9 |
We want to find the maximum dose level at which the probabilities of malformation and of a nonlive outcome are both small, defined here as prob(malformation|dose) < 0.05 and prob(nonlive|dose) < 0.01.
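Before fitting any model, it can help to look at the raw proportions and to expand the summarized table into one record per observation. The sketch below does both; the names `counts`, `to_long_format`, and `observed_proportions` are illustrative choices, not part of the original analysis.

```python
# Counts copied from the table above, keyed by concentration.
# Each tuple is (Non-live, Malformation, Normal).
CATEGORIES = ("Non-live", "Malformation", "Normal")
counts = {
    0.0: (15, 1, 281),
    62.5: (17, 0, 225),
    125.0: (22, 7, 283),
    250.0: (38, 59, 202),
    500.0: (144, 132, 9),
}

def to_long_format(counts):
    """Expand the summary table into one (dose, category) record per observation."""
    records = []
    for dose, row in counts.items():
        for category, n in zip(CATEGORIES, row):
            records.extend([(dose, category)] * n)
    return records

def observed_proportions(counts):
    """Return {dose: (p_nonlive, p_malformation, p_normal)} from the raw counts."""
    return {dose: tuple(c / sum(row) for c in row) for dose, row in counts.items()}

records = to_long_format(counts)
for dose, props in observed_proportions(counts).items():
    print(dose, [round(p, 3) for p in props])
```

These raw proportions are only descriptive; the thresholds above are meant to be assessed with the model described next.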
Model structure
Similar to what we did for the censored data, we will assume that there is a latent continuous response \(z_i\) that describes the severity of the adverse effects (under the categories defined below, lower values of \(z_i\) correspond to more severe effects) and that this variable depends on dosage \(x_i\) through the following expression:
\[z_i \sim N(\beta_1 x_i ,1)\]
However, we do not directly observe \(z_i\) (after all, this is a latent variable). We only observe \(y_i\) such that:
\(y_i =\) “Non-live” if \(z_i < t_1\)
\(y_i =\) “Malformation” if \(t_1 < z_i < t_2\)
\(y_i =\) “Normal” if \(z_i > t_2\)
[Figure: cartoon of the latent variable \(z_i\) and the cutoffs \(t_1\) and \(t_2\)]
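Notice that our regression has neither an intercept term \(\beta_0\) nor a variance term \(\sigma^2\). This is because these parameters are not identifiable if we simultaneously want to estimate \(t_1\) and \(t_2\): adding the same constant to \(\beta_0\), \(t_1\), and \(t_2\), or multiplying \(\beta_0\), \(\beta_1\), \(\sigma\), \(t_1\), and \(t_2\) by the same positive constant, leaves the probability of every category unchanged.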
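A small numerical check can make this lack of identifiability concrete. The sketch below assumes a more general latent model \(z_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)\) and shows that different parameter sets imply the same category probabilities. The function name `category_probs` and all parameter values are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

def category_probs(x, beta1, t1, t2, beta0=0.0, sigma=1.0):
    """P(Non-live), P(Malformation), P(Normal) when z ~ N(beta0 + beta1*x, sigma^2)."""
    latent = NormalDist(mu=beta0 + beta1 * x, sigma=sigma)
    p_nonlive = latent.cdf(t1)                  # P(z < t1)
    p_malf = latent.cdf(t2) - latent.cdf(t1)    # P(t1 < z < t2)
    return p_nonlive, p_malf, 1.0 - p_nonlive - p_malf

# Baseline: no intercept, unit variance (the model used in this section).
base = category_probs(250, beta1=-0.005, t1=-2.0, t2=-1.5)

# Add 3 to the intercept and to both cutoffs: same probabilities.
shifted = category_probs(250, beta1=-0.005, t1=1.0, t2=1.5, beta0=3.0)

# Multiply slope, cutoffs, and standard deviation by 2: same probabilities.
scaled = category_probs(250, beta1=-0.01, t1=-4.0, t2=-3.0, sigma=2.0)

print(base)
print(shifted)
print(scaled)
```

Because the data cannot distinguish between these parameter sets, fixing \(\beta_0=0\) and \(\sigma^2=1\) is one way to pin the model down.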
Full conditional distributions (FCDs) for the cutoffs \(t_1\) and \(t_2\)
We will assume that \(t_1\) and \(t_2\) are uniformly distributed with the constraint that \(t_1 < t_2\). Under this assumption, the FCD for \(t_1\) is given by:
\[t_1 \sim Unif(max(z_{nonlive}),min(z_{malformation}))\]
where
- \(max(z_{nonlive})\) is the maximum of all \(z_i\) for which \(y_i=\)“Non-live”; and
- \(min(z_{malformation})\) is the minimum of all \(z_i\) for which \(y_i=\)“Malformation”.
Similarly, the FCD for \(t_2\) is given by:
\[t_2 \sim Unif(max(z_{malformation}),min(z_{normal}))\]
where
- \(max(z_{malformation})\) is the maximum of all \(z_i\) for which \(y_i=\)“Malformation”; and
- \(min(z_{normal})\) is the minimum of all \(z_i\) for which \(y_i=\)“Normal”.
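As a minimal sketch of these two uniform draws, the function below takes the current latent values and the observed categories and returns new cutoffs. The name `sample_cutoffs` and the toy inputs are assumptions made for illustration, and the code assumes that every category has at least one observation.

```python
import random

def sample_cutoffs(z, y, rng=random):
    """Draw t1 and t2 from their uniform full conditional distributions."""
    nonlive = [zi for zi, yi in zip(z, y) if yi == "Non-live"]
    malf = [zi for zi, yi in zip(z, y) if yi == "Malformation"]
    normal = [zi for zi, yi in zip(z, y) if yi == "Normal"]
    # t1 lies between the largest "Non-live" z and the smallest "Malformation" z.
    t1 = rng.uniform(max(nonlive), min(malf))
    # t2 lies between the largest "Malformation" z and the smallest "Normal" z.
    t2 = rng.uniform(max(malf), min(normal))
    return t1, t2

# Toy latent values, consistent with the ordering of the categories.
z_toy = [-3.0, -2.5, -1.0, -0.5, 0.7, 1.2]
y_toy = ["Non-live", "Non-live", "Malformation", "Malformation", "Normal", "Normal"]
t1, t2 = sample_cutoffs(z_toy, y_toy, rng=random.Random(1))
print(t1, t2)
```

Because the interval for each cutoff is bounded by the closest latent values on either side, these intervals tend to be narrow when there are many observations, so the cutoffs may move only a little at each iteration.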
FCD for \(z_i\)
We will sample \(z_i\) from truncated normal distributions, where the truncation limits depend on the value of \(y_i\). More specifically, if \(y_i=\)“Non-live”, then:
\[z_i \sim N(\beta_1 x_i,1)I(z_i<t_1)\]
If \(y_i=\)“Malformation”, then:
\[z_i \sim N(\beta_1 x_i,1)I(t_1<z_i<t_2)\]
Finally, if \(y_i=\)“Normal”, then:
\[z_i \sim N(\beta_1 x_i,1)I(z_i>t_2)\]
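One way to draw from these truncated normal distributions is to invert the normal CDF: draw a uniform value between the CDF at the lower and upper limits, then map it back through the inverse CDF. The sketch below uses only the Python standard library; the function names `sample_truncated_normal` and `sample_z` are illustrative, and the clamping steps are a crude safeguard that I am assuming is acceptable for a teaching example (dedicated truncated-normal samplers are more accurate far in the tails).

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()   # standard normal, for cdf and inv_cdf
INF = float("inf")

def sample_truncated_normal(mean, lower, upper, rng=random):
    """Draw from N(mean, 1) restricted to (lower, upper) by inverting the CDF."""
    p_lower = STD_NORMAL.cdf(lower - mean)
    p_upper = STD_NORMAL.cdf(upper - mean)
    u = rng.uniform(p_lower, p_upper)
    u = min(max(u, 1e-12), 1.0 - 1e-12)     # inv_cdf requires 0 < u < 1
    draw = mean + STD_NORMAL.inv_cdf(u)
    return min(max(draw, lower), upper)      # guard against far-tail rounding

def sample_z(x, y, beta1, t1, t2, rng=random):
    """Draw every z_i from its FCD; the truncation limits depend on y_i."""
    limits = {
        "Non-live": (-INF, t1),
        "Malformation": (t1, t2),
        "Normal": (t2, INF),
    }
    return [sample_truncated_normal(beta1 * xi, *limits[yi], rng=rng)
            for xi, yi in zip(x, y)]

# Hypothetical parameter values, for illustration only.
z_new = sample_z([0, 250, 500], ["Normal", "Malformation", "Non-live"],
                 beta1=-0.005, t1=-2.0, t2=-1.5, rng=random.Random(0))
print(z_new)
```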
FCD for \(\beta_1\)
Assuming a vague normal prior with mean 0 and variance 10
\[\beta_1 \sim N(0,10)\]
the FCD for \(\beta_1\) is given by:
\[\beta_1 \sim N([\sum_i x_i^2 + \frac{1}{10}]^{-1} \sum_i x_i z_i,[\sum_i x_i^2 + \frac{1}{10}]^{-1})\]
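The expression above translates almost directly into code. In this sketch, `beta1_fcd_params` returns the mean and variance of the FCD and `sample_beta1` draws from it; both names, and the default `prior_var=10.0` argument, are illustrative assumptions that mirror the prior stated above.

```python
import math
import random

def beta1_fcd_params(x, z, prior_var=10.0):
    """Mean and variance of the normal FCD for beta1."""
    precision = sum(xi * xi for xi in x) + 1.0 / prior_var
    variance = 1.0 / precision
    mean = variance * sum(xi * zi for xi, zi in zip(x, z))
    return mean, variance

def sample_beta1(x, z, prior_var=10.0, rng=random):
    """Draw beta1 from its FCD given the current latent values z."""
    mean, variance = beta1_fcd_params(x, z, prior_var)
    return rng.gauss(mean, math.sqrt(variance))   # gauss takes a standard deviation

# Toy inputs, for illustration only.
x_toy = [1.0, 2.0, 3.0]
z_toy = [2.0, 4.0, 6.0]
print(beta1_fcd_params(x_toy, z_toy))
print(sample_beta1(x_toy, z_toy, rng=random.Random(42)))
```

Note that `random.gauss` expects a standard deviation, so the variance of the FCD has to be square-rooted before sampling.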
Analysis steps
1. Simulate some data. These data will be critical for determining whether our algorithm is working appropriately.
2. Create separate functions to sample \(\beta_1\), \(t_1\), \(t_2\), and \(z_i\).
3. Use the functions you created in step 2 to develop your customized Gibbs sampler.
4. When we apply our customized Gibbs sampler to the simulated data, are we able to successfully retrieve the true values for \(\beta_1\), \(t_1\), and \(t_2\)?
5. Reformat the actual data so that they are suitable for analysis.
6. Analyze this reformatted dataset to determine the maximum dose level for which prob(malformation|dose) < 0.05 and prob(nonlive|dose) < 0.01.
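The steps above can be sketched end to end on simulated data. This is a compact illustration under stated assumptions, not a definitive implementation: the true parameter values, the starting values, the chain length, the burn-in, and the dose grid are all arbitrary choices, and every function name is hypothetical. Categories are coded as 0 (Non-live), 1 (Malformation), and 2 (Normal). The final function uses \(P(\text{nonlive}|x) = \Phi(t_1 - \beta_1 x)\) and \(P(\text{malformation}|x) = \Phi(t_2 - \beta_1 x) - \Phi(t_1 - \beta_1 x)\), which follow from the model definition, and summarizes them by their posterior means, which is one possible choice.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()          # standard normal, used for cdf and inverse cdf
INF = float("inf")

def simulate(doses, beta1, t1, t2, n_per_dose, rng):
    """Simulate doses x and categories y (0=Non-live, 1=Malformation, 2=Normal)."""
    x, y = [], []
    for dose in doses:
        for _ in range(n_per_dose):
            z = rng.gauss(beta1 * dose, 1.0)
            x.append(dose)
            y.append(0 if z < t1 else 1 if z < t2 else 2)
    return x, y

def draw_z(mean, lower, upper, rng):
    """Truncated N(mean, 1) draw on (lower, upper) via the inverse CDF."""
    u = rng.uniform(PHI.cdf(lower - mean), PHI.cdf(upper - mean))
    u = min(max(u, 1e-12), 1.0 - 1e-12)           # inv_cdf needs 0 < u < 1
    return min(max(mean + PHI.inv_cdf(u), lower), upper)

def gibbs(x, y, n_iter, rng, prior_var=10.0):
    """Cycle through the FCDs of z, the cutoffs (t1, t2), and beta1."""
    beta1, t1, t2 = 0.0, -1.0, 1.0                # arbitrary start with t1 < t2
    post_var = 1.0 / (sum(xi * xi for xi in x) + 1.0 / prior_var)
    draws = []
    for _ in range(n_iter):
        bounds = [(-INF, t1), (t1, t2), (t2, INF)]
        z = [draw_z(beta1 * xi, *bounds[yi], rng) for xi, yi in zip(x, y)]
        by_cat = [[zi for zi, yi in zip(z, y) if yi == k] for k in range(3)]
        t1 = rng.uniform(max(by_cat[0]), min(by_cat[1]))
        t2 = rng.uniform(max(by_cat[1]), min(by_cat[2]))
        post_mean = post_var * sum(xi * zi for xi, zi in zip(x, z))
        beta1 = rng.gauss(post_mean, math.sqrt(post_var))
        draws.append((beta1, t1, t2))
    return draws

def max_safe_dose(draws, dose_grid, max_malf=0.05, max_nonlive=0.01):
    """Largest grid dose whose posterior-mean probabilities meet both limits."""
    best = None
    for dose in dose_grid:
        p_nonlive = sum(PHI.cdf(t1 - b * dose) for b, t1, _ in draws) / len(draws)
        p_upto_malf = sum(PHI.cdf(t2 - b * dose) for b, _, t2 in draws) / len(draws)
        if p_nonlive < max_nonlive and p_upto_malf - p_nonlive < max_malf:
            best = dose if best is None else max(best, dose)
    return best                                   # None if no grid dose qualifies

rng = random.Random(2024)
true_values = (-0.005, -2.0, -1.5)                # hypothetical beta1, t1, t2
x, y = simulate([0, 62.5, 125, 250, 500], *true_values, n_per_dose=40, rng=rng)
draws = gibbs(x, y, n_iter=500, rng=rng)
kept = draws[250:]                                # discard the first half as burn-in
for name, idx in (("beta1", 0), ("t1", 1), ("t2", 2)):
    print(name, "posterior mean:", sum(d[idx] for d in kept) / len(kept))
print("max safe dose on grid:", max_safe_dose(kept, range(0, 501, 25)))
```

The cutoffs in this kind of sampler may mix slowly, so a chain this short is unlikely to be adequate in practice; trace plots and longer runs would be needed before judging whether the true values are recovered.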
References
Agresti, Alan. 2002. Categorical Data Analysis. New Jersey: Wiley.
Liddell, T. M., and J. K. Kruschke. 2018. “Analyzing Ordinal Data with Metric Models: What Could Possibly Go Wrong?” Journal of Experimental Social Psychology 79: 328–48.