Why this basketball example?

It may seem like we have spent a lot of time going into excruciating detail on an example that ultimately has little relevance. However, I find this example useful for several reasons:

  1. there are definitely many more examples in which we are trying to estimate proportions. Can we think of some other examples?

  2. it provides some intuition about how to come up with uninformative priors and about how the posterior distribution is a compromise between the prior and the information in the data

  3. this conjugate pair might arise in more complicated models as one of the full conditional distributions that are required for the Gibbs sampler

Here is an example I have worked on where I used this conjugate pair (you can find more details in my article, Valle et al. (2015)). In multiple disciplines, it is very common to have a method that is very accurate and precise but too costly to use widely (often called the gold standard method) and a method that is cheaper and simpler but has worse performance. As a result, we might end up with lots of data coming from the cheaper method and, if we are lucky, some data coming from the gold standard method. In malaria epidemiology, the gold standard diagnostic method is microscopy (or, often, Polymerase Chain Reaction - PCR) and the diagnostic method actually used in rural health facilities is the Rapid Diagnostic Test (RDT).

We have individual-level data on covariates (e.g., potential risk factors such as rainfall, wealth, etc.) and on RDT binary results (1=detected to be infected, 0=not detected to be infected). We are interested in how these different risk factors influence the probability that a person is infected but, unfortunately, we can’t observe infection status directly. As a result, we can’t just run a traditional logistic regression here.

Let \(I_i\) be the latent infection status of individual \(i\). We are interested in the regression parameters \(\beta_0,\beta_1,...\) in the model below:

\[I_i \sim Bernoulli(\frac{exp(\beta_0+\beta_1 x_{i1}+...)}{1+exp(\beta_0+\beta_1 x_{i1}+...)})\] We then assume that the RDT results \(D_i\) arise in the following way: \[D_i|I_i=1 \sim Bernoulli(S_e)\] \[D_i|I_i=0 \sim Bernoulli(1-S_p)\] where \(S_e\) is the sensitivity \(p(D_i=1|I_i=1)\) and \(S_p\) is the specificity \(p(D_i=0|I_i=0)\) of the imperfect diagnostic test, in this case RDT.
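
To make this data-generating process concrete, here is a minimal simulation sketch in Python. The parameter values (\(\beta_0\), \(\beta_1\), \(S_e\), \(S_p\)) and the single covariate are made up purely for illustration; they are not from the original analysis.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical values chosen only for illustration
n = 1000
beta0, beta1 = -1.0, 0.8   # regression coefficients (assumed)
Se, Sp = 0.90, 0.70        # sensitivity and specificity (assumed)

x1 = rng.normal(size=n)                           # one covariate (e.g., rainfall)
p_inf = 1 / (1 + np.exp(-(beta0 + beta1 * x1)))   # P(I_i = 1) from the logistic model
I = rng.binomial(1, p_inf)                        # latent infection status (unobserved)

# Observed RDT result: Bernoulli(Se) if infected, Bernoulli(1 - Sp) if not infected
D = np.where(I == 1,
             rng.binomial(1, Se, size=n),
             rng.binomial(1, 1 - Sp, size=n))

# A naive logistic regression of D (instead of the unobserved I) on x1 would be biased,
# which is the motivation for modeling Se and Sp explicitly.
```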

Unfortunately, the way this model is specified does not allow us to estimate all these parameters \(S_p,S_e,\beta_0,\beta_1,...\). One option is to use the gold standard method on a subsample of individuals. Another option is to use the literature to inform what \(S_p\) and \(S_e\) should be. This is the option that we are going to pursue.

We often find articles in the literature that report on the performance of different diagnostic methods, typically using the following numbers:

  • \(N_+\): number of individuals detected to be infected by the gold standard method
  • \(D_+\): number of individuals detected to be infected by the regular method out of all \(N_+\) individuals
  • \(N_-\): number of individuals not detected to be infected by the gold standard method
  • \(D_-\): number of individuals not detected to be infected by the regular method out of all \(N_-\) individuals

Say we had \(N_+=10\), \(D_+=8\), \(N_-=500\), \(D_-=300\). One option here is to simply use the plug-in estimates \(S_e=8/10\) and \(S_p=300/500\). The problem with this approach is that we have a lot more information on \(S_p\) than on \(S_e\). Yet, if we just use these estimates, we are implicitly assuming that we know these parameters exactly and that there is no uncertainty around them.
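
As a quick illustration of this asymmetry in information, the sketch below computes the plug-in estimates together with their approximate binomial standard errors. The standard errors are just a rough frequentist yardstick added here for illustration; they are not part of the analysis described above.

```python
import numpy as np

Np, Dp = 10, 8     # gold-standard positives and RDT positives among them
Nn, Dn = 500, 300  # gold-standard negatives and RDT negatives among them

Se_hat = Dp / Np   # plug-in sensitivity estimate: 0.8
Sp_hat = Dn / Nn   # plug-in specificity estimate: 0.6

# Approximate binomial standard errors: much larger for Se (n=10) than for Sp (n=500)
se_Se = np.sqrt(Se_hat * (1 - Se_hat) / Np)   # about 0.13
se_Sp = np.sqrt(Sp_hat * (1 - Sp_hat) / Nn)   # about 0.02

print(f"Se plug-in: {Se_hat:.2f} (approx. SE {se_Se:.3f})")
print(f"Sp plug-in: {Sp_hat:.2f} (approx. SE {se_Sp:.3f})")
```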

Another option is to use a “mini” Bayesian analysis on these data from the literature to come up with informative priors:

\[p(S_e|N_+,D_+)\propto Binom(D_+|N_+,S_e)Beta(S_e|1,1)\] \[p(S_e|N_+,D_+)=Beta(S_e|D_+ +1,N_+ - D_+ +1)=Beta(S_e|8+1,10-8+1)=Beta(S_e|9,3)\]

Similarly, \[p(S_p|N_-,D_-)\propto Binom(D_-|N_-,S_p)Beta(S_p|1,1)\] \[p(S_p|N_-,D_-)= Beta(S_p|D_- +1, N_- - D_- +1)=Beta(S_p|300+1,500-300+1)=Beta(S_p|301,201)\]

Notice that the prior for \(S_p\) is a lot more informative than the prior for \(S_e\), which should be evident given the magnitude of the parameters in the corresponding beta distributions. This is exactly what we want, given that we have a lot more information on \(S_p\) than on \(S_e\). Furthermore, we did not have to arbitrarily specify how much more informative one prior should be relative to the other; this arose naturally from this “mini” Bayesian analysis. This example also highlights one of the major strengths of Bayesian statistics: we can use informative priors on the subset of the parameters that are not the main focus of our analysis (\(S_p\) and \(S_e\) in this case), while using uninformative priors for the parameters that we are really interested in.
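
As a quick check on how different these two priors really are, here is a sketch (in Python, using scipy) that builds the two beta distributions and compares their means and 95% intervals. The numerical summaries are only illustrative.

```python
from scipy.stats import beta

# Informative priors derived from the literature counts
prior_Se = beta(8 + 1, 10 - 8 + 1)        # Beta(9, 3)
prior_Sp = beta(300 + 1, 500 - 300 + 1)   # Beta(301, 201)

for name, d in [("Se", prior_Se), ("Sp", prior_Sp)]:
    lo, hi = d.interval(0.95)
    print(f"{name}: mean = {d.mean():.3f}, 95% interval = ({lo:.3f}, {hi:.3f})")

# The Sp interval is much narrower than the Se interval, reflecting the
# much larger sample size behind the specificity numbers.
```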




References

Valle, Denis, Joanna M. Tucker Lima, Justin Millar, Punam Amratia, and Ubydul Haque. 2015. “Bias in Logistic Regression Due to Imperfect Diagnostic Test Results and Practical Correction Approaches.” Malaria Journal 14. https://doi.org/10.1186/s12936-015-0966-y.