Kidney cancer example

This exercise is based on an example about kidney cancer death rates presented in Gelman (1998). The figure below shows the counties with the highest kidney cancer death rates.

How were these cancer death rates calculated?

Let \(y_i\) be the number of deaths and \(N_i\) be the population size for the i-th county. Because the probability of dying of kidney cancer \(\theta_i\) is typically very small while \(N_i\) is relatively large, a common model for these data is:

\[y_i \sim \text{Poisson}(\theta_i N_i)\]

Because \(E[y_i] = \theta_i N_i\), a natural estimate of the kidney cancer death risk is \(\hat{\theta}_i = y_i/N_i\). This is also the maximum likelihood estimate (MLE) of \(\theta_i\), and it is the estimate shown in the maps.
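
To see why \(y_i/N_i\) is the MLE, here is a short derivation under the Poisson model above, treating each county separately. The log-likelihood for county \(i\) is

\[\log L(\theta_i) = -\theta_i N_i + y_i \log(\theta_i N_i) - \log(y_i!)\]

Setting its derivative with respect to \(\theta_i\) to zero gives

\[\frac{d \log L}{d\theta_i} = -N_i + \frac{y_i}{\theta_i} = 0 \quad\Rightarrow\quad \hat{\theta}_i = \frac{y_i}{N_i}\]

The second derivative, \(-y_i/\theta_i^2\), is negative whenever \(y_i > 0\), so this stationary point is a maximum.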

Guiding questions

  1. Is there any noticeable spatial pattern in these maps? If there is, why do we see these patterns?

  2. Could the observed spatial pattern be an artifact of how we estimated kidney cancer death risk? We will explore this by simulating data using the population sizes of Florida (FL) counties (these data are available in “FL county pop.csv”).

  3. If there is a problem with how we estimate kidney cancer death risk, can we come up with a better way of estimating this risk?
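
One way to start on question 2 is a small simulation in which every county shares the same true risk, so that any variation in the estimated rates comes from sampling alone. The sketch below is only an illustration: the population sizes are made-up placeholders (the real ones would be read from “FL county pop.csv”, whose column layout is not shown here), and `theta_true`, `n_sims`, and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2024)  # arbitrary seed, for reproducibility

# Hypothetical county populations spanning small to large counties.
# In the assignment these would come from "FL county pop.csv".
pop = np.array([8_000, 15_000, 40_000, 120_000, 500_000, 2_500_000])

theta_true = 1e-4  # assumed common death risk, identical in every county
n_sims = 1_000

# Each row is one simulated data set: y_i ~ Poisson(theta_true * N_i).
y = rng.poisson(theta_true * pop, size=(n_sims, pop.size))
theta_hat = y / pop  # raw estimate y_i / N_i for every county and simulation

# How often does each county have the highest (or lowest) estimated rate?
# Ties are resolved in favor of the first county by argmax/argmin.
share_highest = np.bincount(theta_hat.argmax(axis=1), minlength=pop.size) / n_sims
share_lowest = np.bincount(theta_hat.argmin(axis=1), minlength=pop.size) / n_sims
sd_hat = theta_hat.std(axis=0)  # spread of the estimate across simulations

for n, hi, lo, sd in zip(pop, share_highest, share_lowest, sd_hat):
    print(f"N = {n:>9,}  highest: {hi:.3f}  lowest: {lo:.3f}  sd of estimate: {sd:.2e}")
```

Because the true risk is identical everywhere in this sketch, any county that appears disproportionately often at the extremes does so through sampling variability alone. Comparing the printed columns against the population sizes is a useful first check before moving to the real county data.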

For this assignment, use 1,000 simulations to determine which estimate of cancer death risk performs best. Your report should explain how you came up with your alternative estimate and, based on the simulations, how well it performs relative to the standard estimate. Remember to include your code for these 1,000 simulations.
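
A comparison of this kind could be organized as sketched below. This is only one possible setup, not the intended solution: the populations are hypothetical placeholders, the true risks are assumed to vary across counties following a gamma distribution, and `shrunk_estimate` with its `pseudo_pop` tuning value is an illustrative candidate (a simple pull toward the pooled rate) that does not come from the source.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed

# Hypothetical populations; replace with the values in "FL county pop.csv".
pop = np.array([8_000, 15_000, 40_000, 120_000, 500_000, 2_500_000])
n_sims = 1_000


def raw_estimate(y, pop):
    """Standard estimate y_i / N_i."""
    return y / pop


def shrunk_estimate(y, pop, pseudo_pop=50_000):
    """Illustrative alternative: pull each county toward the pooled rate.

    Equivalent to adding `pseudo_pop` imaginary residents who die at the
    pooled rate. `pseudo_pop` is an arbitrary tuning value (an assumption).
    """
    pooled = y.sum(axis=-1, keepdims=True) / pop.sum()
    return (y + pseudo_pop * pooled) / (pop + pseudo_pop)


# Assumed truth: county risks vary around 1e-4 (gamma, mean 1e-4, CV 0.25).
shape = 16.0
theta = rng.gamma(shape, 1e-4 / shape, size=(n_sims, pop.size))
y = rng.poisson(theta * pop)  # one simulated data set per row


def rmse(est):
    """Root mean squared error per county, taken over the simulations."""
    return np.sqrt(np.mean((est - theta) ** 2, axis=0))


rmse_raw = rmse(raw_estimate(y, pop))
rmse_shrunk = rmse(shrunk_estimate(y, pop))

for n, r, s in zip(pop, rmse_raw, rmse_shrunk):
    print(f"N = {n:>9,}  RMSE raw: {r:.2e}  RMSE shrunk: {s:.2e}")
```

Reporting the error per county, rather than as a single overall number, makes it easier to see where each estimate does well or poorly. How the two compare depends on `pseudo_pop` and on the assumed spread of the true risks, so both are worth varying.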



References

Gelman, A. 1998. “Some Class-Participation Demonstrations for Decision Theory and Bayesian Statistics.” American Statistician 52: 167–74.