Learning objectives

The goal of this page is to illustrate, through an example on the relationship between race and the death penalty, the concepts of

  • joint probabilities,
  • conditional probabilities, and
  • marginal probabilities

Data

Consider the following tabulation of 674 subjects tried for multiple murders in Florida between 1976 and 1987. This example and its data come from Agresti (2002). The table below shows the death penalty outcome (“Yes” if the defendant received the death penalty, “No” otherwise) according to the defendant’s race (black or white):

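| Defendant’s race | Death penalty: Yes | Death penalty: No | Total |
|------------------|--------------------|-------------------|-------|
| White            | 53                 | 430               | 483   |
| Black            | 15                 | 176               | 191   |
| Total            | 68                 | 606               | 674   |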

Joint probabilities

We can convert these into probabilities by dividing all the cells by the total number of individuals, yielding:

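| Defendant’s race | Death penalty: Yes | Death penalty: No | Total |
|------------------|--------------------|-------------------|-------|
| White            | 0.08               | 0.64              | 0.72  |
| Black            | 0.02               | 0.26              | 0.28  |
| Total            | 0.10               | 0.90              | 1.00  |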
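The conversion from counts to joint probabilities can be sketched in a few lines of Python. This is only an illustration using the counts from the first table; the names `counts`, `total`, and `joint` are assumptions made for this sketch and do not come from the original analysis.

```python
# Counts from the first table: (defendant's race, death penalty) -> number of subjects.
counts = {
    ("white", "yes"): 53,
    ("white", "no"): 430,
    ("black", "yes"): 15,
    ("black", "no"): 176,
}

total = sum(counts.values())  # grand total of subjects in the table

# Joint probability of each cell: cell count divided by the grand total.
joint = {cell: n / total for cell, n in counts.items()}

for (race, penalty), p in joint.items():
    print(f"p(D={penalty}, R={race}) = {p:.2f}")
```

Because every subject falls into exactly one cell, the four joint probabilities sum to 1.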

Let’s denote the death penalty result as D and the defendant’s race as R. We can think about the joint probability (i.e., the probability of a given D and a given R occurring together) as putting the 674 individuals from our dataset into a hat and asking what the probability is of picking a person with that D and R. For example:

  • \(p(D=yes,R=white)\): the probability of picking a person from the hat who is white and was sentenced to death is \(\frac{53}{674}=0.08\).
  • \(p(D=yes,R=black)\): the probability of picking a person from the hat who is black and was sentenced to death is \(\frac{15}{674}=0.02\).

By comparing these two probabilities, one might be tempted to say that, if anything, the judicial system penalizes white individuals more than black ones.

Marginal probabilities

The problem with the conclusion above is that the accused individuals are predominantly white (483 out of 674). In other words, the probability that we pick a black person from the hat (regardless of death sentence outcome) is relatively small (191 out of 674), and therefore \(p(D=yes,R=black)\) is inevitably even smaller.

The probability of picking a black person (regardless of death sentence) is called a marginal probability. It is called “marginal” because it is based on numbers that lie on the margins of our table. Here are examples of marginal probabilities:

  • \(p(R=black)\): the probability of picking a person from the hat who is black (regardless of death sentence) is \(\frac{191}{674}=0.28\).
  • \(p(D=yes)\): the probability of picking a person from the hat who received the death penalty (regardless of race) is \(\frac{68}{674}=0.10\).

Notice that the marginal and joint probabilities are related. For example:

\[p(R=black)=p(D=yes,R=black)+p(D=no,R=black)=0.02+0.26=0.28\] \[p(D=yes)=p(D=yes,R=black)+p(D=yes,R=white)=0.02+0.08=0.10\]
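As a rough check of this relationship, the sketch below sums joint probabilities over the variable we want to ignore. The dictionary names (`p_race`, `p_penalty`) are hypothetical. The sums are taken over unrounded values; adding the rounded two-decimal figures happens to work in this example but is not guaranteed to in general.

```python
from collections import defaultdict

# Counts from the first table: (defendant's race, death penalty) -> number of subjects.
counts = {
    ("white", "yes"): 53,
    ("white", "no"): 430,
    ("black", "yes"): 15,
    ("black", "no"): 176,
}
total = sum(counts.values())

p_race = defaultdict(float)     # marginal distribution of R
p_penalty = defaultdict(float)  # marginal distribution of D

for (race, penalty), n in counts.items():
    p = n / total               # joint probability of this cell
    p_race[race] += p           # sum over death penalty outcomes
    p_penalty[penalty] += p     # sum over races

print(f"p(R=black) = {p_race['black']:.2f}")
print(f"p(D=yes)   = {p_penalty['yes']:.2f}")
```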

Conditional probabilities

Perhaps we can look instead at the proportion of accused black individuals who received the death penalty (i.e., 15/191). This is equivalent to asking: given that the defendant is black, what is the probability that he will receive the death penalty? In other words, we first remove all white individuals from the hat, leaving only black individuals in the hat, and then ask for the probability of picking a person from the hat who received the death penalty.

This is an example of a conditional probability. Here are some conditional probabilities:

  • \(p(D=yes|R=black)\): the probability of receiving the death penalty given that the defendant is black is equal to \(\frac{15}{191}=0.08\).
  • \(p(D=yes|R=white)\): the probability of receiving the death penalty given that the defendant is white is equal to \(\frac{53}{483}=0.11\).
  • \(p(R=black|D=yes)\): the probability of the defendant being black given that we know he has received the death penalty is equal to \(\frac{15}{68}=0.22\).

Notice that the conditional probability is related to the marginal and joint probabilities. More specifically, the conditional probability is equal to the joint probability divided by the marginal probability of the event we condition on. For example:

\[p(D=yes|R=black)=\frac{15}{191}=\frac{15/674}{191/674}=\frac{p(D=yes,R=black)}{p(R=black)}\]
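The identity above can be sketched as a small Python function that divides a joint probability by the marginal probability of the conditioning event. The function name `penalty_given_race` is a hypothetical helper introduced only for this illustration.

```python
# Counts from the first table: (defendant's race, death penalty) -> number of subjects.
counts = {
    ("white", "yes"): 53,
    ("white", "no"): 430,
    ("black", "yes"): 15,
    ("black", "no"): 176,
}
total = sum(counts.values())


def penalty_given_race(penalty, race):
    """Return p(D=penalty | R=race) as joint probability / marginal probability."""
    joint = counts[(race, penalty)] / total
    marginal = sum(n for (r, _), n in counts.items() if r == race) / total
    return joint / marginal


print(f"p(D=yes | R=black) = {penalty_given_race('yes', 'black'):.2f}")
print(f"p(D=yes | R=white) = {penalty_given_race('yes', 'white'):.2f}")
```

The grand total cancels in the ratio, which is why the same value can be read directly from the counts as 15/191 or 53/483.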

People often find the concept of conditional probabilities confusing. For instance, in our original dataset, what is the difference between \(p(D=yes|R=black)\) and \(p(R=black|D=yes)\)? \(p(D=yes|R=black)\) is calculated by putting all black defendants into a hat and determining the chance that we draw somebody who has received the death penalty. \(p(R=black|D=yes)\) is calculated by putting all defendants who received the death penalty into a hat and determining the chance that we draw a black person from the hat. These probabilities are very different because the denominators are different:

\[p(D=yes|R=black)=\frac{p(D=yes,R=black)}{p(R=black)}=\frac{15/674}{191/674}=\frac{15}{191}=0.08\] \[p(R=black|D=yes)=\frac{p(D=yes,R=black)}{p(D=yes)}=\frac{15/674}{68/674}=\frac{15}{68}=0.22\]
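The “hat” picture can also be mimicked directly: the sketch below creates one slip per subject, keeps only the slips that satisfy the conditioning event, and then computes a simple proportion. The names `hat`, `proportion`, `black_only`, and `sentenced_only` are assumptions made for this illustration.

```python
# Counts from the first table: (defendant's race, death penalty) -> number of subjects.
counts = {
    ("white", "yes"): 53,
    ("white", "no"): 430,
    ("black", "yes"): 15,
    ("black", "no"): 176,
}

# One (race, penalty) slip per subject goes into the hat.
hat = [cell for cell, n in counts.items() for _ in range(n)]


def proportion(slips, condition):
    """Fraction of the slips in `slips` for which `condition` is true."""
    return sum(1 for s in slips if condition(s)) / len(slips)


# Conditioning on R=black: only black defendants stay in the hat.
black_only = [s for s in hat if s[0] == "black"]
# Conditioning on D=yes: only defendants sentenced to death stay in the hat.
sentenced_only = [s for s in hat if s[1] == "yes"]

p_yes_given_black = proportion(black_only, lambda s: s[1] == "yes")
p_black_given_yes = proportion(sentenced_only, lambda s: s[0] == "black")

print(f"p(D=yes | R=black) = {p_yes_given_black:.2f}")
print(f"p(R=black | D=yes) = {p_black_given_yes:.2f}")
```

Both proportions share the same numerator (the 15 black defendants sentenced to death); only the size of the hat they are drawn from changes.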

To understand this concept, it can be helpful to think about subsets. Conditioning on B restricts attention to the cases where B holds, so A|B refers to a subset of B, whereas B|A refers to a (usually different) subset of A. To highlight these differences, take a minute to think about the following conditional probabilities and statements:

  • p(you are a US citizen | you are the US president) vs. p(you are the US president | you are a US citizen)
  • p(cremated | dead) vs. p(dead | cremated)
  • p(my car is on fire | something is wrong with my car) vs. p(something is wrong with my car | my car is on fire)
  • Most polar bears are twins. Therefore, if you’re a twin, you’re probably a polar bear.

Going back to our original question, a comparison of the conditional probabilities \(p(D=yes|R=black)=0.08\) and \(p(D=yes|R=white)=0.11\) for black and white defendants, respectively, suggests that, although the gap is not as large as before, the justice system is still somewhat more severe toward white individuals.

Importance of what we condition on

Conditional probabilities are a critical concept, but are we missing part of the story? We can disaggregate our table a little further by adding the race of the victim. In this table, “W.v” and “B.v” denote white and black victims, while “W.d” and “B.d” denote white and black defendants. Notice that this is the same dataset as before, just a little more disaggregated.

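| Victim/defendant race | Death penalty: Yes | Death penalty: No | Total |
|-----------------------|--------------------|-------------------|-------|
| W.v-W.d               | 53                 | 414               | 467   |
| B.v-W.d               | 0                  | 16                | 16    |
| W.v-B.d               | 11                 | 37                | 48    |
| B.v-B.d               | 4                  | 139               | 143   |
| Total                 | 68                 | 606               | 674   |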

Let the victim’s and defendant’s race be denoted by \(R_v\) and \(R_d\), respectively. Now a completely different story emerges. We find that, if the victim is white, black defendants have a higher probability of receiving the death penalty than white defendants:

\[p(D=yes|R_d=white,R_v=white)=\frac{p(D=yes,R_d=white,R_v=white)}{p(R_d=white,R_v=white)}=\frac{53/674}{467/674}=0.11\] \[p(D=yes|R_d=black,R_v=white)=\frac{p(D=yes,R_d=black,R_v=white)}{p(R_d=black,R_v=white)}=\frac{11/674}{48/674}=0.23\]

Similarly, if the victim is black, black defendants have a higher probability of receiving the death penalty than white defendants:

\[p(D=yes|R_d=white,R_v=black)=\frac{p(D=yes,R_d=white,R_v=black)}{p(R_d=white,R_v=black)}=\frac{0/674}{16/674}=0\] \[p(D=yes|R_d=black,R_v=black)=\frac{p(D=yes,R_d=black,R_v=black)}{p(R_d=black,R_v=black)}=\frac{4/674}{143/674}=0.03\]

These results illustrate that depending on what we condition on, we can get completely different results. For instance, in typical regression problems, the inclusion of a particular covariate can completely change the relationship between the response and the other covariates.
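The reversal can be reproduced with a short Python sketch that computes the probability of a death sentence first conditioning on the defendant’s race only, and then on both the victim’s and the defendant’s race. The counts are those of the disaggregated table; the helper `p_yes` and the other names are hypothetical and used only for this illustration.

```python
# Disaggregated counts: (victim's race, defendant's race) -> (death penalty yes, no).
counts = {
    ("white", "white"): (53, 414),
    ("black", "white"): (0, 16),
    ("white", "black"): (11, 37),
    ("black", "black"): (4, 139),
}


def p_yes(cells):
    """p(D=yes | subject belongs to one of `cells`), computed from raw counts."""
    yes = sum(counts[c][0] for c in cells)
    no = sum(counts[c][1] for c in cells)
    return yes / (yes + no)


# Conditioning on the defendant's race only (victim's race is summed out).
for defendant in ("white", "black"):
    cells = [c for c in counts if c[1] == defendant]
    print(f"p(D=yes | R_d={defendant}) = {p_yes(cells):.2f}")

# Conditioning on both the victim's and the defendant's race.
for victim in ("white", "black"):
    for defendant in ("white", "black"):
        p = p_yes([(victim, defendant)])
        print(f"p(D=yes | R_d={defendant}, R_v={victim}) = {p:.2f}")
```

Summing out the victim’s race recovers the earlier values (53/483 and 15/191), while each victim-race stratum taken separately shows the higher probability for black defendants.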



References

Agresti, Alan. 2002. Categorical Data Analysis. New Jersey: Wiley.