Bayes' theorem

In science, we are typically interested in \(p(H|D)\), where \(H\) is our hypothesis and \(D\) is our evidence/data. To get this quantity, we need Bayes' theorem:

\[p(H|D)=\frac{p(D|H)p(H)}{p(D)}\]

Bayesians refer to \(p(H)\) as the prior (i.e., our prior belief in each hypothesis). We then learn from the data through the likelihood \(p(D|H)\) and update our beliefs about the different hypotheses given the data, as summarized in the posterior \(p(H|D)\).
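
To make this concrete, here is a minimal sketch in Python (the coin-bias hypotheses, the uniform prior, and the data are all illustrative assumptions, not from the text): it computes the posterior over a discrete set of hypotheses by multiplying prior and likelihood and normalizing by \(p(D)\).

```python
import numpy as np

# Hypothetical discrete hypotheses: three candidate biases for a coin.
H = np.array([0.3, 0.5, 0.7])          # p(heads) under each hypothesis
prior = np.array([1/3, 1/3, 1/3])      # p(H): uniform prior belief

# Illustrative data D: 7 heads out of 10 flips.
heads, n = 7, 10

# Likelihood p(D|H) under each hypothesis (binomial, dropping the
# binomial coefficient, which cancels in the normalization).
likelihood = H**heads * (1 - H)**(n - heads)

# p(D) = sum over H of p(D|H) p(H), the normalizing constant.
evidence = np.sum(likelihood * prior)

# Bayes' theorem: p(H|D) = p(D|H) p(H) / p(D).
posterior = likelihood * prior / evidence
print(posterior)   # belief shifts toward the 0.7-bias hypothesis
```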

Notice that this updating of beliefs can be done sequentially as new evidence/data arise. For example, say that we originally have just one piece of evidence/dataset \(D_1\). Based on this, we can calculate:

\[p(H|D_1)=\frac{p(D_1|H)p(H)}{p(D_1)}\]

Say that we then obtain another piece of evidence/dataset \(D_2\). We can then calculate:

\[p(H|D_1,D_2)=\frac{p(D_2,D_1,H)}{p(D_2,D_1)}=\frac{p(D_2|D_1,H)p(D_1|H)p(H)}{p(D_2|D_1)p(D_1)}=\frac{p(D_2|D_1,H)}{p(D_2|D_1)}\times p(H|D_1)\]

In this equation, we are updating our current belief regarding hypothesis \(H\), denoted by \(p(H|D_1)\). In other words, \(p(H|D_1)\) is our new “prior” because it consists of our belief prior to seeing \(D_2\). If you think about it, these equations are really nice because they reflect, in a very principled way, how we intuitively think science works (i.e., as evidence accumulates, we should be able to place increasingly greater (or lesser) confidence in a particular hypothesis).
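
Here is a sketch of this sequential updating, again with made-up numbers and assuming the coin flips are conditionally independent given \(H\) (so \(p(D_2|D_1,H)=p(D_2|H)\)): updating on \(D_1\) and then on \(D_2\) yields the same posterior as a single update on the pooled data.

```python
import numpy as np

def update(prior, heads, n, H):
    """One Bayesian update: returns p(H|D) given the counts in D."""
    likelihood = H**heads * (1 - H)**(n - heads)
    return likelihood * prior / np.sum(likelihood * prior)

H = np.array([0.3, 0.5, 0.7])           # hypothetical coin biases
prior = np.array([1/3, 1/3, 1/3])       # initial p(H)

# Sequential: the posterior after D_1 becomes the "prior" for D_2.
after_d1 = update(prior, heads=4, n=5, H=H)       # D_1: 4/5 heads
after_d2 = update(after_d1, heads=3, n=5, H=H)    # D_2: 3/5 heads

# Batch: update once on the pooled data (D_1, D_2) = 7/10 heads.
batch = update(prior, heads=7, n=10, H=H)

print(np.allclose(after_d2, batch))     # True: same posterior
```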
