General structure of MH algorithm

In this section, we are going to explore a specific type of MH algorithm, called the random walk MH. Say that we have the following model:

\[p(\beta_0,\beta_1|...)\propto [\prod_{i=1}^n Poisson(S_i|exp(\beta_0+\beta_1 \times log(A_i)))]\times N(\beta_0|0,100)N(\beta_1|0,100)\]

We are interested in sampling from the posterior distribution \(p(\beta_0,\beta_1|...)\), but we only know it up to a proportionality constant. Let’s call this the target distribution T().

The idea behind MH is that we propose a new value of the parameter we are interested in (\(\beta_0^{new}\)) and we accept or reject this new value based on a threshold probability (\(p_{\beta_0,thresh}\)). If we do not accept this new value, we retain the old value (\(\beta_0^{old}\)). We do the same thing for \(\beta_1\).

Here is the algorithm:

  1. Start with values for \(\beta_0\) and \(\beta_1\)

  2. Set \(\beta_0^{old}=\beta_0\) and propose a new value (\(\beta_0^{new}\))

  3. We evaluate whether we are going to accept \(\beta_0^{new}\) based on \(p_{\beta_0,thresh}\). If we:

  • accept \(\beta_0^{new}\), then we set \(\beta_0=\beta_0^{new}\)
  • reject \(\beta_0^{new}\), then we set \(\beta_0=\beta_0^{old}\)

  4. Now do the same thing for \(\beta_1\). Set \(\beta_1^{old}=\beta_1\) and propose a new value (\(\beta_1^{new}\))

  5. We evaluate whether we are going to accept \(\beta_1^{new}\) based on \(p_{\beta_1,thresh}\). If we:

  • accept \(\beta_1^{new}\), then we set \(\beta_1=\beta_1^{new}\)
  • reject \(\beta_1^{new}\), then we set \(\beta_1=\beta_1^{old}\)

  6. Store the current values of \(\beta_0\) and \(\beta_1\) and go back to step 2.

If we keep doing this, we will end up with a sequence of values for \(\beta_0\) and \(\beta_1\). Notice that whenever we reject a proposed new value, we end up with a repeated value for \(\beta_0\) or \(\beta_1\). This is fine. If we have crafted the algorithm correctly, then eventually this sequence of numbers will consist of correlated samples from our target distribution. The key pieces that ensure this algorithm works correctly are how we:

  1. propose new values \(\beta_0^{new}\) and \(\beta_1^{new}\); and
  2. calculate the threshold probabilities \(p_{\beta_0,thresh}\) and \(p_{\beta_1,thresh}\).
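To make these steps concrete, here is a minimal random walk MH sketch in Python for the Poisson model above. The data (A, S), the starting values, and the jump standard deviations are all illustrative assumptions, not values from the text; reading the 100 in the \(N(0,100)\) priors as a variance is also an assumption. The acceptance step uses the simplified threshold for symmetric proposals, computed on the log scale (both points are explained later in this section).

```python
import math
import random

random.seed(42)

# Hypothetical data (not from the text): areas A_i and counts S_i.
A = [1.0, 2.5, 5.0, 10.0, 20.0, 50.0]
S = [2, 3, 5, 6, 9, 14]

def log_target(b0, b1):
    """log T(): Poisson log-likelihood plus N(0, 100) log-priors,
    dropping constants (log S_i! and normalizing terms), since T()
    only needs to be known up to a proportionality constant."""
    ll = 0.0
    for a, s in zip(A, S):
        eta = b0 + b1 * math.log(a)      # log of the Poisson mean
        ll += s * eta - math.exp(eta)
    # N(0, 100) priors, reading 100 as the variance (an assumption)
    ll += -(b0 ** 2) / 200.0 - (b1 ** 2) / 200.0
    return ll

def rw_mh(n_iter=5000, sd0_jump=0.5, sd1_jump=0.2):
    b0, b1 = 0.0, 0.0                        # step 1: starting values
    samples = []
    for _ in range(n_iter):
        b0_new = random.gauss(b0, sd0_jump)  # steps 2-3: propose/accept beta0
        if math.log(random.random()) < log_target(b0_new, b1) - log_target(b0, b1):
            b0 = b0_new
        b1_new = random.gauss(b1, sd1_jump)  # steps 4-5: propose/accept beta1
        if math.log(random.random()) < log_target(b0, b1_new) - log_target(b0, b1):
            b1 = b1_new
        samples.append((b0, b1))             # step 6: store and repeat
    return samples

samples = rw_mh()
kept = samples[1000:]                        # discard burn-in draws
b1_mean = sum(b1 for _, b1 in kept) / len(kept)
print(f"posterior mean of beta1: {b1_mean:.2f}")
```

Repeated consecutive pairs in `samples` correspond to rejected proposals; tweaking `sd0_jump` and `sd1_jump` changes how often proposals are accepted.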

Proposal distribution

We propose \(\beta_0^{new}\) and \(\beta_1^{new}\) using a proposal distribution Q(). For the random walk MH algorithm, the proposal distribution is the normal distribution with mean equal to \(\beta_0^{old}\) (if we are proposing a new value for \(\beta_0\)) or \(\beta_1^{old}\) (if we are proposing a new value for \(\beta_1\)):

\[\beta_0^{new}\sim N(\beta_0^{old},\sigma_{0,jump}^2)\] \[\beta_1^{new}\sim N(\beta_1^{old},\sigma_{1,jump}^2)\]

Notice that the parameters \(\sigma_{0,jump}^2\) and \(\sigma_{1,jump}^2\) are tuning parameters (i.e., parameters that we can tweak to improve the performance of our algorithm).
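As a quick illustration (the values of \(\beta_0^{old}\) and \(\sigma_{0,jump}\) below are arbitrary), proposing a new value is simply a normal draw centered on the current value:

```python
import random

random.seed(1)

beta0_old = 0.2
sd0_jump = 0.5   # square root of the tuning parameter sigma_{0,jump}^2; value is illustrative

# random.gauss takes the standard deviation, hence the square root above
draws = [random.gauss(beta0_old, sd0_jump) for _ in range(10_000)]
mean = sum(draws) / len(draws)
print(f"mean of proposals: {mean:.2f}")  # centered on beta0_old
```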

Probability threshold

The probability threshold for \(\beta_0\) is given by:

\[p_{\beta_0,thresh}=min(1,\frac{T(\beta_0^{new},\beta_1)}{T(\beta_0^{old},\beta_1)} \frac{Q(\beta_0^{old}|\beta_0^{new})}{Q(\beta_0^{new}|\beta_0^{old})})\] The probability threshold for \(\beta_1\) is given by:

\[p_{\beta_1,thresh}=min(1,\frac{T(\beta_0,\beta_1^{new})}{T(\beta_0,\beta_1^{old})} \frac{Q(\beta_1^{old}|\beta_1^{new})}{Q(\beta_1^{new}|\beta_1^{old})})\]

Here are some important observations regarding this probability threshold:

  1. When we use a symmetric proposal distribution (e.g., the normal distribution), \(p_{\beta_0,thresh}\) simplifies to:

\[p_{\beta_0,thresh}=min(1,\frac{T(\beta_0^{new},\beta_1)}{T(\beta_0^{old},\beta_1)})\] because

\[\frac{Q(\beta_0^{old}|\beta_0^{new})}{Q(\beta_0^{new}|\beta_0^{old})}= \frac{\frac{1}{\sqrt{2\pi \sigma_{0,jump}^2}} exp(-\frac{1}{2\sigma_{0,jump}^2}(\beta_0^{old}-\beta_0^{new})^2)}{\frac{1}{\sqrt{2\pi \sigma_{0,jump}^2}} exp(-\frac{1}{2\sigma_{0,jump}^2}(\beta_0^{new}-\beta_0^{old})^2)}=1\]

Similarly, \(p_{\beta_1,thresh}\) simplifies to:

\[p_{\beta_1,thresh}=min(1,\frac{T(\beta_0,\beta_1^{new})}{T(\beta_0,\beta_1^{old})})\]
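We can verify this cancellation numerically. A small sketch, with arbitrary values for the old value, the proposal, and the jump standard deviation:

```python
import math

def q(x, mu, sd):
    """Normal proposal density Q(x | mu) with jump standard deviation sd."""
    return math.exp(-((x - mu) ** 2) / (2 * sd ** 2)) / math.sqrt(2 * math.pi * sd ** 2)

b_old, b_new, sd_jump = 1.3, 0.7, 0.5   # arbitrary illustrative values
ratio = q(b_old, b_new, sd_jump) / q(b_new, b_old, sd_jump)
print(ratio)  # 1.0: the densities cancel because (a-b)^2 = (b-a)^2
```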

  2. Recall from Bayes’ theorem that:

\[p(\beta_0,\beta_1|D)=\frac{p(D|\beta_0,\beta_1)\times p(\beta_0,\beta_1)}{p(D)}\] where \(D\) stands for data. Why don’t we need to know the normalizing constant \(p(D)\) when using the MH algorithm? Notice that:

\[p_{\beta_0,thresh}=min(1,\frac{p(\beta_0^{new},\beta_1|D)}{p(\beta_0^{old},\beta_1|D)})\] \[p_{\beta_0,thresh}=min(1,\frac{\frac{p(D|\beta_0^{new},\beta_1)p(\beta_0^{new},\beta_1)}{p(D)}}{\frac{p(D|\beta_0^{old},\beta_1)p(\beta_0^{old},\beta_1)}{p(D)}})\]

Since \(p(D)\) is present both in the numerator and denominator and it does not depend on the specific value of \(\beta_0\), it cancels out. Therefore, this expression simplifies to:

\[p_{\beta_0,thresh}=min(1,\frac{p(D|\beta_0^{new},\beta_1)p(\beta_0^{new},\beta_1)}{p(D|\beta_0^{old},\beta_1)p(\beta_0^{old},\beta_1)})\] \[p_{\beta_0,thresh}=min(1,\frac{T(\beta_0^{new},\beta_1)}{T(\beta_0^{old},\beta_1)})\] This is why we only have to know the target distribution up to a proportionality constant to be able to sample it using the MH algorithm. The same idea applies to \(p_{\beta_1,thresh}\).

  3. In practice, we compute this ratio on the log scale to avoid numerical underflow when the likelihood terms are very small.
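A sketch of why the log scale matters. The log-target values below are made up just to force underflow, but densities this small arise routinely once the likelihood involves many observations:

```python
import math
import random

random.seed(0)

def accept(log_t_new, log_t_old):
    """Accept with probability min(1, T_new/T_old), computed on the log
    scale: comparing log(u) with the log of the ratio is equivalent, but
    avoids exponentiating very negative numbers."""
    return math.log(random.random()) < log_t_new - log_t_old

# Both densities underflow to 0.0 if exponentiated directly ...
print(math.exp(-1000.0))        # 0.0, so the ratio 0.0/0.0 is undefined
# ... but on the log scale the ratio is just a difference:
# (-1000) - (-1001) = 1 > log(u), so this proposal is always accepted.
print(accept(-1000.0, -1001.0))
```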


