Intuition

Some intuition for why this algorithm works comes from the equation:

\[p_{\beta_0,thresh}=\min\left(1,\frac{T(\beta_0^{new},\beta_1)}{T(\beta_0^{old},\beta_1)}\right)\]

If the proposed parameter value \(\beta_0^{new}\) is in an area of higher probability relative to where \(\beta_0^{old}\) is, then:

\[\frac{T(\beta_0^{new},\beta_1)}{T(\beta_0^{old},\beta_1)}>1\]

and thus \(p_{\beta_0,thresh}=1\). This implies that we always accept \(\beta_0^{new}\) when it comes from a region of higher probability (i.e., we always accept these uphill moves).

When the proposed parameter value \(\beta_0^{new}\) is in an area of lower probability relative to where \(\beta_0^{old}\) is, then:

\[\frac{T(\beta_0^{new},\beta_1)}{T(\beta_0^{old},\beta_1)}<1\]

and thus \(p_{\beta_0,thresh}<1\). This implies that we may or may not accept \(\beta_0^{new}\): even though the new region has lower probability, the move is accepted with probability equal to this ratio (i.e., we sometimes accept downhill moves). This feature ensures that we still get some (albeit few) samples in areas of low probability.
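To make the accept/reject step concrete, here is a minimal Python sketch of updating \(\beta_0\) with \(\beta_1\) held fixed. The target function `T`, the Gaussian random-walk proposal, its scale, and the starting values are all hypothetical stand-ins (they are not the example used elsewhere in this post); the sketch only illustrates that uphill moves are always accepted and downhill moves are accepted with probability equal to the ratio above.

```python
import numpy as np

rng = np.random.default_rng(42)

def T(beta0, beta1):
    # Hypothetical un-normalized target: a bivariate Gaussian bump.
    return np.exp(-0.5 * (beta0**2 + beta1**2))

beta1 = 0.5          # held fixed while we update beta0
beta0_old = 2.0      # current value of beta0
samples = [beta0_old]

for _ in range(5000):
    # Propose a new beta0 by a random walk around the current value (assumed proposal).
    beta0_new = beta0_old + rng.normal(scale=0.5)

    # Acceptance threshold: min(1, T(new, beta1) / T(old, beta1)).
    p_thresh = min(1.0, T(beta0_new, beta1) / T(beta0_old, beta1))

    # Uphill moves (ratio > 1) are always accepted;
    # downhill moves are accepted with probability equal to the ratio.
    if rng.uniform() < p_thresh:
        beta0_old = beta0_new

    samples.append(beta0_old)

print("mean of beta0 samples:", np.mean(samples))
```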

Demonstration

Some things are easier to understand when we see them than when we hear them explained. So here is a GIF that might be useful:

Finally, you can read more about different MCMC algorithms, including the MH algorithm discussed here, in (Andrieu et al. 2003).




References

Andrieu, C., N. de Freitas, A. Doucet, and M. I. Jordan. 2003. “An Introduction to MCMC for Machine Learning.” Machine Learning 50: 5–43.