Intuition
Some intuition for why this algorithm works comes from the following equation:
\[p_{\beta_0,\text{thresh}}=\min\left(1,\frac{T(\beta_0^{new},\beta_1)}{T(\beta_0^{old},\beta_1)}\right)\]
If the proposed parameter value \(\beta_0^{new}\) is in an area of higher probability relative to where \(\beta_0^{old}\) is, then:
\[\frac{T(\beta_0^{new},\beta_1)}{T(\beta_0^{old},\beta_1)}>1\]
and thus \(p_{\beta_0,\text{thresh}}=1\). This means that we always accept \(\beta_0^{new}\) when it comes from a region of higher probability (i.e., we always accept these uphill moves).
When the proposed parameter value \(\beta_0^{new}\) is in an area of lower probability relative to where \(\beta_0^{old}\) is, then:
\[\frac{T(\beta_0^{new},\beta_1)}{T(\beta_0^{old},\beta_1)}<1\]
and thus \(p_{\beta_0,\text{thresh}}<1\). This means that we may or may not accept \(\beta_0^{new}\): moves into a region of lower probability (i.e., downhill moves) are accepted only some of the time. This feature ensures that we collect some (albeit few) samples in areas of low probability.
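To make the accept/reject rule concrete, here is a minimal Python sketch of this decision for \(\beta_0\) with \(\beta_1\) held fixed. The function name and the toy target are illustrative assumptions (in the post, \(T\) stands for whatever unnormalized target density the sampler uses), and a symmetric proposal is assumed so that the proposal densities cancel in the ratio:

```python
import numpy as np

def mh_accept_beta0(beta0_old, beta0_new, beta1, T, rng):
    # p_thresh = min(1, T(beta0_new, beta1) / T(beta0_old, beta1))
    p_thresh = min(1.0, T(beta0_new, beta1) / T(beta0_old, beta1))
    # Accept with probability p_thresh: uphill moves (ratio >= 1) are
    # always accepted; downhill moves are accepted only sometimes.
    if rng.uniform() < p_thresh:
        return beta0_new  # accept the proposal
    return beta0_old      # reject: keep the current value

# Hypothetical unnormalized target: a standard bivariate normal
T = lambda b0, b1: np.exp(-0.5 * (b0 ** 2 + b1 ** 2))
rng = np.random.default_rng(0)
print(mh_accept_beta0(0.5, 0.1, 0.0, T, rng))
```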
Demonstration
Some things are easier to understand when we see them than when we read an explanation of them. So here is a GIF that might be useful (a small simulation sketch follows below):
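If the GIF does not load, the following sketch traces the kind of path the animation shows. It reuses the same toy target \(T\) and a Gaussian random-walk proposal, both illustrative assumptions rather than the post's actual setup: the chain starts in a low-probability region, climbs uphill quickly, and then mostly wanders around the high-probability region with occasional downhill excursions.

```python
import numpy as np

# Toy unnormalized target (illustrative assumption, as above)
T = lambda b0, b1: np.exp(-0.5 * (b0 ** 2 + b1 ** 2))

rng = np.random.default_rng(1)
beta1 = 0.0
beta0 = 3.0            # deliberately start in a low-probability region
trace = [beta0]
for _ in range(1000):
    proposal = beta0 + rng.normal(scale=0.5)  # symmetric random-walk proposal
    if rng.uniform() < min(1.0, T(proposal, beta1) / T(beta0, beta1)):
        beta0 = proposal                      # accept
    trace.append(beta0)                       # on rejection, the old value repeats

# After burn-in, samples of beta_0 should look roughly standard normal
print(np.mean(trace[200:]), np.std(trace[200:]))
```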
Finally, you can read more about different MCMC algorithms, including the MH algorithm discussed here, in Andrieu et al. (2003).
References
Andrieu, C., N. de Freitas, A. Doucet, and M. I. Jordan. 2003. "An Introduction to MCMC for Machine Learning." Machine Learning 50: 5–43.