Model notation

Sometimes it can be confusing to see the same model described in different ways. To illustrate what I mean, I will focus on a Gaussian regression example with a single covariate. This model can be described as:

\[y_i \sim N(\beta_0 +\beta_1 x_i, \sigma^2) \]

Sometimes people even add “iid” (shorthand for independent and identically distributed) over the tilde sign to be more explicit about what is being assumed.

Another way of writing this same model is to describe its likelihood, as shown below:

\[L(\beta_0,\beta_1,\sigma^2) = \prod _{i=1}^N N(y_i |\beta_0 +\beta_1 x_i, \sigma^2) \]

Both approaches are equally correct. However, I usually write my models in the first form (with the tilde) because it mirrors how we would simulate the data and is very similar to how we define models within Bayesian software (e.g., JAGS and Nimble).
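To make the link between the two notations concrete, here is a minimal sketch in Python (standard library only). The function names, the parameter values (\(\beta_0 = 2\), \(\beta_1 = 0.5\), \(\sigma = 1\)), and the sample size of 200 are all assumptions chosen for illustration. The `simulate` function follows the tilde notation, while `log_likelihood` follows the product notation (on the log scale, so the product becomes a sum).

```python
import math
import random

def simulate(x, beta0, beta1, sigma, rng):
    """Tilde notation: draw y_i ~ N(beta0 + beta1 * x_i, sigma^2) for each x_i."""
    return [rng.gauss(beta0 + beta1 * xi, sigma) for xi in x]

def normal_logpdf(y, mean, sigma):
    """Log density of N(mean, sigma^2) evaluated at y."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (y - mean) ** 2 / (2 * sigma**2)

def log_likelihood(y, x, beta0, beta1, sigma):
    """Likelihood notation: log of prod_i N(y_i | beta0 + beta1 * x_i, sigma^2)."""
    return sum(normal_logpdf(yi, beta0 + beta1 * xi, sigma) for yi, xi in zip(y, x))

# Illustrative values only (assumptions, not taken from any real dataset).
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(200)]
y = simulate(x, beta0=2.0, beta1=0.5, sigma=1.0, rng=rng)

# The same model, now read as a function of the parameters given the data.
ll_true = log_likelihood(y, x, 2.0, 0.5, 1.0)
ll_wrong = log_likelihood(y, x, 2.0, -0.5, 1.0)
print(ll_true, ll_wrong)
```

With this much data, the log-likelihood at the parameter values used to simulate should be far higher than at a clearly wrong slope, which is one way to see that both notations describe the same model.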

Subscripts

People are also often confused by subscripts. For example, in the model above, we use \(0\) and \(1\) to distinguish between the intercept \(\beta_0\) and slope \(\beta_1\) parameters. So that is easy.

However, why do the variables \(y_i\) and \(x_i\) have subscripts \(i\)? The reason I use this subscript is to indicate that the values for these variables potentially change for each observation \(i\) in our dataset.

Things can get more complicated if we have multiple subscripts. For example, say that I have growth data for each tree \(i\) of species \(s\) in year \(t\). In this case, my response variable would be \(y_{ist}\).

Some variables (e.g., drought) might only change from year to year, having the same value for all trees within a given year. As a result, this variable in my model would only have the year subscript \(t\) (I would not include the tree and species subscripts). Similarly, if I want each species to have its own intercept parameter, I can describe this in the model by changing \(\beta_0\) to \(\beta_{0s}\).
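Putting these pieces together, one possible way to write the tree growth model is shown below. Here \(d_t\) is a hypothetical symbol for drought in year \(t\), and the shared slope \(\beta_1\) and variance \(\sigma^2\) are assumptions made for illustration:

\[y_{ist} \sim N(\beta_{0s} +\beta_1 d_t, \sigma^2) \]

A minimal Python sketch of how these subscripts translate into indexing when simulating data follows. The numbers of species, years, and trees, as well as all parameter values, are made up for the example.

```python
import random

rng = random.Random(42)

# Hypothetical dimensions, chosen only for illustration.
n_species, n_years, trees_per_species = 3, 4, 5

beta0 = [1.0, 2.0, 3.0]   # beta_{0s}: one intercept per species s
beta1 = -0.8              # drought slope: no subscript, shared by everyone
sigma = 0.3               # residual standard deviation, also shared

# d_t: drought changes only from year to year, so it is indexed by t alone.
drought = [rng.uniform(0, 1) for _ in range(n_years)]

rows = []
for s in range(n_species):
    for i in range(trees_per_species):
        for t in range(n_years):
            # The mean uses subscript s for the intercept and t for drought.
            mean = beta0[s] + beta1 * drought[t]
            # y_{ist}: the response varies by tree, species, and year.
            rows.append({"tree": i, "species": s, "year": t,
                         "y": rng.gauss(mean, sigma)})

print(len(rows))
```

Notice that `drought[t]` is looked up with the year index only, so every tree in a given year gets the same drought value, while `beta0[s]` is looked up with the species index only.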

These details are relatively subtle and easy to overlook. However, I think they are critical, both for understanding models that other people have created and for communicating your own model as clearly as possible.


