Definition

Very similar ideas apply when we have covariates and we want to specify the likelihood function for a regression model. In this case, the likelihood function is a function of the vector of parameters \(\theta\) and is equal to the probability of the data \(y_1,...,y_n\) given the vector of parameters \(\theta\) and covariate vectors \(x_1,...,x_n\):

\[L(\theta) = p(y_1,...,y_n|\theta,x_1,...,x_n)\]

Again, as a simplifying assumption, people often assume that observations are conditionally independent given the model parameters and covariates, which implies that we can write the likelihood as:

\[L(\theta) = \prod_{i=1}^{n} p(y_i|\theta,x_i)\]
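
To make this concrete, below is a minimal sketch in Python of how conditional independence lets us compute the log-likelihood as a sum of per-observation terms (working on the log scale avoids numerical underflow from multiplying many small values). The function names here are illustrative, not from any particular library:

```python
import numpy as np

def log_likelihood(theta, y, x, log_density):
    """Log-likelihood under conditional independence:
    log L(theta) = sum_i log p(y_i | theta, x_i).

    `log_density` is a hypothetical user-supplied function
    returning log p(y_i | theta, x_i) for one observation.
    """
    return np.sum([log_density(yi, theta, xi) for yi, xi in zip(y, x)])
```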

Choosing a probability distribution for the likelihood

The next step is choosing specific probability distributions to represent each term \(p(y_i|\theta,x_i)\) of the likelihood. Unlike in our earlier examples, we will have to modify the probability distributions that we saw previously to accommodate our covariates. Nevertheless, the choice of which probability distribution to use still depends on the characteristics of the data.

Key aspects that apply to all the examples below are that

  • we need to be very cognizant of the range of values that the individual parameters can take;

  • we need to know how these parameters are related to the moments of these distributions (e.g., mean).


1) Normal distribution

If \(y_i\) is a real number, then the Normal distribution is commonly used, yielding the following likelihood:

\[L(\beta_0,...,\beta_p,\sigma^2) \propto \prod_{i=1}^{n} N(y_i|\mu_i,\sigma^2)\] where

\[\mu_i=\beta_0+\beta_1 x_{i1}+...+\beta_p x_{ip}\]

Alternatively, instead of assuming a constant variance \(\sigma^2\), we could have assumed that this variance also changes with the covariates by defining our likelihood in the following way:

\[L(\beta_0,...,\beta_p,\gamma_0,...,\gamma_p) \propto \prod_{i=1}^{n} N(y_i|\mu_i,\sigma_i^2)\]

where

\[\mu_i=\beta_0+\beta_1 x_{i1}+...+\beta_p x_{ip}\] \[\sigma^2_i=exp(\gamma_0+\gamma_1 x_{i1}+...+\gamma_p x_{ip})\]

Notice that I had to use the function \(exp()\) to ensure that \(\sigma^2_i\) is always positive.

Also notice that I use the proportionality sign (i.e., \(\propto\)) for PDFs whereas I use the equal sign for PMFs. The reason for this is that a density is not itself a probability: the probability that a continuous outcome falls in a small interval around \(y_i\) is approximately the density at \(y_i\) times the width of that interval, so the density is proportional to (but not equal to) a probability. For example:

\[p(y_i|x_i,\beta_0,\beta_1,\sigma^2)\propto N(y_i|\beta_0+\beta_1 x_i, \sigma^2)\]
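
As an illustration, here is a minimal sketch of this Normal likelihood in Python using scipy.stats (the variable names and design-matrix convention are my assumptions, not part of the text above), covering both the constant-variance and covariate-dependent-variance versions:

```python
import numpy as np
from scipy.stats import norm

def normal_loglik(beta, sigma2, y, X):
    """log L = sum_i log N(y_i | mu_i, sigma2), where
    mu_i = beta_0 + beta_1 x_i1 + ... + beta_p x_ip.
    X is an n x (p+1) design matrix whose first column is all ones."""
    mu = X @ beta
    return np.sum(norm.logpdf(y, loc=mu, scale=np.sqrt(sigma2)))

def normal_loglik_hetero(beta, gamma, y, X):
    """Variant where sigma2_i = exp(gamma_0 + gamma_1 x_i1 + ...),
    so the variance is guaranteed positive for every observation."""
    mu = X @ beta
    sigma2 = np.exp(X @ gamma)
    return np.sum(norm.logpdf(y, loc=mu, scale=np.sqrt(sigma2)))
```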


2) Gamma distribution

If \(y_i\) is a positive real number, then the Gamma distribution can be used: \[L(\beta_0,...,\beta_p,a_2) \propto \prod_{i=1}^{n} Gamma(y_i|a_{i1},a_{2})\] I model the mean \(E[y_i|x_i]\) as a function of the covariates and I use the \(exp()\) function because this is one way of ensuring that the mean is always positive:

\[E[y_i|x_i]=\mu_i=\frac{a_{i1}}{a_{2}}=exp(\beta_0+\beta_1 x_{i1}+...+\beta_p x_{ip})\]

Therefore

\[a_{i1}=exp(\beta_0+\beta_1 x_{i1}+...+\beta_p x_{ip})\times a_{2}\] In this model, I assume that only the first parameter \(a_{i1}\) varies for each observation but that the second parameter \(a_2\), which measures dispersion, does not.
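
A minimal sketch of this Gamma likelihood in Python might look as follows. One caveat: the formulas above use the shape/rate parameterization (mean \(a_{i1}/a_2\)), whereas scipy.stats.gamma uses shape/scale, so the rate \(a_2\) enters as scale \(= 1/a_2\). The variable names are my own:

```python
import numpy as np
from scipy.stats import gamma

def gamma_loglik(beta, a2, y, X):
    """log L = sum_i log Gamma(y_i | a_i1, a2), where
    a_i1 = exp(x_i' beta) * a2, so that E[y_i | x_i] = a_i1 / a2.
    scipy uses shape/scale, so the rate a2 becomes scale = 1/a2."""
    a1 = np.exp(X @ beta) * a2
    return np.sum(gamma.logpdf(y, a=a1, scale=1.0 / a2))
```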


3) Beta distribution

If \(y_i\) is a real number between 0 and 1, then the Beta distribution is commonly used: \[L(\beta_0,...,\beta_p,a_2) \propto \prod_{i=1}^{n} Beta(y_i|a_{i1},a_{2})\]

I model the mean \(E[y_i|x_i]\) as a function of the covariates and I use the \(\frac{exp()}{1+exp()}\) function because this is one way of ensuring that the mean is always between 0 and 1:

\[E[y_i|x_i]=\mu_i=\frac{a_{i1}}{a_{i1}+a_{2}}=\frac{exp(\beta_0+\beta_1 x_{i1}+...+\beta_p x_{ip})}{1+exp(\beta_0+\beta_1 x_{i1}+...+\beta_p x_{ip})}\]

Therefore

\[a_{i1}=exp(\beta_0+\beta_1 x_{i1}+...+\beta_p x_{ip})\times a_2\] Similar to the Gamma model, I assume that only the first parameter \(a_{i1}\) varies for each observation but that the second parameter \(a_2\), which measures dispersion, does not.
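
Again as an illustrative sketch (names assumed, not from the text), the corresponding Beta log-likelihood in Python could be written as:

```python
import numpy as np
from scipy.stats import beta as beta_dist

def beta_loglik(coef, a2, y, X):
    """log L = sum_i log Beta(y_i | a_i1, a2), where
    a_i1 = exp(x_i' coef) * a2, so that
    E[y_i | x_i] = a_i1 / (a_i1 + a2) is the inverse-logit of x_i' coef."""
    a1 = np.exp(X @ coef) * a2
    return np.sum(beta_dist.logpdf(y, a1, a2))
```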


4) Binomial distribution

If \(y_i\) is a non-negative integer with a known upper bound (the number of trials, which I denote \(n_i\) to avoid confusion with the sample size \(n\)), then the binomial distribution is commonly used: \[L(\beta_0,...,\beta_p) = \prod_{i=1}^{n} Binomial(y_i|\pi_i,n_i)\] where

\[\pi_i=\frac{exp(\beta_0+\beta_1 x_{i1}+...+\beta_p x_{ip})}{1+exp(\beta_0+\beta_1 x_{i1}+...+\beta_p x_{ip})}\]

Again, I use the function \(\frac{exp()}{1+exp()}\) to make sure that \(\pi_i\) is always constrained to be between 0 and 1. Notice that I assume that the number of trials \(n_i\) is given and therefore we are not estimating how \(n_i\) changes as a function of covariates.
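
Here is a minimal Python sketch of this binomial likelihood (scipy's expit computes \(\frac{exp()}{1+exp()}\); the argument names are my assumptions):

```python
import numpy as np
from scipy.stats import binom
from scipy.special import expit  # inverse-logit: exp(z) / (1 + exp(z))

def binomial_loglik(beta, y, X, n_trials):
    """log L = sum_i log Binomial(y_i | pi_i, n_i), where
    pi_i = expit(x_i' beta) and n_trials holds the known n_i values."""
    pi = expit(X @ beta)
    return np.sum(binom.logpmf(y, n_trials, pi))
```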


5) Poisson distribution

If \(y_i\) is a non-negative integer without an upper bound, then the Poisson distribution is commonly used: \[L(\beta_0,...,\beta_p) = \prod_{i=1}^{n} Poisson(y_i|\lambda_i)\]

where

\[\lambda_i=exp(\beta_0+\beta_1 x_{i1}+...+\beta_p x_{ip})\]

Again, I use the function \(exp()\) to make sure that \(\lambda_i\) is always positive.
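
A minimal Python sketch of this Poisson likelihood (variable names assumed):

```python
import numpy as np
from scipy.stats import poisson

def poisson_loglik(beta, y, X):
    """log L = sum_i log Poisson(y_i | lambda_i), where
    lambda_i = exp(x_i' beta); the exp() is the log link."""
    lam = np.exp(X @ beta)
    return np.sum(poisson.logpmf(y, lam))
```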


6) Bernoulli distribution

If \(y_i\) is a binary variable, then the Bernoulli distribution is commonly used: \[L(\beta_0,...,\beta_p) = \prod_{i=1}^{n} Bernoulli(y_i|\pi_i)\] where

\[\pi_i=\frac{exp(\beta_0+\beta_1 x_{i1}+...+\beta_p x_{ip})}{1+exp(\beta_0+\beta_1 x_{i1}+...+\beta_p x_{ip})}\]

I use the function \(\frac{exp()}{1+exp()}\) to make sure that \(\pi_i\) is always constrained to be between 0 and 1.
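
A minimal Python sketch of this Bernoulli likelihood, which is just logistic regression (variable names assumed):

```python
import numpy as np
from scipy.stats import bernoulli
from scipy.special import expit  # inverse-logit: exp(z) / (1 + exp(z))

def bernoulli_loglik(beta, y, X):
    """log L = sum_i log Bernoulli(y_i | pi_i), where
    pi_i = expit(x_i' beta)."""
    pi = expit(X @ beta)
    return np.sum(bernoulli.logpmf(y, pi))
```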

These are just some examples of parameterizations that we could adopt, but many other approaches are possible. It all depends on your creativity and what you are trying to accomplish.


