correlation_random

Model

To explain how the introduction of random effects induces correlation for measurements within the same county, I will change how I write our radon model. Recall that in this model, the log-radon concentration for the i-th house in the k-th county is denoted by \(y_{ik}\) and the covariates are \(x_{ik1}\) and \(x_{k2}\). I assume that:

\[y_{ik} = \beta_{0}+\beta_1 x_{ik1} + \beta_2 x_{k2} + u_k + e_{ik} \]

where the county-level random effect \(u_k\) is given by:

\[u_k \sim N(0,\tau^2)\]

and the error terms \(e_{ik}\) are given by:

\[e_{ik} \sim N(0,\sigma^2)\]

Notice that this specification is equivalent to our original model because:

\(\beta_{0k}=\beta_0 + u_k\). As a result, \(\beta_{0k} \sim N(\beta_0,\tau^2)\).
\(y_{ik} = \beta_{0k}+\beta_1 x_{ik1} + \beta_2 x_{k2} + e_{ik}\) is equivalent to \(y_{ik} \sim N(\beta_{0k}+\beta_1 x_{ik1} + \beta_2 x_{k2},\sigma^2)\).
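The first equivalence can be checked numerically. The following is a minimal sketch, not part of the original analysis: the values of `beta_0` and `tau` and the number of simulated counties are hypothetical and chosen only for illustration. It draws county effects \(u_k\), forms \(\beta_{0k}=\beta_0+u_k\), and compares the sample mean and variance of the intercepts with \(\beta_0\) and \(\tau^2\).

```python
import random

# Hypothetical parameter values, chosen only for illustration.
beta_0 = 1.2          # overall intercept
tau = 0.5             # standard deviation of the county effects u_k
n_counties = 100_000  # number of simulated counties

rng = random.Random(42)

# Draw u_k ~ N(0, tau^2) and form the county intercepts beta_0k = beta_0 + u_k.
u = [rng.gauss(0.0, tau) for _ in range(n_counties)]
beta_0k = [beta_0 + u_k for u_k in u]

# Sample mean and (population-style) variance of the simulated intercepts.
intercept_mean = sum(beta_0k) / n_counties
intercept_var = sum((b - intercept_mean) ** 2 for b in beta_0k) / n_counties

print(f"mean of beta_0k:     {intercept_mean:.3f} (beta_0 = {beta_0})")
print(f"variance of beta_0k: {intercept_var:.3f} (tau^2 = {tau ** 2})")
```

With this many draws, the two sample quantities should land close to \(\beta_0\) and \(\tau^2\), which is what \(\beta_{0k} \sim N(\beta_0,\tau^2)\) implies.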

What is the correlation between measurements in the same county?

We are interested in determining the correlation between measurements \(y_{1k}\) and \(y_{2k}\) in the same county \(k\). To calculate this correlation, we first need to determine the covariance between these measurements:

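\[Cov(y_{1k},y_{2k})=Cov(\beta_{0}+\beta_1 x_{1k1} + \beta_2 x_{k2} + u_k + e_{1k},\beta_{0}+\beta_1 x_{2k1} + \beta_2 x_{k2} + u_k + e_{2k})\]
\[=Cov(u_k + e_{1k},u_k + e_{2k})=Cov(u_k,u_k)+Cov(u_k,e_{2k})+Cov(e_{1k},u_k)+Cov(e_{1k},e_{2k})\]
\[=Cov(u_k,u_k)=Var(u_k)=\tau^2\]

The fixed part \(\beta_{0}+\beta_1 x_{ik1} + \beta_2 x_{k2}\) is a constant, so it drops out of the covariance. The three cross terms are zero because the random effect \(u_k\) and the error terms \(e_{1k}\) and \(e_{2k}\) are assumed to be mutually independent.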

To calculate the implied correlation, we also need to determine the variance of each measurement:

\[Var(y_{ik})=Var(\beta_{0}+\beta_1 x_{ik1} + \beta_2 x_{k2} + u_k + e_{ik})=Var(u_k)+Var(e_{ik})=\tau^2+\sigma^2\]

Therefore, the implied correlation is given by:

\[Corr(y_{1k},y_{2k})=\frac{Cov(y_{1k},y_{2k})}{\sqrt{Var(y_{1k})} \times \sqrt{Var(y_{2k})}}=\frac{\tau^2}{\sigma^2 + \tau^2}\]
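This result can be checked by simulation. The sketch below is illustrative only: the values of `tau` and `sigma`, the two fixed parts `fixed_1` and `fixed_2` (standing in for \(\beta_{0}+\beta_1 x_{ik1} + \beta_2 x_{k2}\) for each house), and the helper names are all hypothetical. It repeatedly draws a shared county effect plus two independent errors, then compares the sample correlation of the two measurements with \(\tau^2/(\sigma^2+\tau^2)\).

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_same_county(tau, sigma, n_draws, seed=0):
    """Correlation of two houses that share the same county effect u_k."""
    rng = random.Random(seed)
    # Hypothetical fixed parts; they are constants, so they should not
    # affect the correlation.
    fixed_1, fixed_2 = 1.0, 1.5
    y1, y2 = [], []
    for _ in range(n_draws):
        u_k = rng.gauss(0.0, tau)                         # shared county effect
        y1.append(fixed_1 + u_k + rng.gauss(0.0, sigma))  # house 1
        y2.append(fixed_2 + u_k + rng.gauss(0.0, sigma))  # house 2
    return pearson(y1, y2)

tau, sigma = 0.8, 0.6  # hypothetical standard deviations
theoretical = tau ** 2 / (sigma ** 2 + tau ** 2)
empirical = simulate_same_county(tau, sigma, n_draws=100_000)

print(f"theoretical correlation: {theoretical:.3f}")
print(f"simulated correlation:   {empirical:.3f}")
```

The simulated value should sit close to the theoretical one, and changing `fixed_1` or `fixed_2` should leave it essentially unchanged, since only the shared \(u_k\) links the two measurements.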

What is the correlation between measurements in two distinct counties?

We are interested in determining the correlation between measurements \(y_{i1}\) and \(y_{i2}\). Notice that now the first measurement comes from county 1 while the second one comes from county 2. To calculate this correlation, we first need to determine the covariance between these measurements:

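\[Cov(y_{i1},y_{i2})=Cov(\beta_{0}+\beta_1 x_{i11} + \beta_2 x_{12} + u_1 + e_{i1},\beta_{0}+\beta_1 x_{i21} + \beta_2 x_{22} + u_2 + e_{i2})\]
\[=Cov(u_1 + e_{i1},u_2 + e_{i2})=Cov(u_1,u_2)+Cov(u_1,e_{i2})+Cov(e_{i1},u_2)+Cov(e_{i1},e_{i2})=0\]

Every term is zero because the random effects of distinct counties are assumed to be independent of each other and of all error terms, and the error terms of different houses are independent as well. As a result, the correlation between measurements in distinct counties is zero.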
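The same kind of simulation illustrates the zero correlation across counties. Again, this is only a sketch under stated assumptions: the parameter values, the fixed parts `fixed_a` and `fixed_b`, and the function names are hypothetical. The only change from the within-county case is that each measurement gets its own independently drawn county effect.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_distinct_counties(tau, sigma, n_draws, seed=1):
    """Correlation of two houses located in two different counties."""
    rng = random.Random(seed)
    fixed_a, fixed_b = 1.0, 1.5  # hypothetical fixed parts (constants)
    y_a, y_b = [], []
    for _ in range(n_draws):
        u_1 = rng.gauss(0.0, tau)  # effect of county 1
        u_2 = rng.gauss(0.0, tau)  # effect of county 2, drawn independently
        y_a.append(fixed_a + u_1 + rng.gauss(0.0, sigma))
        y_b.append(fixed_b + u_2 + rng.gauss(0.0, sigma))
    return pearson(y_a, y_b)

cross_county = simulate_distinct_counties(tau=0.8, sigma=0.6, n_draws=100_000)
print(f"simulated cross-county correlation: {cross_county:.3f}")
```

The simulated correlation should be near zero, differing from it only by sampling noise.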
