Statistical model

Our goal is to estimate the population size \(N\). Our dataset contains only the individuals we observed, which is a fraction of those that truly exist. For this reason, the model we fit differs slightly from the generative model: we use a data augmentation approach. Specifically, we assume that there are \(\tilde{N}\) potential individuals, of which only a fraction \(\pi\) actually exist. The status of each potential individual is given by a binary variable \(z_i\) (\(i=1,\dots,\tilde{N}\)), which equals 1 if the animal exists and 0 otherwise. We assume that:

\[z_i \sim Bernoulli (\pi)\]

Notice that in this model, \(\pi\) is the proportion of potential individuals that truly exist. Therefore, our estimate \(\hat{N}\) of the true population size is given by the number of potential individuals \(\tilde{N}\) times \(\pi\):

\[\hat{N} = \tilde{N} \times \pi\]

Let \(C_{it}\) be a binary variable indicating whether animal \(i\) (\(i=1,\dots,\tilde{N}\)) was observed at survey \(t\) (\(t=1,\dots,T\)). Assuming a capture probability of \(\delta\) and that this particular animal actually exists (i.e., \(z_i=1\)), our model is given by:

\[C_{it}|z_i=1 \sim Bernoulli(\delta)\]

If the animal does not exist, then we assume that it is impossible to observe that animal (i.e., \(p(C_{it}=1|z_i=0)=0\)). In other words, there are no false positives.
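To make the two layers of the augmented model concrete, here is a minimal Python sketch that draws \(z_i\) and then \(C_{it}\) exactly as described above. The function name `simulate_augmented_data` and the parameter values (\(\tilde{N}=500\), \(\pi=0.4\), \(\delta=0.3\), \(T=5\)) are illustrative assumptions, not values taken from the simulated dataset used in class.

```python
import random


def simulate_augmented_data(n_tilde, pi, delta, n_surveys, seed=0):
    """Draw existence indicators z and capture histories C (hypothetical helper)."""
    rng = random.Random(seed)
    # z_i ~ Bernoulli(pi): 1 if potential individual i truly exists, 0 otherwise.
    z = [1 if rng.random() < pi else 0 for _ in range(n_tilde)]
    # C_it | z_i ~ Bernoulli(delta * z_i): when z_i = 0 the success probability
    # is 0, so that row is all zeros (no false positives).
    C = [
        [1 if rng.random() < delta * zi else 0 for _ in range(n_surveys)]
        for zi in z
    ]
    return z, C


# Assumed, purely illustrative parameter values.
z, C = simulate_augmented_data(n_tilde=500, pi=0.4, delta=0.3, n_surveys=5)

n_exist = sum(z)                          # individuals that truly exist
n_seen = sum(1 for row in C if any(row))  # individuals observed at least once
print("individuals that exist:", n_exist)
print("individuals seen at least once:", n_seen)
```

Because an existing animal can be missed on every survey, the number seen at least once can never exceed the number that exist, which is why the raw count of observed animals underestimates \(N\).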

One way to succinctly write these two assumptions is:

\[C_{it}|z_i \sim Bernoulli(\delta \times z_i)\]

The full model can be written as:

\[p(\delta,\{z_i\},\pi|...)\propto [\prod_i \prod_t Bern(C_{it}|\delta \times z_i)][\prod_i Bern(z_i|\pi)]\times Beta(\pi|a,b) Beta(\delta|c,d)\]
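The expression above can be evaluated term by term on the log scale. The following Python sketch is only meant to show how the three pieces (capture likelihood, existence indicators, and Beta priors) combine; the function names and the default hyperparameters \(a=b=c=d=1\) are assumptions made for illustration, and this is not what JAGS does internally.

```python
import math


def log_beta_pdf(x, a, b):
    """Log density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)


def log_bern(y, p):
    """Log probability of y (0 or 1) under Bernoulli(p), allowing p = 0 or 1."""
    if y == 1:
        return math.log(p) if p > 0 else -math.inf
    return math.log(1 - p) if p < 1 else -math.inf


def log_posterior(delta, z, pi, C, a=1.0, b=1.0, c=1.0, d=1.0):
    """Unnormalized log posterior of (delta, z, pi) given capture histories C."""
    # Priors: Beta(pi | a, b) and Beta(delta | c, d).
    total = log_beta_pdf(pi, a, b) + log_beta_pdf(delta, c, d)
    for zi, row in zip(z, C):
        # Existence layer: Bern(z_i | pi).
        total += log_bern(zi, pi)
        # Observation layer: Bern(C_it | delta * z_i) for each survey t.
        for c_it in row:
            total += log_bern(c_it, delta * zi)
    return total


# Tiny made-up example: 3 potential individuals, 2 surveys.
C_small = [[1, 0], [0, 0], [0, 0]]
lp_ok = log_posterior(delta=0.5, z=[1, 1, 0], pi=0.5, C=C_small)
# Setting z_1 = 0 although individual 1 was captured violates "no false positives".
lp_bad = log_posterior(delta=0.5, z=[0, 1, 0], pi=0.5, C=C_small)
print(lp_ok, lp_bad)
```

The second call returns negative infinity: any individual with at least one capture must have \(z_i=1\), so only the never-observed individuals have an uncertain status.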

Assignment

  1. Develop code in JAGS to analyze the data that we simulated. Are we able to estimate the true parameters \(\delta\) and \(\pi\) well? Does our estimate \(\hat{N}\) recover the true population size \(N\)?
  2. Capture probability can vary from survey to survey (e.g., due to weather or noise conditions). How would you change this model to allow for this? For this task, I would like you to:

a - Describe your full model with appropriate notation

b - Modify your JAGS code to reflect this change

c - Simulate data that follows your generative model

d - Estimate model parameters using JAGS based on the simulated data. Does your Gibbs sampler work? Is it able to estimate the true parameter values?


