Mark-recapture

In this example, we are interested in estimating the population size of a wildlife species using mark-recapture data. We collect these data by going into the field multiple times; during each visit, we capture individuals and fit them with uniquely identifiable tags. As a result, in these surveys we may retrieve individuals that had already been tagged and/or encounter entirely new individuals. The key assumptions here are that:

  1. We need to be able to individually recognize each animal and thus obtain repeated observations on each individual. Historically, these data arose from capturing animals, uniquely marking them at the first encounter, and then trying to recapture them in subsequent surveys. This explains why we use the term “capture-recapture,” but it is important to note that uniquely identifying each individual might not require physical capture (Kery and Schaub 2012). For example, we might be able to identify individuals in camera-trap photographs based on their unique scars or fur patterns.

  2. These different surveys need to be close enough in time that we can assume the same set of individuals is available throughout (i.e., individuals have not moved in or out of the study region, individuals have not died, and new individuals have not been born).

It is important to note that mark-recapture methods are also used in other contexts. For example, in epidemiology, this methodology is used to estimate the number of people with a particular condition (e.g., people infected with HIV or addicted to illegal drugs).

How does it work?

Our goal is to estimate the population size \(N\). We go into the field and get to observe only \(C\) individuals (“C” stands for captured), where \(C \leq N\). In principle, we could assume the following binomial distribution to model these data:

\[C \sim Binomial(N,\delta)\]

where \(\delta\) is the capture probability. The problem is that we cannot simultaneously estimate \(N\) and \(\delta\). To see why, notice that \(E[C]=N\times \delta\); as a result, different combinations of \(N\) and \(\delta\) can yield the same expected outcome \(C\). If we knew \(\delta\), we could estimate population size as \(\hat N=\frac{C}{\delta}\).
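A quick numeric sketch of this identifiability problem (the particular values of \(N\) and \(\delta\) below are made up for illustration):

```r
# Two different (N, delta) pairs imply the same expected count E[C] = N * delta,
# so observing C alone cannot separate N from delta
N1 = 200; delta1 = 0.3
N2 = 300; delta2 = 0.2
N1 * delta1  # expected number captured: 60
N2 * delta2  # also 60

# If delta were somehow known, a moment estimator recovers N:
C = 60
C / delta1   # N.hat under delta = 0.3, approximately 200
```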

Here is where the re-captured individuals (\(RC\)) come in. Say that we have been able to re-capture \(RC\) individuals and that we model these data as:

\[RC \sim Binomial(C,\delta)\]

This is great because now we can use the relationship \(E[RC]=C\times \delta\) to estimate the capture probability as \(\hat \delta=\frac{RC}{C}\). Once we have \(\hat \delta\), we can finally estimate \(N\).

This example illustrates how to get point estimates \(\hat \delta\) and \(\hat N\). However, we typically also want to estimate how much uncertainty there is in our population size estimate \(\hat N\). Uncertainty in \(\hat N\) arises from sampling uncertainty associated with the binomial distribution \(C\sim Binom(N,\delta)\) and uncertainty when estimating \(\delta\).
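One rough way to see how uncertainty in \(\hat \delta\) propagates into \(\hat N\) is a parametric bootstrap. The sketch below is only illustrative (it uses the same hypothetical counts as above and captures only the \(\delta\)-estimation component of the uncertainty, not the binomial sampling of \(C\) itself):

```r
set.seed(1)
C  = 60; RC = 18           # hypothetical counts
delta.hat = RC / C

# Resample the recapture count under the estimated delta,
# then re-estimate N for each resample
nboot   = 10000
RC.star = rbinom(nboot, size = C, prob = delta.hat)
RC.star = pmax(RC.star, 1)            # guard against division by zero
N.star  = C / (RC.star / C)           # = C^2 / RC.star
quantile(N.star, c(0.025, 0.975))     # rough interval for N
```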

Fitting this model within a Bayesian framework will enable us to correctly estimate and propagate the uncertainty in these parameters.

Generative model

We will make the model slightly more complicated by assuming that we have multiple surveys, not only two as in the example above. Let \(C_{it}\) be a binary variable indicating whether animal \(i\) (\(i=1,\dots,N\)) was observed in survey \(t\) (\(t=1,\dots,T\)). Assuming a capture probability of \(\delta\), our model is given by:

\[C_{it} \sim Bernoulli(\delta)\]

Obviously this is a very simple model in the sense that we assume that the capture probability \(\delta\) does not depend on individual animal characteristics (e.g., perhaps capture probability depends on animal size or color) and does not change with time or with capture history (e.g., individuals do not become trap-happy or trap-shy).

Simulating data

Here is how we generate some simulated data:

#generate some fake data
rm(list=ls(all=T))
set.seed(1)

#true parameter values
delta.true=delta=0.3 #true capture probability
N=200 #true population size

#number of surveys 
T1=4
#observation status of each individual
C1=matrix(0,N,T1)
for (i in 1:T1){
  C1[,i]=rbinom(N,size=1,prob=delta)
}

#each column corresponds to a particular sampling occasion
colnames(C1)=paste0('so',1:T1)

#here is what the data look like
head(C1)
##      so1 so2 so3 so4
## [1,]   0   0   0   1
## [2,]   0   0   0   1
## [3,]   0   0   1   0
## [4,]   1   0   1   1
## [5,]   0   0   1   1
## [6,]   1   0   1   1
#In the real dataset, we never observe individuals with 0 captures
#Therefore, we need to remove these individuals from our dataset
cond=apply(C1,1,sum)!=0
data1=C1[cond,]
nrow(data1) #number of individuals observed at least once; notice how this is less than N
## [1] 153
#save the data (change this path to your own directory)
setwd('U:\\uf\\courses\\bayesian course\\group activities\\5 example pop size')
write.csv(data1,'simulated data popsize.csv',row.names=F)
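Before moving on to the Bayesian fit, a quick sanity check is useful. The sketch below is not the Bayesian model; it is a simple maximum-likelihood check that we can recover the true parameters from the simulated data, using a zero-truncated binomial likelihood for \(\delta\) (since individuals with zero captures are never observed) and then inflating the observed count to estimate \(N\). It regenerates the simulated data so it runs on its own:

```r
#regenerate the simulated data from above
set.seed(1)
delta=0.3; N=200; T1=4
C1=matrix(rbinom(N*T1,size=1,prob=delta),N,T1)
data1=C1[rowSums(C1)>0,]

y=rowSums(data1)  #captures per observed individual (between 1 and T1)
n=length(y)       #number of individuals seen at least once

#negative log-likelihood of delta under a zero-truncated binomial
negloglik=function(d){
  -sum(dbinom(y,size=T1,prob=d,log=TRUE)-log(1-(1-d)^T1))
}
delta.hat=optimize(negloglik,interval=c(0.01,0.99))$minimum

#inflate the observed count by the probability of being seen at least once
N.hat=n/(1-(1-delta.hat)^T1)
round(c(delta.hat=delta.hat,N.hat=N.hat),2) #should be close to 0.3 and 200
```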




References

Kery, M., and M. Schaub. 2012. Bayesian Population Analysis Using WinBUGS: A Hierarchical Perspective. Elsevier.