Abundance

Abundance Estimation with Index Counts and Mark-Recapture

Problem Definition

We have mark-recpature estimates of abundance, \(\hat{N}\), and the associated standard errors \(se_{\hat{N}}\). We also have a count, \(C\), of animals in an index location that are an unknown proportion, \(p\), of the latent abudnance, \(N\). Such data exist for many independent years. We wish to be able to infer \(N\) on a year when we only have \(C\).

The Model

\(\hat{N}\)~Normal(\(N\),\(se_{\hat{N}}\))

This means that we regard our mark-recapture estimates as unbiased becuase they are “centered” on the latent abudndance. We know that the mark-recapture estimate is imperfect, and so we simply use the standard error of the mark-recpature estimate to model the normally-distributed error around the mark-recpature point estimate.

\(C\)~Binomial(\(N\),\(p\))

The counts are like the result of a coin flipping experiment. The total number of flips is the latent abudnance, and the number of heads is the number of fish counted in the index location. We want to know the proportion \(p=C/N\). This proportion must satisfy \(0<=p<=1\). The binomial theory gives us a way of confining uncertainty in \(p\) to this range.

Notice that \(N\) is in both of the distributions given above. This is important. Our theory is that \(C\) contains information about \(N\). So we’re actually goint to estimate \(N\) even though we already have the mark-recatpure estimate, \(\hat{N}\). If \(C\) is totally unrelated to \(N\), then our new estimate of \(N\) will match \(\hat{N}\). But if there is a relationship, then the new estimate of \(N\) will leverage information contained in \(C\). Recall that we know the mark-recapture estimate is imperfect (\(se_{\hat{N}}\)).

\(p\)~Beta(\(a\),\(b\))

We shouldn’t assume that the proportion of the total abundance counted in the index site is perfectly contsant across years. Let’s give it some ability to move around. The paramters of the Beta distribution will describe how much \(p\) varies. For example, in the plot below \(p\) is constrained to a probability distirubiton defined by the Beta distribution where \(a\)=2 and \(b\)=40. We really want to know the values of \(a\) and \(b\) because they determine how well we can predict \(N\) when we only have \(C\).

p<-seq(0,1,0.01)
dp<-dbeta(p,2,40)
plot(p,dp,type='l',ylab='Probability Density',xlab='p')

Simulation test

First, let’s prove-up the theory I’ve outlined above with a simulation. Let’s choose values of \(N\), \(se_{\hat{N}}\), \(a\), and \(b\) and then use these to create data, \(\hat{N}\), and \(C\).

N<-floor(rlnorm(10,8.5,0.5)) #params here aren't of interest
sdNhat<-rnorm(10,100,20) #params here aren't of interest
Nhat<-floor(rnorm(10,N,sdNhat)) #This matches the first line of the model.
p<-rbeta(10,3,15) #This matches the third line of the model.  a=3, and b=15.
C<-rbinom(10,N,p) #This matches the second line of the model.

Now we need to write a statistical model that will just receive data \(\hat{N}\), \(se_{\hat{N}}\), \(C\) and then estimate the paramters \(N\), \(a\), \(b\), which we already know. We’ll do this below by maximizing the likelihood of the joint data, \(\hat{N}\), \(se_{\hat{N}}\), \(C\). For computational reasons, we’ll actually minimize the negative log likelihood, which is the same thing.

library(rmutil)

## 
## Attaching package: 'rmutil'

## The following object is masked from 'package:stats':
## 
##     nobs

LL1<-vector()
LL2<-vector()
NLL<-function(data,par){
  for (i in 1:10){
    LL1[i]<-dnorm(Nhat[i],par[i],sdNhat[i],log=TRUE)
    LL2[i]<-dbetabinom(C[i],floor(par[i]),par[11],par[12],log=TRUE)
  }
  -1*(sum(LL1)+sum(LL2))
}
out<-optim(par=c(rep(10000,10),0.2,2),fn=NLL,data=list(Nhat=Nhat,sdNhat=sdNhat,C=C),method='L-BFGS-B',lower=c(rep(min(N),10),0.0001,1),upper=c(rep(max(N),10),0.9999,100),control=list(factr=1e3))

A couple of things are worth noting in the code above. First, since the Biniomial and Beta distributions are both in the exponential family, then can be combined into the “betabinomial” distribution used here. The paramterizaiton is different, but it is mathematically equivalent. Second, the “floor”" function is needed because \(N\) must be integer-valued, but the optimizer uses floating point. Third, constrained optimization in needed because \(0<=p<=1\), and this is carried out with the “L-BFGS-B” method. Constraints on the other paramters are set very liberally.

Now let’s inspect the output of the optimization.

out

## $par
##  [1] 8246.3208831 6047.6470343 4219.0364708 6711.1387517 8497.0000000
##  [6] 7390.4702323 6364.7556926 5012.9037908 5849.0697996 6156.8711109
## [11]    0.1727046   10.5319054
## 
## $value
## [1] 252.7052
## 
## $counts
## function gradient 
##      140      140 
## 
## $convergence
## [1] 1
## 
## $message
## [1] "NEW_X"

mean(p)

## [1] 0.1637982

The 11th entry of “par” is the estimate of \(p\). Compare it to mean(p). Looks good. Plot the latent \(N\) against our new synthetic estimate of \(N\). This also looks good.

plot(N,out$par[1:10])

The Empirical Data and Maximum Likelihood Analysis

Now we’re going to do the same thing with real-world data.

Nhat<-c(14335,15891,2700,1218,2213,10985,4985,8738,13878)
seNhat<-sqrt(c(26915344,6574096,175561,51529,71824,12166144,145924,2808976,3319684))
C<-c(313,410,38,30,69,175,132,193,240)

LL1<-vector()
LL2<-vector()
NLL<-function(data,par){
  for (i in 1:length(Nhat)){
    LL1[i]<-dnorm(Nhat[i],par[i],sdNhat[i],log=TRUE)
    LL2[i]<-dbetabinom(C[i],floor(par[i]),par[length(Nhat)+1],par[length(Nhat)+2],log=TRUE)
  }
  -1*(sum(LL1)+sum(LL2))
}
out<-optim(par=c(floor(Nhat*runif(length(Nhat),0.9,1.1)),0.02,3),
           fn=NLL,data=list(Nhat=Nhat,sdNhat=sdNhat,C=C),
           method='L-BFGS-B',
           lower=c(rep(100,length(Nhat)),0.0001,1),
           upper=c(rep(100000,length(Nhat)),0.9999,50000),
           hessian=TRUE)
out

## $par
##  [1] 1.475387e+04 1.488908e+04 2.703184e+03 1.025222e+03 2.297120e+03
##  [6] 1.065165e+04 4.971002e+03 8.939399e+03 1.445472e+04 2.144821e-02
## [11] 8.814388e+03
## 
## $value
## [1] 170.8571
## 
## $counts
## function gradient 
##      113      113 
## 
## $convergence
## [1] 52
## 
## $message
## [1] "ERROR: ABNORMAL_TERMINATION_IN_LNSRCH"
## 
## $hessian
##                [,1]         [,2]          [,3]          [,4]          [,5]
##  [1,]  6.495782e-05 0.000000e+00  0.000000e+00 -7.105427e-09 -7.105427e-09
##  [2,]  0.000000e+00 6.937739e-05  0.000000e+00  0.000000e+00  0.000000e+00
##  [3,]  0.000000e+00 0.000000e+00  1.249134e-04 -7.105427e-09  0.000000e+00
##  [4,] -7.105427e-09 0.000000e+00 -7.105427e-09  8.563461e-05 -7.105427e-09
##  [5,] -7.105427e-09 0.000000e+00  0.000000e+00 -7.105427e-09  8.745360e-05
##  [6,] -7.105427e-09 0.000000e+00  3.552714e-09 -7.105427e-09  0.000000e+00
##  [7,]  0.000000e+00 0.000000e+00  7.105427e-09  0.000000e+00  7.105427e-09
##  [8,]  0.000000e+00 0.000000e+00  3.552714e-09  0.000000e+00  0.000000e+00
##  [9,] -3.552714e-09 0.000000e+00  0.000000e+00  0.000000e+00 -3.552714e-09
## [10,]  0.000000e+00 0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
## [11,]  0.000000e+00 0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
##                [,6]         [,7]         [,8]          [,9]        [,10]
##  [1,] -7.105427e-09 0.000000e+00 0.000000e+00 -3.552714e-09 0.000000e+00
##  [2,]  0.000000e+00 0.000000e+00 0.000000e+00  0.000000e+00 0.000000e+00
##  [3,]  3.552714e-09 7.105427e-09 3.552714e-09  0.000000e+00 0.000000e+00
##  [4,] -7.105427e-09 0.000000e+00 0.000000e+00  0.000000e+00 0.000000e+00
##  [5,]  0.000000e+00 7.105427e-09 0.000000e+00 -3.552714e-09 0.000000e+00
##  [6,]  9.057999e-05 7.105427e-09 0.000000e+00  0.000000e+00 0.000000e+00
##  [7,]  7.105427e-09 2.285248e-04 0.000000e+00  7.105427e-09 0.000000e+00
##  [8,]  0.000000e+00 0.000000e+00 3.303242e-04  0.000000e+00 0.000000e+00
##  [9,]  0.000000e+00 7.105427e-09 0.000000e+00  8.800782e-05 0.000000e+00
## [10,]  0.000000e+00 0.000000e+00 0.000000e+00  0.000000e+00 1.621247e+06
## [11,]  0.000000e+00 0.000000e+00 0.000000e+00  0.000000e+00 2.029122e-02
##               [,11]
##  [1,]  0.000000e+00
##  [2,]  0.000000e+00
##  [3,]  0.000000e+00
##  [4,]  0.000000e+00
##  [5,]  0.000000e+00
##  [6,]  0.000000e+00
##  [7,]  0.000000e+00
##  [8,]  0.000000e+00
##  [9,]  0.000000e+00
## [10,]  2.029122e-02
## [11,] -1.250555e-06

The 10th value of “par” is \(p\). The 11th value of “par” is the overdispersion parameter that describes variation in \(p\) that would be used when predicting \(N\) with just \(C\).

Look at the relationship between the new estimates of \(N\) and \(\hat{N}\)

plot(Nhat,out$par[1:9],xlab='Mark-recapture estimate',ylab='New estimate of abundance')
abline(0,1)

So it looks like information in \(C\) is contiributing very little information to a new estimate of \(N\).

The numbers in “par” above are just point estimates. What about standard errors of the paramters? In the chunk of code above, I asked the optimizer to return the Hessian, which is a square matrix of second-order partial derivates evaluted at the maximum likelihood estimates of each paramter.

\[H = \begin{pmatrix} \frac{\partial^2f}{\partial x_1^2} & \frac{\partial^2f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2f}{\partial x_1 \partial x_n} \\ \frac{\partial^2f}{\partial x_2^2} & \frac{\partial^2f}{\partial x_2^2} & \cdots & \frac{\partial^2f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2f}{\partial x_n \partial x_1} & \frac{\partial^2f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2f}{\partial x_n^2} \end{pmatrix}\]

This matrix gives the magnitue of the curvature in all directions of the multidimenesional likelihood surface. A more “spiked” surface means that a parameter can only take on a narrow range of values, whereas a flatter surface means that the parameter estimates are more uncertain.

We can get standard errors by calculating the inverse of the square root of the diagonal elements of the observed Fisher Information matrix \(\mathbf I(\mathbf \theta)\), where \(\mathbf \theta\) is a vector of all the paramters. The Fisher Information matrix is found by inverting the Hessian matrix.

\(\mathbf I(\mathbf \theta) = \mathbf H^{-1}\)

It is important to note that the Hessian pertains to a negative log likelihood, becuase that’s what we minimized in with the optimizer. If we had maximized the log likelihood without minimizing the negative log likelihood, then we’d have to take the negtaive of the Hessian before inverting it.

And now

\(s_\mathbf \theta = \frac{1}{\sqrt{I(\theta)}}\).

Here’s how to implement this in R…

fisher.info<-solve(out$hessian)
sig<-sqrt(diag(fisher.info))

## Warning in sqrt(diag(fisher.info)): NaNs produced

upper<-out$par+1.96*sig
lower<-out$par-1.96*sig
interval<-data.frame(out$par,lower=lower,upper=upper)
interval

##         out.par        lower        upper
## 1  1.475387e+04 1.451069e+04 1.499706e+04
## 2  1.488908e+04 1.465377e+04 1.512440e+04
## 3  2.703184e+03 2.527816e+03 2.878552e+03
## 4  1.025222e+03 8.134193e+02 1.237025e+03
## 5  2.297120e+03 2.087531e+03 2.506708e+03
## 6  1.065165e+04 1.044572e+04 1.085759e+04
## 7  4.971002e+03 4.841347e+03 5.100657e+03
## 8  8.939399e+03 8.831557e+03 9.047240e+03
## 9  1.445472e+04 1.424579e+04 1.466365e+04
## 10 2.144821e-02 1.990904e-02 2.298738e-02
## 11 8.814388e+03          NaN          NaN

Bayesian Analysis

Let’s go Bayesian. It’ll make quantification of prediction error easier than the pure maximum likelihood method.

First thing to do is to write a JAGS file that we’ll call subsequently to fit the model.

Mod<-function(){
  #Likelihood
  for (t in 1:T){ 
    Nhat[t]~dnorm(round(N[t]),tau[t])
    C[t]~dbin(p[t],round(N[t]))
    p[t]~dbeta(a,b)
  }
  
  #Prediction
  Npred<-Cpred/qbeta(0.5,a,b)
  
  #Priors
  a~dunif(0,2000)
  b~dunif(0,20000)
  
  for (t in 1:T){
    N[t]~dunif(100,20000)
  }
}

This should look pretty similar to the data-generating model. It’s pretty much the same thing, just “going in reverse.”

The great thing with the Markov Chain Monte Carlo (MCMC) method we’ll use is that we can make it sample over all unknowns. This includes our prediction. So in the middle of the JAGS file there is the line “Npred<-Cpred/qbeta(0.5,a,b)”. “Cpred” is any \(C\) we want to use for a future prediction of \(N\). Subsequently, we will use \(C\) = 50, 100, 150, 200. We’re going to use the point estimate of \(p\) for the prediction, hence the 50th quantile of the Beta distribution. But there is uncertainty in the best value of the 50th quantile of \(p\) that is represented by \(a\) and \(b\). This uncertainty will be wrapped into our prediction, Npred. Awesome, becuase the hard work analytical work is done for us by the MCMC!

Got some diffuse priors in there. It took some trial and error before I realize how large the upper bounds of the priors on \(a\) and \(b\) need to be. More to come on that…

The stuff below just repackages the data so it can be fed into JAGS, sets up initial starting points, and specifies some conditions of how many MCMC samples to take. Notice that we will use the empirical data to generate predictions for \(N\) in a future cenario where C = 50, 100, 150, 200.

#Initialize MCMC chains
Init.fun<-function(){
  list("a"=runif(1,20,40),"b"=runif(1,1000,2000),"N"=Nhat)
}

# List the parameters to save as output
params=as.character(c("a","b","N","Npred"))

#Repackage empirical data
D<-list(Nhat=Nhat,C=C,tau=1/sdNhat,T=length(Nhat),Cpred=c(50,100,150,200))

#Call JAGS into a parallel process
library(R2jags)

## Loading required package: rjags

## Loading required package: coda

## Linked to JAGS 4.2.0

## Loaded modules: basemod,bugs

## 
## Attaching package: 'R2jags'

## The following object is masked from 'package:coda':
## 
##     traceplot

jags.out<-jags.parallel(data=D, inits=Init.fun, parameters.to.save=params, model.file = Mod, 
                        n.chains = 4,n.iter = 100000, n.burnin = 50000, n.thin = 13,
                        n.cluster= 4, DIC = TRUE,
                        working.directory = NULL, jags.seed = 123, digits=5,
                        jags.module = c("dic"))

This model fits really fast compared to most. Let’s check of the model diagnostics to make sure that everyting worked OK.

#traceplot(jags.out)

The trace plots look great except for \(a\) and \(b\). I think the chains had some difficulty converging for those paramters. Let’s look at the output, including the Gelman-Rubin diagnositc “Rhat”.

jags.out$BUGSoutput$summary

##                 mean          sd        2.5%         25%         50%
## N[1]     14334.99000   11.134297 14313.27860 14327.43992 14335.06216
## N[2]     15891.02549   10.927245 15869.94910 15883.53435 15891.04665
## N[3]      2699.72342    9.461943  2681.08576  2693.41149  2699.67465
## N[4]      1218.15379   10.328818  1197.89501  1211.19418  1218.08402
## N[5]      2213.47373   10.368390  2193.19359  2206.49838  2213.45530
## N[6]     10984.92299   10.261010 10964.59590 10978.08537 10984.90174
## N[7]      4985.10455    8.086714  4969.08066  4979.59756  4985.21236
## N[8]      8737.99349    7.375209  8723.55039  8732.94878  8738.02624
## N[9]     13877.85854   10.251964 13857.72775 13870.85709 13877.91518
## Npred[1]  2330.00497  156.552135  2033.50364  2229.66546  2325.44504
## Npred[2]  4660.00993  313.104270  4067.00728  4459.33093  4650.89007
## Npred[3]  6990.01490  469.656404  6100.51092  6688.99639  6976.33511
## Npred[4]  9320.01987  626.208539  8134.01456  8918.66185  9301.78014
## a           47.87672   33.824238    11.79644    25.56129    38.40141
## b         2166.61597 1546.995686   521.05019  1145.67770  1728.93234
## deviance   138.78767    7.139184   127.33667   133.65798   137.93076
##                  75%      97.5%     Rhat n.eff
## N[1]     14342.50348 14356.6095 1.001002 15000
## N[2]     15898.43059 15912.4434 1.001057 13000
## N[3]      2706.07556  2718.3394 1.001113 10000
## N[4]      1225.18214  1238.2662 1.000964 15000
## N[5]      2220.51934  2233.7448 1.000987 15000
## N[6]     10991.80006 11004.7619 1.000893 15000
## N[7]      4990.62048  5000.6985 1.000915 15000
## N[8]      8743.01409  8752.2573 1.000994 15000
## N[9]     13884.80359 13897.7649 1.000976 15000
## Npred[1]  2424.39456  2654.2538 1.000937 15000
## Npred[2]  4848.78912  5308.5077 1.000937 15000
## Npred[3]  7273.18368  7962.7615 1.000937 15000
## Npred[4]  9697.57824 10617.0154 1.000937 15000
## a           59.21211   144.7215 1.002348  1700
## b         2688.69154  6611.0513 1.002294  1800
## deviance   143.01654   154.8320 1.001061 13000

The Rhats look good. They should be less than 1.1, and even \(a\) and \(b\) easily meet this criteria.

Let’s do some reality check. The expected value of \(p\) and the variance of \(p\) can be computed for the values of \(a\) and \(b\) using some simple theory: \(E(p)=\frac{a}{a+b}\).

(jags.out$BUGSoutput$mean$a/(jags.out$BUGSoutput$mean$a+jags.out$BUGSoutput$mean$b))

## [1] 0.02161972

mean(C/Nhat)

## [1] 0.02214563

The theoretical variance of \(p\) is: \(var(p)=\frac{ab}{(a+b)^2(a+b+1)}\).

ai<-jags.out$BUGSoutput$mean$a
bi<-jags.out$BUGSoutput$mean$b
(ai*bi)/((ai+bi)^2*(ai+bi+1))

## [1] 9.547453e-06

var(C/Nhat)

## [1] 3.091778e-05

The means match up nicely. Not so sure abou the variance. But the tiny values for Var(p) indicate this method is going to work great for prediction.

Let’s use our point estimates of \(a\) and \(b\) to plot our belief in the values of \(p\)

pi<-seq(0,1,0.001)
dpi<-dbeta(pi,ai,bi)
plot(pi,dpi,type='l',ylab='Probability Density','xlab'='p')

This narrow range of \(p\) is why I needed to to use large upper bounds on the priors for \(a\) and \(b\).

The Prediction Interval

Now plot the 95% credible interval on our prediction of \(N\) for the four different levels of \(C\): 50, 100, 150, 200.

plot(Cpred,jags.out$BUGSoutput$summary[10:13,7],type='b',ylim=c(0,11000),ylab='Abundance',main='95% Prediction Interval')
lines(Cpred,jags.out$BUGSoutput$summary[10:13,3],type='b')