How Many Fish Should I Mark?

‘How Many Fish Should I Mark?’

Matt Falcy

November 13, 2017

Problem Setting

Fish are reared in two diffent ways and then stocked into the same lake. We want to know which stock has the best catch rate. The first stock, hereafter “72”, are all adipose fin clipped (batch marked). The second stock, hereafter “53”, are not given any marks. A logistical emergency requires the release of a third stock, hereafter “gradeout”. The gradeout fish can be given a different batch mark e.g. left ventricle clip. How many gradeout fish need to be marked so that we don’t ruin the original comparison between the 72 and 53-origin fish?

We know how many fish of each of the three origins are released into the lake. We will observe three kinds of fish in the creel: adipose-marked fish that are unambiguously 72’s, right-ventricle marked fish that are unambiguosly gradeouts, and unmarked fish that are either 53’s or gradeouts. We must use knowledge of the return rate of known gradeouts to infer how many of the unmarked fish are either 53’s or gradeouts. Recall the goal is to compare recapture rate of 72’s and 53’s, but we never know unambiguously when we recpture a 53!

We know how many 72, 53 and gradeout fish are released into the lake.

Variables and Parameters

N1 = number of gradeout fish released = 50000 N2 = number of 53 released = 68000 (100% unmarked) N3 = number of 72 relseased = 63000 (100% with adipose fin clip mark)

p1, p2, and p3 are probabilities of recapture in the creel for gradeouts, 53, adn 72, respectively. omega = proportion of gradeout fish that get left ventricle clip mark. NM1 = number of gradeout fish Not Marked = N1(1-omega). M1 = number of gradeout fish released with left ventrical marks = N1omega.

y1 = number of left ventrical clipped fish recovered in the creel (unambiguously gradeouts). y3 = number of adipose clipped fish recovered in the creel (unambiguously stock 72). y_dot = number of unmarked fish recovered in the creel. These are either 53s or unmarked gradeouts.

The Model

Use an ordinary binomial model for counts of adipose marked fish: y3~Bino(N3,p3)

Use an ordinary binomial model for counts of gradeot fish that are marked: y1~Bino(M1, p1)

We will leverage information about the value of p1 from the line above.

Regard the observation of unmarked fish, y_dot, as a mixture of two Bernoulli random variables. The degree of mixing among the two Bernoullis is a parameter that needs to be estimated.

Data Simulation

Let’s simulate data with known parameters and verify that the model can recover them. We have some preliminary data suggesting that recovery rate of 72 is at least 100/107964. We’ll go 10% above and below that for 53’s and gradeouts, respectively.

N1<-50000 #Gradeout fish  
phi<-0.2 #proportion of gradeouts that get LV mark
M1<-floor(N1*phi) #number of LV marks released
p1<-0.0009262347*0.9 #recovery rate of gradeout fish:  
y1<-rbinom(1,M1,p1) #random draw of fish with LV mark.  Observed data.
NM1<-floor(N1-M1) # number of gradeout fish not marked
z1<-rbinom(1,NM1,p1)#random draw of gradeout fish NOT LV marked

N2<-68000 #Stock 53 fish released without marks
p2<-0.0009262347*1.1 #Recovery rate of Stock 53 fish
z2<-rbinom(1,N2,p2)#Random draw of stock 53 recovered

y_dot<-z1+z2#Recovered fish without marks. Observed data.  

N3<-63000 #Stock 72 fish, 100% with Ad clips
p3<-0.0009262347 #recovery rate of Stock 72 fish
y3<-rbinom(1,N3,p3) #Random draw of stock 72 fish recovered

A JAGS Model of the Simulated Data

Here’s a JAGS model of this system. Note that z for the ith fish without a mark is a latent indicator of stock ID. If z_i = 0, then we use p1, which is already given in the model of y1. If z_i = 1, then we use p2.

I’m going to give this informative priors. Previous attempts to fit this model using flat priors (viz. p[2]~dbeta(1,1)) resulted in slightly unacceptable convergence diagnostics (Rhat = 1.12). For p2 I’ll set the beta mean, \(\mu\), to 0.0009262347 to match preliminary results. I’ll arbitrarily choose a standard deviaiton \(\sigma\), of 0.03. While this standard deviation may seem very low (highly informative) it is important to note that the standard deviation must be evaluated relative to the mean. Indeed, increasing the standard deviation (\(\sigma=0.05\)) is not definded for a beta distrbution with \(\mu=0.0009262347\)

We can compute the \(a\) and \(b\) parameters of the beta distrution with:

\[a=\mu(\frac{\mu(1-\mu)}{\sigma^2}-1)\] \[b=(1-\mu)\frac{\mu(1-\mu)}{\sigma^2}-1\]

mu<-0.0009262347
s<-0.03
ai<-mu*((mu*(1-mu))/s^2-1)
bi<-(1-mu)*((mu*(1-mu))/s^2-1)
xi<-seq(0,0.2,by=0.001)
yi<-dbeta(xi,ai,bi)
plot(xi,yi,xlab='p',ylab='Density',main='Prior on p2',type='l')

mod2<-function(){
  y1~dbin(p[1],M1)
  y3~dbin(p[3],N3)
  
  for (i in 1:length(y_dot_bern)){
    z[i]~dbern(omega)
    y_dot_bern[i]~dbern(p[1+z[i]]) 
  }

  #priors
  p[1]~dbeta(1,1)
  p[2]~dbeta(1,1)
  p[3]~dbeta(1,1)
  omega~dbeta(1,1)
}#Close model function

Preliminaries and Execution

The first line below takes the count, y_dot, and converts it to Bernoulli’s (zeros and ones). We need to do this so that each unmarked fish can be assigned a mixing probability z_i~Bernoulli(omega) in the model above.

Running JAGS in a parallel (jags.parallel) process will speed this up. The Bernoulli format of the data is computationally expensive.

y_dot_bern<-c(rep(1,y_dot),rep(0,N2+NM1-y_dot))
data2<-list(y1=y1,y_dot_bern=y_dot_bern,y3=y3,M1=M1,N3=N3)

Nchains<-3#Number of Markov chain Monte Carlo simulations

inits<-function(){
  list("p"=runif(3,0.0005,0.002),"omega"=0.5)
}
params=c("p")
library(R2jags)
start_time <- Sys.time()
#out<-jags.parallel(model.file=mod2,data=data2,inits=inits,parameters.to.save=params,n.chains=Nchains,n.iter=20000,n.burnin=3000,n.thin=17)
end_time<-Sys.time()
end_time-start_time

## Time difference of 0 secs

MCMC Diagnostics and Output

#traceplot(out)
#autocorr.plot(out2)

#[iter,chain,param], params are deviance, p1, p2, p3
plot(out$BUGSoutput$sims.array[,1,3],type='l',col='red')
lines(out$BUGSoutput$sims.array[,2,3],col='blue')
lines(out$BUGSoutput$sims.array[,3,3],col='green')

out2<-as.mcmc(out)
gelman.diag(out2)

## Potential scale reduction factors:
## 
##          Point est. Upper C.I.
## deviance       1.19       1.69
## p[1]           1.07       1.22
## p[2]           1.17       1.35
## p[3]           1.00       1.00
## 
## Multivariate psrf
## 
## 1.24

geweke.diag(out2)

## [[1]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
## deviance     p[1]     p[2]     p[3] 
##   -7.657   -2.638   -4.294    1.397 
## 
## 
## [[2]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
## deviance     p[1]     p[2]     p[3] 
##  -0.8134   0.2140  -0.8119  -1.2896 
## 
## 
## [[3]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
## deviance     p[1]     p[2]     p[3] 
##   -2.975    4.696   -4.860   -1.169

out$BUGSoutput$summary

##                  mean           sd         2.5%          25%          50%
## deviance 1.652608e+03 2.144729e+01 1.587053e+03 1.650031e+03 1.661355e+03
## p[1]     1.149319e-03 3.149441e-04 6.029590e-04 9.288112e-04 1.124010e-03
## p[2]     8.518466e-04 3.800014e-04 1.192806e-04 6.287745e-04 8.724843e-04
## p[3]     9.527241e-04 1.223164e-04 7.265235e-04 8.682154e-04 9.452181e-04
##                   75%        97.5%     Rhat n.eff
## deviance 1.664289e+03 1.670090e+03 1.096611    34
## p[1]     1.345650e-03 1.843850e-03 1.024093    93
## p[2]     1.045801e-03 1.673357e-03 1.141784    53
## p[3]     1.033241e-03 1.211069e-03 1.000807  5500