Coastal Chinook Spawning Distribution: quantifying irregularities in real time
Background
The objective of this work is to quantify irregularities in the spawning distribution of Chinook within a given basin. This information could be compiled on a weekly basis, providing managers with nearly real-time information on whether the observed spawning distribution is unusual relative to our historical data.
Data
The raw data are counts of spawners at index sites. These data exist at multiple sites per basin. The temporal component of the data includes counts at approximately weekly intervals during a run, from 1998 through 2018.
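For concreteness, each basin's sheet in WK44_All.xlsx is assumed to hold one row per year and one column of week-44 counts per index site, along the lines of the sketch below (the values shown are illustrative, not real data):

Year   S1   S2   S3   S4   S5   S6
1998   34   12  NaN    7    0   21
1999   28  NaN   15    9    3   18
...
2018   19    5   11  NaN    0    8

Missing surveys appear as NaN and are filled by the imputation step described later.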
Example: Tillamook
Load raw counts on week 44 in the Tillamook:
cd('\\Kalawatseti\home\falcym\docs\Projects\CoastalMultispeciesPlan\SlidingScale\EmergencyClosure')
X1=readtable('WK44_All.xlsx','Sheet','Tillamook')
We wish to know whether the distribution of counts across six sites (S1, S2, ..., S6) observed on week 44 in 2018 is unusual relative to the distribution at week 44 observed since 1998. Since this analysis is not concerned with abundance, I will examine how the total count within a year is apportioned across sites.
First, I will impute all missing values by weighting several simple linear regressions among sites for which data exist. The function for doing this is given in the Appendix of this document.
X2=array2table(Impute(X1{:,:})); %impute missing counts (function in Appendix)
X2.Properties.VariableNames=X1.Properties.VariableNames;
X2
Now get proportions of annual totals using the imputed data set:
X2{:,2:end}=X2{:,2:end}./sum(X2{:,2:end},2)
Quantiles
The proportions in the table above will be used to compute quantiles for the proportion of total abundance through time. This will be done for each site. The years used for computing quantiles are 1998 through 2017, which is n = 20 years. Notice that we are computing quantiles up through, but not including, the focal year, 2018. The first step in the quantile analysis is to sort observations low to high. The first observation in the sort corresponds with a cumulative probability of 1/20 = 0.05. The second observation has cumulative probability 2/20 = 0.1, etc. This is how the horizontal black bars are computed in the figure below.
The quantiles are defined as 0.5/n, 1.5/n, ..., (n-0.5)/n, where n is the number of years (n = 20). The first quantile is 0.5/20 = 0.025, the second is 1.5/20 = 0.075, and the last quantile is (20-0.5)/20 = 0.975. From here, a smoothing function is needed to connect the midpoints of the quantiles. This will provide a means of calculating quantiles for new observations (2018). The smoothed function is computed with the following linear interpolation between adjacent sorted observations x_(k) and x_(k+1):

Q(p) = x_(k) + ((p - p_k)/(p_(k+1) - p_k))*(x_(k+1) - x_(k)),  where p_k = (k - 0.5)/n

The 0.06 quantile falls between p_1 = 0.025 and p_2 = 0.075, so

Q(0.06) = x_(1) + ((0.06 - 0.025)/0.05)*(x_(2) - x_(1)) = 0.221

So the 0.06 quantile is associated with the proportion 0.221, which is greater than the lowest observed proportion at site S1 (0.0278). This occurs because we are making the function connect quantiles at the midpoint between adjacent proportions. See the figure below.
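Before drawing the figure, a quick numerical check (this snippet is not part of the original analysis): MATLAB's quantile function uses this same midpoint definition by default, so interpolating the sorted values against their midpoint probabilities reproduces it on arbitrary data. For p below 0.025 or above 0.975, quantile additionally clamps to the minimum or maximum, which the bare interp1 call does not.

%check: midpoint linear interpolation matches MATLAB's default quantile
Xdemo = sort(rand(20,1)); %any sorted sample of n = 20 values
n = length(Xdemo);
pk = ((1:n)' - 0.5)/n; %midpoint probability assigned to each sorted value
Q = @(p) interp1(pk,Xdemo,p,'linear'); %interpolate between the midpoints
[Q(0.06), quantile(Xdemo,0.06)] %the two values agree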
j=1; %index of site (S1)
Xs=sort(X2{1:(end-1),1+j}); %sorted 1998-2017 proportions, excluding 2018
%horizontal black bars: cumulative probability i/n between adjacent observations
for i=1:(length(Xs)-1)
    line([Xs(i),Xs(i+1)],[i/length(Xs),i/length(Xs)],'col','k');
    line([Xs(i),Xs(i)],[0,i/length(Xs)],'col','k','LineStyle',':')
end
line([Xs(end),1],[1,1],'col','k');
line([Xs(end),Xs(end)],[0,1],'col','k','LineStyle',':')
hold on
%blue curve: linear interpolation through the midpoints of the bars
p=0:0.01:1;
y=quantile(Xs,p);
plot(y,p,'b')
xlabel('Observation')
ylabel('Quantile')
%red: quantile of the 2018 observation (invprctile is not built into MATLAB;
%it computes the inverse percentile and is available on the File Exchange)
line([X2{end,1+j},X2{end,1+j}],[0,invprctile(Xs,X2{end,1+j})/100],'col','r')
drawArrow = @(x,y,varargin) quiver(x(1),y(1),x(2)-x(1),y(2)-y(1),0,varargin{:});
drawArrow([X2{end,1+j},0],[invprctile(Xs,X2{end,1+j})/100,invprctile(Xs,X2{end,1+j})/100],'color','r');
Interpretation for first site within Tillamook
In the figure above, vertical dotted lines are given above each "observed" proportion for site S1 (in quotes because the values include imputation of missing data). See the table immediately above. The horizontal solid black lines connect two observations. The height of this line is the quantile for observations that fall within these two values. The blue line is a linear interpolation designed to connect the midpoints of adjacent solid black lines. Armed with this function, it is possible to compute the quantile associated with the observation from 2018. This is shown in red in the figure above.
All Sites
The concept illustrated above can now be applied to all counts (observed and imputed) in 2018.
for j = 1:(size(X2,2)-1) %loop over sites
    Xs=sort(X2{1:(end-1),1+j}); %1998-2017 proportions for site j
    pt(j)=invprctile(Xs,X2{end,1+j})/100; %quantile of the 2018 proportion
end
T=array2table([2018 pt]);
T.Properties.VariableNames=X2.Properties.VariableNames
Interpretation for all sites within Tillamook
For site S1, we have already seen in the graph above that the quantile associated with the 2018 week 44 count is 0.31. Note that this is an imputed value. It would be prudent to ignore the quantiles associated with sites that have no observation, as sketched below. However, imputation is still needed to fill in historic values so that we can compare present proportions to (assumed) past proportions.
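A minimal sketch of that masking, assuming the last row of the raw table X1 is 2018 (as in the tables above); this step is not part of the original analysis:

obs2018 = ~isnan(X1{end,2:end}); %true where a week-44 count was actually recorded
pt_obs = pt(obs2018) %quantiles restricted to observed sites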
Notice that the quantile for the sixth site, S6, is 0.575. The observed count for that site in 2018 is 0. The computed quantile is 0.575 because about half of the recorded counts from this site at week 44 were 0 fish. This can be appreciated by inspecting the first table in this report.
Roll-up within a basin
The previously stated goal of this work is to quantify irregularity in spawning distribution within a basin on an ongoing basis. To this end, we need to condense the multiple quantiles within a basin into a single metric. Averaging raw quantiles won't work because the average of two sites with an unusual distribution (quantiles 0.95 and 0.05) is the same as the average of two sites with a commonly observed distribution (quantiles 0.5 and 0.5). To measure "irregularity", R, at the basin scale, we can take the absolute value of the difference between 0.5 and the quantile, and then average over sites:

R = (1/m) * sum_j |q_j - 0.5|

where q_j is the 2018 quantile for site j and m is the number of sites in the basin.
R will be bounded between 0 and 0.5. R increases as the proportion of spawners across sites becomes extreme relative to the historic distribution.
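To make the contrast described above concrete, here is the arithmetic for the two hypothetical pairs of sites mentioned earlier:

R_unusual = mean(abs([0.95 0.05] - 0.5)) %= 0.45, both sites far from typical
R_typical = mean(abs([0.50 0.50] - 0.5)) %= 0, both sites at the median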
R=mean(abs(T{1,2:end}-0.5))
Benchmarking the basin roll-up
Is the computed value of R extreme? We could answer this by contemplating how it is computed, but it would be easier to interpret it in light of the distribution of R values we get when we apply the logic from the previous section to previous years.
for t=1:(size(X2,1)-1) %don't do 2018
    for j = 1:(size(X2,2)-1) %don't do the year column
        X3=X2([1:t-1 t+1:end-1],:); %leave out year t and 2018
        Xs=sort(X3{:,1+j});
        pt2(t,j)=invprctile(Xs,X2{t,1+j})/100; %quantile of year t at site j
    end
end
figure
hist(mean(abs(pt2-0.5),2))
hold on
line([R, R],[0, max(ylim(gca))],'col','r')
ylabel('Frequency')
xlabel('Irregularity Index')
The figure above suggests that the computed irregularity index R for 2018 (red line) is typical for this basin.
Multiple basins
Now I just repeat the foregoing for all basins.
names={'Nehalem','Tillamook','Nestucca','Salmon','Siletz','Yaquina','Alsea','Siuslaw'};
figure
for p=1:length(names)
    clear X1 X2 Xs pt pt2 R
    X1=readtable('WK44_All.xlsx','Sheet',string(names(p)));
    X1=X1(sum(isnan(X1{:,2:end}),2)<(size(X1,2)-1),:); %remove years with no observations
    X2=array2table(Impute(X1{:,:})); %impute missing counts (function in Appendix)
    X2.Properties.VariableNames=X1.Properties.VariableNames;
    X2{:,2:end}=X2{:,2:end}./sum(X2{:,2:end},2); %proportions of annual totals
    for j = 1:(size(X2,2)-1) %quantile of the 2018 proportion at each site
        Xs=sort(X2{1:(end-1),1+j});
        pt(j)=invprctile(Xs,X2{end,1+j})/100;
    end
    T=array2table([2018 pt]);
    T.Properties.VariableNames=X2.Properties.VariableNames;
    R=mean(abs(T{1,2:end}-0.5)); %irregularity index for 2018
    for t=1:(size(X2,1)-1) %don't do 2018
        for j = 1:(size(X2,2)-1) %don't do the year column
            X3=X2([1:t-1 t+1:end-1],:);
            Xs=sort(X3{:,1+j});
            pt2(t,j)=invprctile(Xs,X2{t,1+j})/100;
        end
    end
    subplot(2,4,p)
    hist(mean(abs(pt2-0.5),2))
    hold on
    line([R, R],[0, max(ylim(gca))],'col','r')
    ylabel('Frequency')
    xlabel('Irregularity Index')
    title(names(p))
end %population
The warnings given above pertain to the multiple imputation performed in the Siuslaw. There is no irregularity index (R) for the Nestucca because there are not enough data for the technique to work:
readtable('WK44_All.xlsx','Sheet','Nestucca')
Appendix: multiple imputation method
This section contains the function for performing multiple imputation. Since different regressions will have different numbers of observations, AIC cannot be used to perform model averaging. Instead, R^2 is used here to weight individual regressions into a single estimate.
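As a toy illustration of the weighting scheme (the numbers here are made up), each candidate prediction of a missing value is multiplied by the R^2 of the regression that produced it, and the products are normalized by the summed weights:

predy = [42 55 48]; %predictions of one missing value from three univariate fits
R2s = [0.9 0.2 0.6]; %R^2 of each fit
yhat = sum(predy.*R2s)/sum(R2s) %weighted estimate, approximately 45.6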
function imputed=Impute(PC)
%Impute missing values in the matrix PC (year in column 1, site counts in
%the remaining columns). Each missing value is predicted from simple linear
%regressions on the columns observed in that row, and the predictions are
%averaged with weights equal to each regression's R^2.
xnew=PC;
[row,col] = ind2sub(size(PC),find(isnan(PC))); %indices of missing values
for i = 1:length(row) %loop over missing values
    R2s=[];
    predy=[];
    predcols=find(~isnan(PC(row(i),:))); %columns usable as predictors for this row
    xi=PC(:,predcols);
    yi=PC(:,col(i));
    for j = 1:length(predcols) %univariate regression w/ each predictor column
        if sum(~isnan(yi(~isnan(xi(:,j)))))>3 %make sure there are at least 4 data points
            [b,dev,stats]=glmfit(xi(~isnan(xi(:,j)),j),yi(~isnan(xi(:,j))),'normal','link','identity');
            pred=glmval(b,xi(~isnan(xi(:,j)),j),'identity',stats);
            %prediction for the row holding the missing value (index shifted
            %to account for NaNs dropped from the predictor column)
            predy(j)=pred(row(i)-sum(isnan(xi(1:row(i),j))));
            %R^2 of this regression
            R2s(j)=1-nansum(stats.resid.^2)/nansum((yi(~isnan(xi(:,j)))-nanmean(yi(~isnan(xi(:,j))))).^2);
        else
            predy(j)=NaN;
            R2s(j)=NaN;
        end
    end
    %R^2-weighted average of the candidate predictions
    yhat=sum(predy(~isnan(predy)).*R2s(~isnan(R2s)))/sum(R2s(~isnan(R2s)));
    xnew(row(i),col(i))=yhat;
end
imputed=xnew;
end
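A hypothetical call to the function, using a made-up matrix in the same layout as X1{:,:} (year in the first column, site counts in the remaining columns):

M = [1998 120  80 NaN;
     1999 100  60  40;
     2000 140  90  55;
     2001  90  50  30;
     2002 110  70  45];
Mi = Impute(M) %the NaN is replaced by an R^2-weighted blend of predictions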