You have manipulated the Parameter Index Matrices (PIMs) to set up models of interest based on material in Chapter 3 and 4 of CW. Now, you’ll move on to chapter 6 of CW to gain experience with slightly more complex model structures and to gain experience with the logit link.

The connection between PIM numbering and the Design Matrix

When you start a new analysis project in Program MARK, you indicate what type of data you’ll be working with and how many groups you’ll be entering data for. You will have 1 PIM for each parameter type for each group. For live recaptures data, if you enter data for 2 groups (male and female dippers) for a 7-year study, you will have a PIM for \(\phi_{male}\), \(\phi_{female}\), \(p_{male}\), and \(p_{female}\). Each PIM for this data type will be triangular and, by default, initially be numbered as shown below where each column of each PIM has it’s own unique number.

The values in the PIMs establish how many rows are in the design matrix: there will be 1 row for each unique value. With the PIMs numbered as above, you will have 24 rows in your design matrix: rows 1-6 for \(\phi_{male}\), rows 7-12 for \(\phi_{female}\), rows 13-18 for \(p_{male}\), and rows 19-24 for \(p_{female}\). You can now work with the design matrix to setup a linear model for each parameter as you wish. We’ve already worked through the identity matrix and how it can be used to run an intercept-only model for each parameter, which for the PIMs above would yield model \(\phi_{sex \cdot time}, p_{sex \cdot time}\). You can set up other models by altering how many columns are in the design matrix and what is entered into the various cells of the design matrix.

Covariate Types

We will encounter covariates that have a variety of features: (1) categorical vs. continuous, (2) time varying vs. constant through time, (3) shared by many individuals vs. unique to an individual, etc. Here, we’ll begin with categorical variables that are treated as factors. Covariates that indicate which group an individual is in can be in various ways. First, the input file might contain encounter histories for multiple groups such that a PIM for every parameter for the data type will be created for each group as described above and in our previous day’s notes. An example of grouped data for males and females for a 6-year study appear below; the capture history appears first, followed by the frequency of animals in group 1 with that history being presented on that row and then the frequency of animals in group 2 with that history being presented on that row.

11001 14 20 ;

10110 12 43 ;

00010 15 32 ;

Grouping variables can also be entered as individual covariates as discussed in chapter 11 of CW and as you’ll see later in the semester. Continuous covariate values, e.g., body mass, can be entered as individual covariates in the input file. Continuous covariate values that provide information on annual conditions, like our snow pack example from the previous set of notes, can be entered directly into the design matrix as in the example below for a 5-year study where the 1st 4 rows of the design matrix are for \(\phi_1, \phi_2, \ldots, \phi_4\) and the last row is for \(p_.\).

\[ \left[\begin{array}{cc|c} 1 & 125 & 0\\ 1 & 254 & 0\\ 1 & 87 & 0\\ 1 & 180 & 0\\ \hline 0 & 0 & 1 \end{array}\right] \]

Continuous covariates on annual conditions can also be entered as individual covariates - for our snow pack example above in a 5-year study, you’d need to have 4 individual covariates in the input file: sp_yr1, sp_yr2, sp_yr3, and sp_yr4. Chapter 11 of CW presents helpful details on why/when you might prefer doing so: bottom line for now is that the estimates come out the same either way so no need to worry about that for now. If you include individual covariates that take on continuous values in the input file, it’s typical to present data for just 1 animal on a given row because it’s unlikely that multiple animals have the same exact values, especially if there are multiple individual covariates involved. Below, you can see an example for the same scenario as the one above but with body mass data added as a new column and only 1 animal per row.

11001 1 0 18.25 ;

10110 0 1 13.76 ;

00010 0 1 20.42 ;

Review of Basic Linear Models with Categorical Covariates

We’ll use R’s model.matrix function to quickly review some basic linear model structures that are important to understand. To keep things simple, we’ll just work with the portion of the design matrix that’s relevant for \(\phi_i\), we’ll always assume that the PIM numbering has unique numbers in each column (see below), and in the design matrices, the 4 rows for males will have white background while those for females will have a gray background.

\[ {PIM}_{\phi_{male}} = \left[\begin{array}{cc} 1 & 2 & 3 & 4\\ & 2 & 3 & 4\\ & & 3 & 4\\ & & & 4 \end{array}\right]\qquad {PIM}_{\phi_{female}} = \left[\begin{array}{cc} 5 & 6 & 7 & 8\\ & 6 & 7 & 8\\ & & 7 & 8\\ & & & 8 \end{array}\right] \]

Intercept-only model

In this model \(\phi\) is constrained to be the same value regardless of the year or group, i.e., all the data for \(\phi\) are pooled.

(Intercept)
1
1
1
1
1
1
1
1
All years different with no pattern imposed (groups pooled)

Notice that year 1 is the reference value (intercept alone is used for the estimate) and that the values are the same for the male rows (1-4) and the female rows (5-8). In MARK, for time effects in the design matrix, the last year is the default reference value, which you can change under the File menu’s “Preferences” option if you like.

(Intercept) t2 t3 t4
1 0 0 0
1 1 0 0
1 0 1 0
1 0 0 1
1 0 0 0
1 1 0 0
1 0 1 0
1 0 0 1
Groups different (years pooled within each group)

Notice that females are the reference value (intercept alone is used for their estimate) and that all rows for a group are identical (years are pooled). Here, rather than use a variable name like “sex”, I use “male1”, which serves as a reminder as to which group is being coded (indicated) with a 1.

(Intercept) male1
1 1
1 1
1 1
1 1
1 0
1 0
1 0
1 0
Time + Group (additive model)

Here, \(\phi\) is allowed to vary by year and there is a common adjustment for \(\phi_{male}\) across all 4 years. Females in year 1 are the reference group.

(Intercept) t2 t3 t4 male1
1 0 0 0 1
1 1 0 0 1
1 0 1 0 1
1 0 0 1 1
1 0 0 0 0
1 1 0 0 0
1 0 1 0 0
1 0 0 1 0
Time x Group (interaction model)

Here, \(\phi\) is allowed to vary freely by year and to do so differently for each group. This model could be achieved with an 8 x 8 identity matrix (which is what you do if you just work with the PIMs and agree to use the identity matrix). Here, the model is setup in a way that you might have seen in a statistics class in an ANOVA presentation. Females in year 1 are the reference group. Notice that the “t2:male1” is the product of the “t2” column and the “male1” column: the product takes on a value of 1 in the row where “t2” = 1 and “male1” = 1 and takes on a value of 0 in all the others.

(Intercept) t2 t3 t4 male1 t2:male1 t3:male1 t4:male1
1 0 0 0 1 0 0 0
1 1 0 0 1 1 0 0
1 0 1 0 1 0 1 0
1 0 0 1 1 0 0 1
1 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0
1 0 1 0 0 0 0 0
1 0 0 1 0 0 0 0
Trend model, same slope for both groups

Here, \(\phi\) is allowed to vary by year but is constrained to follow a trend that’s related to annual snow pack values: \(\phi_{sp}\) across all 4 years. The intercept value is for a year with 0 snow. The rows for males and females are identical.

(Intercept) sp
1 125
1 254
1 87
1 180
1 125
1 254
1 87
1 180
Trend model (quadratic relationship, same for both groups)

Here, \(\phi\) is allowed to vary by year but is constrained to follow a trend that’s related to annual snow pack values and snow pack values squared, i.e., a quadratic relationship.

(Intercept) sp I(sp^2)
1 125 15625
1 254 64516
1 87 7569
1 180 32400
1 125 15625
1 254 64516
1 87 7569
1 180 32400
Trend + group model (separate intercepts for each group, common slope)

Here, \(\phi\) is allowed to vary by year but is constrained to follow a trend that’s related to annual snow pack values: \(\phi_{sp}\) across all 4 years. The intercept value is for females in a year with a snow pack value of 0. The males have an intercept adjustment. The slope term associated with “sp” is constrained to be the same for males and females.

(Intercept) male1 sp
1 1 125
1 1 254
1 1 87
1 1 180
1 0 125
1 0 254
1 0 87
1 0 180
Trend x group model (separate intercepts and slopes for each group)

Here, \(\phi\) within each group is allowed to vary by year but is constrained to follow a trend that’s related to annual snow pack values. The intercept value is for females in a year with a snow pack value of 0. The males have an intercept adjustment and a slope adjustment such that the 2 groups are allowed to have different intercepts and slopes.

(Intercept) male1 sp male1:sp
1 1 125 125
1 1 254 254
1 1 87 87
1 1 180 180
1 0 125 0
1 0 254 0
1 0 87 0
1 0 180 0

Example with snow pack model for \(\phi\)

In the plots below, \(\hat\phi_t = \frac{exp^{(\hat\beta_1 + \hat\beta_2 \cdot {sp_t})}}{1 + exp^{(\hat\beta_1 + \hat\beta_2 \cdot {sp_t})}}\), where \(\hat\beta_1 = -10\) and \(\hat\beta_2 = 0.05\). Notice the following:

  1. the intercept value is for a snow pack value of 0, which is well below any of the observed values, which range from 87 to 254 (shaded in light blue),

  2. on the log-odds scale the relationship is linear, whereas the relationship between \(\hat\phi\) and snow pack is not (although it’s quite linear over some snow pack values, e.g., 175 to 225),

  3. snow pack values that yield a log-odds value of \(~-5\), yield \(\hat\phi \approx 0\), those that yield a log-odds value of \(0\), yield \(\hat\phi = 0.5\), and those that yield a log-odds value of \(~+5\), yield \(\hat\phi \approx 1\),

  4. Values of \(\hat\phi\) range from 0 to 1 only if you move well beyond the range of observed snow pack values, i.e., in many logistic regression problems, you’ll only see part of the full sigmoid curve when plotting over the range of observed covariate values, which is not a problem, and

  5. You would typically only use the left-hand plots, i.e., plot over the range of observed covariate values, as that’s the inference space.

What you’re learning about using various covariate types brings much useful flexibility to how you can model data from mark-recapture studies. In particular, we started with the CJS model with time-varying real parameters with a likelihood such as the one below.

\[\mathcal{L}(\phi_1, p_2, \phi_2, p3|R_1, {x_\omega})= \frac{R_1!}{\prod\limits_{\omega}x_\omega!}(\phi_1 p_2 \phi_2 p_3)^{x_{111}} (\phi_1 p_2 (1-\phi_2 p_3))^{x_{110}}(\phi_1(1-p_2)\phi_2 p_3)^{x_{101}} (\chi_1)^{x_{100}}\]

You then learned to run models that held either or both of the real parameters constant and to evaluate evidence for which version of the 4 models: (1) \(\phi_t,p_t\), (2) \(\phi_.,p_t\), (3) \(\phi_t,p_.\), and (4) \(\phi_.,p_.\) was best supported by the data.

Now, you can use logistic regression to evaluate support from the data for other models, e.g., \(phi_{snow}, p{sex}\). To do so, you conceive of the linear model for each of the real parameters, number the PIMs accordingly, call up a design matrix with the right number of columns, fill in the appropriate values in the design matrix, and run the model. Repeat on other models and compare the levels of support for the various models. For the model, \(phi_{snow}, p{sex}\), you’d want to achieve the following.

\[\hat\phi_t = \frac{exp^{(\hat\beta_1 + \hat\beta_2 \cdot {sp_t})}}{1 + exp^{(\hat\beta_1 + \hat\beta_2 \cdot {sp_t})}}; \qquad \hat p_{sex} = \frac{exp^{(\hat\beta_3 + \hat\beta_4 \cdot {male1})}}{1 + exp^{(\hat\beta_3 + \hat\beta_4 \cdot {male1})}}\]

For a 5-year study, you could number the PIM for \(\phi_{male}\) as

\[ \begin{matrix} 1 & 2 & 3 & 4\\ & 2 & 3 & 4\\ & & 3 & 4\\ & & & 4\\ \end{matrix} \] The PIM for \(\phi_{female}\) would be

\[ \begin{matrix} 5 & 6 & 7 & 8\\ & 6 & 7 & 8\\ & & 7 & 8\\ & & & 8\\ \end{matrix} \] And, because you have no time variation in the model for \(p\), you could simplify the PIMs such that the males would have

\[ \begin{matrix} 9 & 9 & 9 & 9\\ & 9 & 9 & 9\\ & & 9 & 9\\ & & & 9\\ \end{matrix} \]

and the female PIM for \(p\) would have

\[ \begin{matrix} 10 & 10 & 10 & 10\\ & 10 & 10 & 10\\ & & 10 & 10\\ & & & 10\\ \end{matrix} \]

From there, you’d need a design matrix with 10 rows, where the rows 1-4 are for \(\phi_{t_{male}}\), rows 5-8 are for \(\phi_{t_{female}}\), row 9 is for \(p_{male}\), and row 10 is for \(p_{female}\). There are 4 columns, which means 4 \(\beta_i\) will be estimated. The design matrix would be filled in as follows.

\[ \left[\begin{array}{cc|cc} 1 & 125 & 0 & 0\\ 1 & 254 & 0 & 0\\ 1 & 87 & 0 & 0\\ 1 & 180 & 0 & 0\\ 1 & 125 & 0 & 0\\ 1 & 254 & 0 & 0\\ 1 & 87 & 0 & 0\\ 1 & 180 & 0 & 0\\\hline 0 & 0 & 1 & 1\\ 0 & 0 & 1 & 0 \end{array}\right] \]

The R Notebook for today has you work with the linear models to make sure you see what the relationships look like between the response variable and the covariates in each model. The homework will give you experience implementing the ideas for this week in Program MARK.