---
title: "WILD 502: Lab 01"
subtitle: "Black duck known-fate survival analysis"
author: "Jay Rotella, Ecology Dept., Montana State University"
output: html_document
---

In lab this week, we worked with data that were presented in a collapsed and very simple format. Here, we'll work the data in a different format where the information is presented for 1 animal at a time. Here, we'll work through the same models that you worked with directly in MARK. There are 2 things that are noteworthy about this exercise

* I made up the temperature data for the sake of the exercise
* In this data set, like the one that we used for the formal lab assignment, I killed 1 duck in the last week of the study, whereas in the original data set all ducks survived during the last week of the study, i.e., estimated survival rate was 1.0. 

Here, the models are still run in Program MARK, but we will use the RMark interface to construct the models and view output. Once you learn MARK, doing analyses in RMark can save you some time and allow you to work with the estimates in some very useful ways. For example, you can use the many tools in R to graph the results, work with transformations of the estimates, and do a variety of other useful tasks. Working in RMark is not something I'll require you to do for the course, but the material is here for you to explore.

### Bring in the data

```{r, warning=FALSE}
library(RMark)
library(ggplot2)

Blackduck = convert.inp("http://www.montana.edu/rotella/documents/502/ducknoco2.inp",
      group.df = NULL, covariates = NULL)

head(Blackduck)
```

### Process the data & add temperature data

```{r}
# Process the data for use in RMark
bduck.processed = process.data(Blackduck,
  model = "Known",
  begin.time = 0,
  time.intervals = rep(1, 8))
  
# Create default design data
bduck.ddl = make.design.data(bduck.processed)

# Creat TIME variable that matches what we did in MARK using 
#  values of 1, 2, ..., 8 rather that 0, 1, ..., 7, which
#  RMark uses by default
bduck.ddl$S$TIME = bduck.ddl$S$Time + 1

# Add occasion-specific data data for weekly temperature
bduck.ddl$S$temp = c(11, 9, 8, 4, 6, 5, 10, 9.01)

# examine a summary of the design data: 
# note that 'time' is a factor & 'Time' is numeric
summary(bduck.ddl$S)
```


### Build function for creating models

Here, we write a function that establishes which models are to be run in MARK.

```{r}
run.Blackduck = function() {
  #  Define range of models for S
  S.dot = list(formula =  ~ 1)
  # force use of identity matrix for time model by removing intercept
  S.time = list(formula =  ~ -1 + time) 
  S.TIME = list(formula =  ~ TIME)
  S.temp = list(formula =  ~ temp)
  
  # Create model list 
  model.list = create.model.list("Known")

  # NOTE: to avoid having all the output for each model appear when you 
  # call the function, add ', output=FALSE' after 'ddl=bduck.ddl' below. 
  # Here, I don't do that so you can see the output for each model, 
  # but this might not be desired if you have many models.
  bduck.results = mark.wrapper(model.list,
                               data = bduck.processed,
                               ddl = bduck.ddl)
  
  # Return model table and list of models
  return(bduck.results)
}
```

### Run the models

It's worth noting that here, the output from each of the models appears in the console where it can be reviewed. One can also examine model-specific output in other ways as shown below. And, if you don't want to see the output from every model appear in the output here, see the note above for the use of the "mark.wrapper" command where one has the option of suppressing the output.

```{r}
bduck.results = run.Blackduck()
```


### Look at the model-selection table
```{r}
bduck.results
```


### Working with output: Plotting
It's very common to want to plot results from various models of interest. Here, we put the estimates of "S" for each week from the 'constant','time' ,'Time-trend', and 'Temp' models on a single plot. It's pretty easy to do once you learn how to use R & RMark. The code below uses the 'ggplot2' package, which is well worth learning.

```{r fig.width=7, fig.height=6,warning=FALSE}
# create data frames for estimates of real parameters from each model
#  with a row for each week's estimate and a covariate for week
s.d = bduck.results$S.dot$results$real[, 1:4] 
s.dot = cbind(s.d[rep(seq_len(nrow(s.d)), each = 8), ],
              Week = 1:8, 
              Model = "constant")

s.t = cbind(bduck.results$S.time$results$real[, 1:4],
            Week = 1:8, 
            Model = "time-varying")

s.T = cbind(bduck.results$S.TIME$results$real[, 1:4],
            Week = 1:8, 
            Model = "Time trend")

s.temp = cbind(bduck.results$S.temp$results$real[, 1:4],
               Week = 1:8,
               Model = "temperature")

# store all the estimates in a data frame
surv.ests = rbind(s.dot, s.t, s.T, s.temp)

# Make plot
# The errorbars overlapped, so use position_dodge to move them horizontally
pd <- position_dodge(0.4) 

ggplot(surv.ests, aes(x = Week, y = estimate, color = Model)) +
  geom_errorbar(aes(ymin = lcl, ymax = ucl), 
                width = 0.1, 
                position = pd) + 
  geom_line(position = pd) +
  geom_point(position = pd) +
  scale_y_continuous("Weekly Survival Estimates", 
                     limits = c(0, 1), 
                     breaks = seq(0, 1, by = 0.1)) + 
  theme_bw() + # black and white theme
  theme(legend.justification = c(1, 0.1),
          legend.position = c(0.9, 0.1)) # Position legend in lower left
```

We can plot the estimates of weekly survival for the temperature model against either week or temperature. Here, we first predict the survival rates for 8 different temperatures spaced across the range of values observed  and then plot them. The way we do this below is to provide a design matrix with 8 rows and 2 columns as that's what we used in building the model (check out the object 'bduck.results$S.temp$design.matrix' if you want to see it). Here, we present 8 nicely ordered temperature values (4, 5, ..., 10, 11) so we can plot estimated survival against temperature. (*If you wanted to use more than 8 values, you could create time-varying individual covariates that would be the same for all individuals for a given occasion but that would change across occasions. This is explained in detail by Jeff Laake in his RMark workshop notes. The idea is beyond where we are in the course, so we won't go into that further now*).

```{r fig.width=7, fig.height=6,warning=FALSE}
# replace the original temp values with ordered range of 8 values
#  that cover the range of observed values
S.temp.pred = compute.real(bduck.results$S.temp,
                           design = matrix(c(rep(1, 8), 4:11), 
                                           8, 2, byrow = F))
S.temp.pred$Temperature = 4:11

ggplot(S.temp.pred, aes(x = Temperature, y = estimate)) +
  geom_errorbar(aes(ymin = lcl, ymax = ucl), 
                width = 0.1) + 
  geom_point() +
  scale_y_continuous("Weekly Survival Estimates", 
                     limits = c(0, 1), 
                     breaks = seq(0, 1, by = 0.1)) + 
  theme_bw() 
```

We can also use RMark's model-averaging function to obtain estimates of weekly survival rates that are weighted averages of each model's estimates where the weights are the AICc weights. For this study, consider what the estimates of weekly survival are for each model. Next, consider what the AICc weights are for each model. Finally, calculate a weighted average using that information.  The model weights sum to 1.0, so the weighted average is simply the sum of each model's estimate of a given week's survival rate multiplied by the weight of support for that model, i.e., $\hat{\bar{S_{i}}}=\sum_{i=1}^{n}\hat{S_{i}}*w_{i}$. Here, the model-specific estimates of survival in week 1 were:(1) `r round(s.dot$estimate[1],3)` for the *S(.)* model; (2) `r round(s.t$estimate[1],3)` for the *S(time)* model; (3) `r round(s.T$estimate[1],3)` for the *S(Time-trend)* model; and (4) `r round(s.temp$estimate[1],3)` for the *S(temp)* model. The corresponding weights were `r round(bduck.results$model.table$weight[bduck.results$model.table$model=="S(~1)"],3)`, `r round(bduck.results$model.table$weight[bduck.results$model.table$model=="S(~time)"],3)`, `r round(bduck.results$model.table$weight[bduck.results$model.table$model=="S(~Time)"],3)`, and `r round(bduck.results$model.table$weight[bduck.results$model.table$model=="S(~temp)"],3)`. 

The weighted average takes into account (1) the model-selection uncertainty as well as (2) the model-specific uncertainty in the estimates. For this dataset and model set, estimates from the *S(temp)* model are heavily weighted, estimates from the *S(time)* model receive almost no weight, and estimates from the *S(.)* model and the *S(Time-trend)* model receive modest amounts of weight.

```{r}
S.weekly.averages=model.average(bduck.results,vcv=TRUE)
s.mod.avg = S.weekly.averages$estimates[, 2:5]
s.mod.avg$Week = 1:8
s.mod.avg$Model = "Model-avg"

ggplot(s.mod.avg, aes(x = Week, y = estimate)) + 
  geom_errorbar(aes(ymin = lcl, ymax = ucl), 
                width = 0.1) + 
  geom_point() +
  scale_y_continuous("Model-Averaged Weekly Survival Estimates", 
                     limits = c(0, 1), 
                     breaks = seq(0, 1, by = 0.1)) + 
  theme_bw() 
```

It can be useful to plot the model-specific estimates along with the model-averaged estimates. Here, the model-averaged estimates are added to each week's set of estimates from the 4 competing models. You can see that the model-averaged estimates are all quite close to the estimates from the heavily weighted temperature model but pulled a bit towards the estimates from the constant and the time-trend model. You can also see that the confidence intervals for the model-averaged estimate are a bit wider than those for the temperature model.

```{r}
surv.ests.all = rbind(s.dot, s.t, s.T, s.temp, s.mod.avg)

ggplot(surv.ests.all, aes(x = Week, y = estimate, color = Model)) +
  geom_errorbar(aes(ymin = lcl, ymax = ucl), 
                width = 0.1, 
                position = pd) + 
  geom_line(position = pd) +
  geom_point(position = pd) +
  scale_y_continuous("Weekly Survival Estimates", 
                     limits = c(0, 1), 
                     breaks = seq(0, 1, by = 0.1)) + 
  theme_bw() + # black and white theme
  theme(legend.justification = c(1, 0.1),
          legend.position = c(0.9, 0.1))

# if you're running this code, it's wise to cleanup files
# you can use the next 2 lines without comments to do that
#rm(list=ls(all=TRUE))
#cleanup(ask = FALSE)
```