---
title: "Lecture 02 - R code"
author: "WILD 502 - Jay Rotella"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Normal Distribution
```{r}
x = rnorm(5000, mean = 50, sd = 25)
hist(x, prob = TRUE, xlim = c(-70, 170))
lines(density(x), col = 'blue')
```
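For comparison, the theoretical Normal density can be overlaid with `curve` and `dnorm`. A minimal sketch; `set.seed` is added here only so the figure is reproducible (the draws differ from those above):
```{r}
set.seed(1)
x <- rnorm(5000, mean = 50, sd = 25)
hist(x, prob = TRUE, xlim = c(-70, 170))
lines(density(x), col = 'blue')
# overlay the true density the data were drawn from
curve(dnorm(x, mean = 50, sd = 25), col = 'red', add = TRUE)
```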
## Example of comparing 2 means
```{r}
mu1 = 12.5
mu2 = 14
sig1 = 3
sig2 = 5
x1 = rnorm(25, mu1, sig1)
x2 = rnorm(25, mu2, sig2)
boxplot(x1, x2, ylim = c(0, 30))
t.test(x1, x2, var.equal = TRUE)
```
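Because `sig1` and `sig2` differ, the equal-variance assumption in the call above is questionable. R's default, `var.equal = FALSE`, runs Welch's t-test, which does not assume equal variances. A sketch with fresh draws (the names `x1w` and `x2w` are new, used so the vectors above are left untouched):
```{r}
set.seed(502)
x1w <- rnorm(25, 12.5, 3)  # same settings as above, new draws
x2w <- rnorm(25, 14, 5)
t.test(x1w, x2w)  # var.equal = FALSE is the default, i.e., Welch's t-test
```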
## Work with variance, covariance, and correlation
```{r}
# calculation of variance
sum( (x1 - mean(x1)) * (x1 - mean(x1)) ) / (length(x1) - 1)
# compare with output from "var" function
var(x1)
# calculation of covariance
# notice similarity to variance calculation
sum( (x1 - mean(x1)) * (x2 - mean(x2)) ) / (length(x1) - 1)
# compare with output from "cov" function
cov(x1, x2)
# calculate correlation
cov(x1, x2) / (sd(x1) * sd(x2))
# compare with output from "cor" function
cor(x1, x2)
# variance-covariance and correlation matrices
cov(cbind(x1, x2))
cor(cbind(x1, x2))
```
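Base R's `cov2cor` does the same covariance-to-correlation conversion for a whole matrix at once. A self-contained check (data simulated here so the chunk stands alone):
```{r}
set.seed(2)
a <- rnorm(25, 12.5, 3)
b <- rnorm(25, 14, 5)
V <- cov(cbind(a, b))
# cov2cor rescales a variance-covariance matrix into a correlation matrix
cov2cor(V)
# matches the correlation matrix computed directly
all.equal(cov2cor(V), cor(cbind(a, b)))
```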
## Example of different hypothesis testing methods
Next, we'll work with a survival analysis of data from deer fawns, which you'll work with soon in lab. The data contain survival information (status = lived or died) for male and female fawns from 2 different areas, along with each fawn's body mass and length.
```{r}
# read in the data
fawns <- read.table(file = "http://www.montana.edu/rotella/documents/502/FawnsData.txt",
sep = "\t", header = TRUE)
# create status variable where 1 = lived and 0 = died
fawns$status <- NA
# set status to 1 for those that lived, i.e., had a '10' encounter history
fawns$status[which(fawns$eh == 10)] <- 1
# set status to 0 for those that died, i.e., had an '11' encounter history
fawns$status[which(fawns$eh == 11)] <- 0
# treat area, sex and status as factors and set labels
fawns$area = factor(fawns$area,
levels = c(0, 1),
labels = c("control","treatment"))
fawns$sex = factor(fawns$sex,
levels = c(0, 1),
labels = c("female","male"))
fawns$status = factor(fawns$status,
levels = c(0, 1),
labels = c("died","lived"))
# examine the data and a few simple summaries
head(fawns, 4)
summary(fawns)
# look at some basic summary statistics
table(fawns$sex, fawns$status)
# work with proportions and have row values sum to 1
prop.table(table(fawns$sex, fawns$status),
margin = 1)
prop.table(table(fawns$area, fawns$status),
margin = 1)
```
### Run some competing models
```{r}
m0 <- glm(formula = status ~ 1,
family = binomial(link = logit), data = fawns)
ms <- glm(formula = status ~ sex,
family = binomial(link = logit), data = fawns)
ml <- glm(formula = status ~ length,
family = binomial(link = logit), data = fawns)
msl <- glm(formula = status ~ sex + length,
family = binomial(link = logit), data = fawns)
prop.table(table(fawns$status))
# check if predictions are for the proportion that lived or died
# it's important to be certain of which outcome is being predicted
# you can look at the output from the null model to check
mean(predict(m0, type = "response"))
```
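For an intercept-only logistic regression, the fitted probability is the inverse-logit of the intercept, which equals the raw proportion of 1s, so the check above works. A small simulated sketch of that identity (the names `y` and `m.int` are illustrative):
```{r}
set.seed(3)
y <- rbinom(40, 1, 0.7)
m.int <- glm(y ~ 1, family = binomial(link = logit))
# inverse-logit of the intercept ...
plogis(coef(m.int))
# ... equals the observed proportion of 1s
mean(y)
```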
### Make pairwise comparisons using likelihood-ratio tests
```{r}
anova(m0, ms, test = "Chisq")
anova(m0, ml, test = "Chisq")
anova(m0, msl, test = "Chisq")
anova(ms, ml, test = "Chisq") # Invalid! ms and ml are not nested models
anova(ms, msl, test = "Chisq")
anova(ml, msl, test = "Chisq")
```
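What `anova(..., test = "Chisq")` computes can be done by hand: the likelihood-ratio statistic is the difference in deviances, compared to a chi-square distribution with df equal to the difference in the number of parameters. A self-contained sketch with simulated data (the fawn models work the same way):
```{r}
set.seed(7)
n <- 100
sex.sim <- rbinom(n, 1, 0.5)
y.sim <- rbinom(n, 1, plogis(-0.5 + 0.8 * sex.sim))
f0 <- glm(y.sim ~ 1, family = binomial)
f1 <- glm(y.sim ~ sex.sim, family = binomial)
lr <- deviance(f0) - deviance(f1)        # -2 * log(L0 / L1)
df <- df.residual(f0) - df.residual(f1)  # difference in parameters
pchisq(lr, df = df, lower.tail = FALSE)  # same p-value anova reports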
### Use Information-theoretic method for model comparisons
For some of you, the use of Akaike's Information Criterion (AIC) will be new. For now, we're just going to expose you to the idea that it's a useful way to compare models of the *same* response variable fit to the *same* number of observations. It's especially useful for comparing non-nested models (e.g., models **ms** and **ml**). We'll see other examples of non-nested models where you have competing models that each use a different functional form of a covariate of interest (e.g., age, ln(age), age + $age^2$). Also, AICc is a version of AIC that's adjusted for sample size (more on that in the days ahead). Finally, AIC is not the only information criterion used in data analysis: you might also see others such as BIC, SIC, DIC, and WAIC in the literature. We'll focus on AICc in this course, but we will also discuss the others and why they are of interest and used in some settings.
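For reference, AIC = -2 log(L) + 2K, and AICc adds the small-sample correction 2K(K + 1)/(n - K - 1). A self-contained sketch of the arithmetic with simulated data (the names `y.ex` and `m.ex` are illustrative):
```{r}
set.seed(42)
y.ex <- rbinom(30, 1, 0.6)
m.ex <- glm(y.ex ~ 1, family = binomial)
K <- attr(logLik(m.ex), "df")  # number of estimated parameters
n <- length(y.ex)
aic  <- -2 * as.numeric(logLik(m.ex)) + 2 * K
aicc <- aic + (2 * K * (K + 1)) / (n - K - 1)
c(by.hand = aic, builtin = AIC(m.ex), AICc = aicc)
```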
```{r}
# need to have the package 'AICcmodavg' installed for this to work
# if you don't have the package, run the following line without the '#'
# install.packages("AICcmodavg")
library(AICcmodavg)
candidates = list(m0, ms, ml, msl)
model.names = c("intercept", "sex", "length", "sex+length")
# Be sure that the order of model names matches
# the order used for the candidate models.
# Check out the help for 'aictab' for another way to
# implement this function.
aictab(cand.set = candidates, modnames = model.names, sort = TRUE)
```
### Examine output for best-supported model
There's a lot more one would do in the process of making inferences from these analyses, e.g., goodness-of-fit, diagnostics, etc. However, for now we'll just look at the coefficient estimates and their confidence intervals.
```{r}
# look at output for best-supported model
summary(msl)
```
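Confidence intervals for the coefficients come from `confint`. A self-contained sketch with simulated data (the names `len.sim`, `y.ci`, and `fit.ci` are illustrative; for the fawn analysis the analogous call is `confint(msl)`):
```{r}
set.seed(11)
n <- 100
len.sim <- rnorm(n, 130, 10)
y.ci <- rbinom(n, 1, plogis(-7 + 0.055 * len.sim))
fit.ci <- glm(y.ci ~ len.sim, family = binomial)
confint(fit.ci)      # profile-likelihood intervals (Wald-type if no profile method)
exp(confint(fit.ci)) # exponentiate to get the odds-ratio scale
```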