---
title: "Confidence Intervals on Transformed Quantities"
author: "WILD 502 - Jay Rotella"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Expressing Uncertainty on Estimated Probabilities
After you look at $\hat\beta's$ and their associated $\widehat{SE}(\hat\beta)'s$ results of a logistic regression , you are typically also interested in generating estimates of survival. To do so, you have learned to back-transform the regression from the logit scale to the probability scale using $\frac{exp(\hat\beta_0 + \hat\beta_1\cdot{x_1}...)}{1+exp(\hat\beta_0 + \hat\beta_1\cdot{x_1}...)}$. You also need to have a way of obtaining estimates of the SE's for $\hat{S}$. There are a variety of ways of doing so, but for now, we'll discuss the *delta method*, which is dicussed in detail in [Appendix B of the Cooch and White on-line book](http://www.phidot.org/software/mark/docs/book/pdf/app_2.pdf).
The key idea is that you work with (1) the $\hat\beta's$ and the variance-covariance matrix for the $\hat\beta's$, (2) the transformation being used to convert the $\hat\beta's$ to $\hat{S}$, and (3) a set of partial derivatives that indicate how much $\hat{S}$ changes as each of the $\hat\beta's$ changes, i.e., (how sensitive is the outcome $\hat{S}$ to the uncertainties about the $\hat\beta's$).
## A Simple Example
For the *S(constant)* model on the fawns data, you estimated $\hat\beta_0$ as -0.3545 and $\widehat{SE}(\hat\beta_0)$ as 0.1903. The transformation is $\frac{exp(\hat\beta_0)}{1+exp(\hat\beta_0)}$. You can read about the derivatives in Appendix B of the Cooch and White book, which is beyond the level of detail that we'll get into at this point in the course. Fortunately, we can use the *deltavar* function of the *emdbook* package to do the calculations quite easily as it calculates the derivatives and does the required matrix math for us.
```{r}
library(knitr) # for printing nice tables
library(emdbook) # for deltavar function
library(ggplot2) # for plotting
b0 <- -0.354545
Beta.hats <- c(b0 = b0)
se_b0 <- 0.1902681
# Calculate ln odds of Survival rate
ln_odds_S <- b0
# Estimate the 95% confidence limits for ln odds of S
se_ln_odds_S <- se_b0 # simple for this problem with only 1 beta_hat
lcl_ln_odds_S <- ln_odds_S - 1.96 * se_ln_odds_S
ucl_ln_odds_S <- ln_odds_S + 1.96 * se_ln_odds_S
# backtransform log-odds to obtain estimate of S
S <- plogis(ln_odds_S)
# Estimate SE for S with delta method
# create var-cov matrix (this one's simply a 1 x 1 matrix)
var_Beta = matrix(se_b0^2, nrow = 1, ncol = 1)
se_S <- sqrt(deltavar(exp(b0)/(1 + exp(b0)),
meanval = Beta.hats,
Sigma = var_Beta))
cbind(S, se_S)
```
You can obtain 95% confidence limits for $\hat{S}$ as shown below. Notice that here you don't use the $\widehat{SE}(\hat{S})$ to obtain the confidence limits. Rather, you work with confidence limits on the logit scale and back-transform those values, which creates confidence limits for $\hat{S}$ with better properties, e.g., they'll stay within the 0 to 1 bounds.
```{r}
# obtain confidence limits for S_hat by back-transforming
# from the log-odds and confidence limits for log-odds
lcl_S <- plogis(lcl_ln_odds_S)
ucl_S <- plogis(ucl_ln_odds_S)
cbind(S, se_S, lcl_S, ucl_S)
```
## A Slightly More Complex Example
You also estimated the parameters of the *S(length)* model that contains an intercept, $\hat\beta_0$, and a slope, $\hat\beta_1$, which is multiplied by the length covariate. For this model, you need to first use the *delta method* to obtain estimates of the log-odds and associated standard errors for the log-odds of survival for animals of different lengths. Once you have those, you can obtain the confidence bounds on the estimated log-odds.
```{r}
# Store beta_hats
b0 <- -10.23404 # intercept
b1 <- 0.07999 # beta for length
# Store variance-covariance matrix for beta_hats
sigma <- matrix(c(
23.02372737, -0.185786837,
-0.185786837, 0.001501638), nrow = 2, ncol = 2)
# Provide length values over range of data
Length <- seq(from = 108, to = 135.5, by = 0.5)
# Calculate ln odds of Survival rate for any length
ln_odds_S <- b0 + b1 * Length
# Estimate the Standard Errors for ln odds of S
se_ln_odds_S <- sqrt(deltavar(b0 + b1 * Length,
meanval = c(b0 = -10.635836, b1 = 0.0831520),
Sigma = sigma))
# Estimate the 95% confidence limits for ln odds of S
lcl_ln_odds_S <- ln_odds_S - 1.96 * se_ln_odds_S
ucl_ln_odds_S <- ln_odds_S + 1.96 * se_ln_odds_S
# Store all as dataframe
log_odds <- data.frame(Length, ln_odds_S,
se_ln_odds_S,
lcl_ln_odds_S,
ucl_ln_odds_S)
# take a look
kable(head(log_odds, 4), digits = 4)
# Plot estimated relationship between the log-odds and body length.
ggplot(log_odds, aes(x = Length, y = ln_odds_S)) +
geom_line(size = 1.5) +
geom_ribbon(aes(ymin = lcl_ln_odds_S, ymax = ucl_ln_odds_S), alpha = 0.2) +
xlab("Body Length (cm)") +
ylab("Estimated Log-Odds of Survival")
```
Finally, to obtain $\hat{S}$, $\widehat{SE}(\hat{S})$, and confidence limits for $\hat{S}$, you do the following work. Notice that as in the simpler example above that you don't use the $\widehat{SE}(\hat{S})$ to obtain the confidence limits. Rather, you work with confidence limits on the logit scale and back-transform those values, which creates confidence limits for $\hat{S}$ with better properties, e.g., they'll stay within the 0 to 1 bounds.
```{r}
# back transform the log-odds to obtain estimates of S
S <- plogis(ln_odds_S)
lcl_S <- plogis(lcl_ln_odds_S)
ucl_S <- plogis(ucl_ln_odds_S)
# Estimate the Standard Errors for S
se_S <- sqrt(deltavar(
exp(b0 + b1 * Length) / (1 + exp(b0 + b1 * Length)),
meanval = c(b0 = b0, b1 = b1),
Sigma = sigma))
# Store all as a data.frame
surv_ests <- data.frame(Length, S, se_S, lcl_S, ucl_S)
# look at a few rows
kable(head(surv_ests, 4), digits = 4)
# Plot estimated relationship between survival rate and body length.
ggplot(surv_ests, aes(x = Length, y = S)) +
geom_line(size = 1.5) +
geom_ribbon(aes(ymin = lcl_S, ymax = ucl_S), alpha = 0.2) +
xlab("Body Length (cm)") +
ylab("Estimated Survival Rate")
```