--- title: "Confidence Intervals on Transformed Quantities" author: "WILD 502 - Jay Rotella" output: pdf_document --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` ## Expressing Uncertainty on Estimated Probabilities After you look at $\hat\beta's$ and their associated $\widehat{SE}(\hat\beta)'s$ results of a logistic regression , you are typically also interested in generating estimates of survival. To do so, you have learned to back-transform the regression from the logit scale to the probability scale using $\frac{exp(\hat\beta_0 + \hat\beta_1\cdot{x_1}...)}{1+exp(\hat\beta_0 + \hat\beta_1\cdot{x_1}...)}$. You also need to have a way of obtaining estimates of the SE's for $\hat{S}$. There are a variety of ways of doing so, but for now, we'll discuss the *delta method*, which is dicussed in detail in [Appendix B of the Cooch and White on-line book](http://www.phidot.org/software/mark/docs/book/pdf/app_2.pdf). The key idea is that you work with (1) the $\hat\beta's$ and the variance-covariance matrix for the $\hat\beta's$, (2) the transformation being used to convert the $\hat\beta's$ to $\hat{S}$, and (3) a set of partial derivatives that indicate how much $\hat{S}$ changes as each of the $\hat\beta's$ changes, i.e., (how sensitive is the outcome $\hat{S}$ to the uncertainties about the $\hat\beta's$). ## A Simple Example For the *S(constant)* model on the fawns data, you estimated $\hat\beta_0$ as -0.3545 and $\widehat{SE}(\hat\beta_0)$ as 0.1903. The transformation is $\frac{exp(\hat\beta_0)}{1+exp(\hat\beta_0)}$. You can read about the derivatives in Appendix B of the Cooch and White book, which is beyond the level of detail that we'll get into at this point in the course. Fortunately, we can use the *deltavar* function of the *emdbook* package to do the calculations quite easily as it calculates the derivatives and does the required matrix math for us. ```{r} library(knitr) # for printing nice tables library(emdbook) # for deltavar function library(ggplot2) # for plotting b0 <- -0.354545 Beta.hats <- c(b0 = b0) se_b0 <- 0.1902681 # Calculate ln odds of Survival rate ln_odds_S <- b0 # Estimate the 95% confidence limits for ln odds of S se_ln_odds_S <- se_b0 # simple for this problem with only 1 beta_hat lcl_ln_odds_S <- ln_odds_S - 1.96 * se_ln_odds_S ucl_ln_odds_S <- ln_odds_S + 1.96 * se_ln_odds_S # backtransform log-odds to obtain estimate of S S <- plogis(ln_odds_S) # Estimate SE for S with delta method # create var-cov matrix (this one's simply a 1 x 1 matrix) var_Beta = matrix(se_b0^2, nrow = 1, ncol = 1) se_S <- sqrt(deltavar(exp(b0)/(1 + exp(b0)), meanval = Beta.hats, Sigma = var_Beta)) cbind(S, se_S) ``` You can obtain 95% confidence limits for $\hat{S}$ as shown below. Notice that here you don't use the $\widehat{SE}(\hat{S})$ to obtain the confidence limits. Rather, you work with confidence limits on the logit scale and back-transform those values, which creates confidence limits for $\hat{S}$ with better properties, e.g., they'll stay within the 0 to 1 bounds. ```{r} # obtain confidence limits for S_hat by back-transforming # from the log-odds and confidence limits for log-odds lcl_S <- plogis(lcl_ln_odds_S) ucl_S <- plogis(ucl_ln_odds_S) cbind(S, se_S, lcl_S, ucl_S) ``` ## A Slightly More Complex Example You also estimated the parameters of the *S(length)* model that contains an intercept, $\hat\beta_0$, and a slope, $\hat\beta_1$, which is multiplied by the length covariate. For this model, you need to first use the *delta method* to obtain estimates of the log-odds and associated standard errors for the log-odds of survival for animals of different lengths. Once you have those, you can obtain the confidence bounds on the estimated log-odds. ```{r} # Store beta_hats b0 <- -10.23404 # intercept b1 <- 0.07999 # beta for length # Store variance-covariance matrix for beta_hats sigma <- matrix(c( 23.02372737, -0.185786837, -0.185786837, 0.001501638), nrow = 2, ncol = 2) # Provide length values over range of data Length <- seq(from = 108, to = 135.5, by = 0.5) # Calculate ln odds of Survival rate for any length ln_odds_S <- b0 + b1 * Length # Estimate the Standard Errors for ln odds of S se_ln_odds_S <- sqrt(deltavar(b0 + b1 * Length, meanval = c(b0 = -10.635836, b1 = 0.0831520), Sigma = sigma)) # Estimate the 95% confidence limits for ln odds of S lcl_ln_odds_S <- ln_odds_S - 1.96 * se_ln_odds_S ucl_ln_odds_S <- ln_odds_S + 1.96 * se_ln_odds_S # Store all as dataframe log_odds <- data.frame(Length, ln_odds_S, se_ln_odds_S, lcl_ln_odds_S, ucl_ln_odds_S) # take a look kable(head(log_odds, 4), digits = 4) # Plot estimated relationship between the log-odds and body length. ggplot(log_odds, aes(x = Length, y = ln_odds_S)) + geom_line(size = 1.5) + geom_ribbon(aes(ymin = lcl_ln_odds_S, ymax = ucl_ln_odds_S), alpha = 0.2) + xlab("Body Length (cm)") + ylab("Estimated Log-Odds of Survival") ``` Finally, to obtain $\hat{S}$, $\widehat{SE}(\hat{S})$, and confidence limits for $\hat{S}$, you do the following work. Notice that as in the simpler example above that you don't use the $\widehat{SE}(\hat{S})$ to obtain the confidence limits. Rather, you work with confidence limits on the logit scale and back-transform those values, which creates confidence limits for $\hat{S}$ with better properties, e.g., they'll stay within the 0 to 1 bounds. ```{r} # back transform the log-odds to obtain estimates of S S <- plogis(ln_odds_S) lcl_S <- plogis(lcl_ln_odds_S) ucl_S <- plogis(ucl_ln_odds_S) # Estimate the Standard Errors for S se_S <- sqrt(deltavar( exp(b0 + b1 * Length) / (1 + exp(b0 + b1 * Length)), meanval = c(b0 = b0, b1 = b1), Sigma = sigma)) # Store all as a data.frame surv_ests <- data.frame(Length, S, se_S, lcl_S, ucl_S) # look at a few rows kable(head(surv_ests, 4), digits = 4) # Plot estimated relationship between survival rate and body length. ggplot(surv_ests, aes(x = Length, y = S)) + geom_line(size = 1.5) + geom_ribbon(aes(ymin = lcl_S, ymax = ucl_S), alpha = 0.2) + xlab("Body Length (cm)") + ylab("Estimated Survival Rate") ```