---
title: "Probability, log-odds, and odds"
author: "WILD 502- Jay Rotella"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

To better understand the connections between the log-odds of an outcome, the odds of an outcome, and the probability of an outcome, it is helpful to work with a range of values on one scale and convert them to the others. It's also helpful to visualize the relationships with some plots.

Recall that if the probability of an event is 0.2, then

1. the odds of the event occurring are $$odds = \frac{0.2}{0.8} = 0.25$$
2. the log-odds of the event occurring are $$ln\Big(\frac{0.2}{0.8}\Big) = -1.3863$$ or $$ln(0.25) = -1.3863$$
3. the probability can be reconstructed from the odds as $$\frac{odds}{1 + odds} = \frac{0.25}{1.25} = 0.2$$
4. the probability can also be reconstructed from the log-odds as $$\frac{exp(ln(odds))}{1 + exp(ln(odds))} = \frac{exp(-1.3863)}{1 + exp(-1.3863)} = \frac{0.25}{1.25} = 0.2$$

In R, you can

1. obtain the odds for a given probability by dividing the probability by 1 minus the probability, e.g., `odds = 0.2/(1-0.2)` = `r 0.2/(1-0.2)`.
2. obtain the log-odds for a given probability by taking the natural logarithm of the odds, e.g., `log(0.25)` = `r log(0.2/(1-0.2))`, or by using the `qlogis` function on the probability value, e.g., `qlogis(0.2)` = `r qlogis(0.2)`.
3. obtain the probability from the log-odds using $\frac{exp(x)}{1 + exp(x)}$, where *x* represents the log-odds value, either by writing the expression out, e.g., `exp(-1.3862944)/(1 + exp(-1.3862944))`, or by using the `plogis` function, e.g., `plogis(-1.3862944)` = `r plogis(-1.3862944)`.
4. obtain the probability from the odds by using $\frac{odds}{1 + odds}$, e.g., `0.25/1.25` = `r 0.25/1.25`.

Probability values range from 0 to 1. It turns out that for $\frac{exp(x)}{1 + exp(x)}$, values of *x* ranging from -5 to +5 create probabilities that range from just above 0 (about 0.007) to very close to 1 (about 0.993).
Values of *x* ranging from -1 to +1 create probabilities that range from about 0.27 to 0.73. The material below will let you explore the relationships for yourself.

```{r}
library(ggplot2)

# a range of log-odds values, and the corresponding odds
log_odds = seq(from = -5, to = 5, by = 0.25)
odds = exp(log_odds)

# use 'plogis' function to calculate exp(x)/(1 + exp(x))
p = plogis(log_odds)
# use odds/(1+odds) to calculate p a different way
p2 = odds/(1 + odds)
# store probability of failure (1-p)
q = 1 - p

# store log-odds, odds, and probabilities in a data frame for use with ggplot
d = data.frame(log_odds, odds, p, p2, q)

# first rows, the rows around log-odds = 0, and last rows
head(d, 4)
d[19:23, ]
tail(d, 4)
```

Below, we plot the relationships so you can see the pattern among the values for log-odds, odds, and the associated probabilities. You might wonder what happens if you get log-odds values that are very large and negative (e.g., -24, -147, or -2421) or very large and positive (e.g., 14, 250, or 1250). You should use the `plogis` function on such values (no commas in your numbers, e.g., `plogis(-2421)`) to find out for yourself.

```{r, fig.height=3, fig.width=5}
ggplot(d, aes(x = log_odds, y = odds)) +
  geom_line() +
  scale_x_continuous(breaks = seq(-5, 5, by = 1)) +
  labs(title = "odds versus log-odds")
```

```{r, fig.height=3, fig.width=5}
ggplot(d, aes(x = odds, y = p)) +
  geom_line() +
  labs(title = "probability versus odds")
```

Finally, this is the plot that I think you'll find most useful. In logistic regression, your regression equation, e.g., $\hat{\beta}_0 + \hat{\beta}_1 \cdot x_1$, yields the log-odds, and you're interested in how that relates to the probability of survival (or, later in the course, the probability of detection or some other probability of interest).

```{r, fig.height=3, fig.width=5}
ggplot(d, aes(x = log_odds, y = p)) +
  geom_line() +
  geom_hline(aes(yintercept = 0.5), colour = "gray", linetype = "dashed") +
  geom_vline(aes(xintercept = 0.0), colour = "gray", linetype = "dashed") +
  scale_x_continuous(breaks = seq(-5, 5, by = 1)) +
  labs(title = "probability versus log-odds")
```
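
To connect the last plot to a regression equation, here is a minimal sketch that converts a linear predictor to odds and to a probability. The coefficient values `b0` and `b1` and the covariate values `x1` are made up purely for illustration; they do not come from any model fit in this course.

```{r}
# hypothetical coefficient estimates (assumed values, for illustration only)
b0 = -1.5   # intercept, on the log-odds scale
b1 = 0.8    # slope: change in log-odds per 1-unit change in x1

# made-up covariate values
x1 = c(-2, 0, 1, 3)

# the regression equation yields the log-odds
log_odds_hat = b0 + b1 * x1

# convert the log-odds to odds and to probabilities
odds_hat = exp(log_odds_hat)
p_hat = plogis(log_odds_hat)

data.frame(x1, log_odds_hat, odds_hat, p_hat)

# round trip: qlogis() should recover the log-odds from the probabilities
all.equal(qlogis(p_hat), log_odds_hat)
```

Because `b1` is positive in this sketch, the estimated probability increases with `x1`, but not by a constant amount per unit of `x1`: equal steps on the log-odds scale give unequal steps on the probability scale, as the S-shaped curve above shows.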