---
title: "Lecture 3 - R code"
author: "WILD 502- Jay Rotella"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Likelihood Values and Sample Size
First, let's take a look at how the likelihood values for various values of *p* vary as we change the sample size while keeping the proportion of successes the same.
```{r, message=FALSE}
library(tidyverse) # for dplyr and ggplot2
library(binom) # for binomial confidence intervals
library(knitr) # for 'kable' function for printing nice table
true.p <- 0.6
# make a dataframe with combinations of p and N
out <- expand.grid(p = seq(0, 1, 0.01), N = seq(10, 100, 30))
# add y and likelihood values for each value of p
out <- out %>%
  mutate(y = true.p * N,
         Lp = choose(N, y) * p^y * (1 - p)^(N - y),
         lnLik = log(Lp))
# plot likelihoods for each value of N
ggplot(out, aes(x = p, y = Lp, color = factor(N))) +
  geom_line() + geom_vline(xintercept = true.p)
```
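As a quick sanity check on the plot above (a sketch that recomputes the likelihoods rather than reusing `out`), the binomial likelihood is maximized at $\hat{p} = y/N$, so every curve should peak at 0.6 no matter the sample size:

```{r}
# each likelihood curve should peak at p-hat = y/N = 0.6
p_grid <- seq(0, 1, 0.01)
for (N in seq(10, 100, 30)) {
  y <- 0.6 * N
  Lp <- choose(N, y) * p_grid^y * (1 - p_grid)^(N - y)
  print(p_grid[which.max(Lp)])  # 0.6 for every N
}
```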
If we calculate confidence intervals based on the likelihood profile, we can see how the uncertainty in $\hat{p}$ changes as sample size changes.
```{r}
CIs <- binom.profile(x = 0.6 * seq(10, 100, 30),
                     n = seq(10, 100, 30),
                     conf.level = 0.95)
kable(CIs)
```
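For comparison (a sketch; the `binom` package's `binom.confint` function offers several interval methods), we can set symmetric Wald ("asymptotic") intervals beside the profile intervals above; the difference between the two is most noticeable at small sample sizes:

```{r}
# Wald ("asymptotic") intervals for the same data; these are
# symmetric about p-hat, unlike the profile intervals above
wald <- binom.confint(x = 0.6 * seq(10, 100, 30),
                      n = seq(10, 100, 30),
                      conf.level = 0.95,
                      methods = "asymptotic")
kable(wald)
```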
As explained in [Chapter 1 of CW](http://www.phidot.org/software/mark/docs/book/pdf/chap1.pdf) (*see page 1-22*), "... for a single parameter, likelihood theory shows that the 2 points 1.92 units down from the maximum of the log likelihood function provide a 95% confidence interval when there is no extra-binomial variation ...". The graphic below shows this for the scenario where N = 40 and y = 24.
```{r}
out40 <- filter(out, N == 40, between(p, 0.3, 0.8))
max_ln_lik <- max(out40$lnLik)
ggplot(out40, aes(x = p, y = lnLik, color = factor(N))) +
  geom_line() + geom_vline(xintercept = true.p) +
  geom_hline(yintercept = max_ln_lik - 1.92, linetype = "dashed") +
  geom_vline(xintercept = c(CIs$lower[2], CIs$upper[2]), linetype = "dashed")
```
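The 1.92 value isn't arbitrary: it comes from the likelihood-ratio test, where twice the drop in log likelihood is compared to a $\chi^2$ distribution with 1 degree of freedom, so the cutoff is half the 0.95 quantile of that distribution:

```{r}
# half the 0.95 quantile of a chi-square with 1 df gives the cutoff
qchisq(0.95, df = 1) / 2  # ~1.92
```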
\pagebreak
## Converting between Probability and Log-odds
This simple exercise is intended to give you a better understanding of the connections between the log-odds of an outcome and the probability of an outcome. Recall that if the probability of an event is 0.25, then the log-odds is calculated as $\ln(\frac{0.25}{0.75})$. In R, you can use the `qlogis` function to obtain the log-odds for a given probability, e.g., `qlogis(0.25)` = `r qlogis(0.25)`.
If you know the log-odds, then you can calculate the probability by using $\frac{\exp(x)}{1 + \exp(x)}$, where *x* represents the log-odds value. In R, you can use the `plogis` function to obtain the probability for a given log-odds value, e.g., `plogis(-1.098612)` = `r plogis(-1.0986123)`.
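A quick check that `plogis` matches the formula above and that `qlogis` and `plogis` are inverses of each other:

```{r}
x <- qlogis(0.25)        # log-odds for p = 0.25
exp(x) / (1 + exp(x))    # same value as plogis(x): 0.25
plogis(x)                # back to the probability: 0.25
```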
Probability values range from 0 to 1. It turns out that for $\frac{\exp(x)}{1 + \exp(x)}$, values of *x* ranging from -5 to +5 create probabilities that range from just above 0 to very close to 1. Values of *x* ranging from -1 to +1 create probabilities that range from about 0.25 to 0.75. The material below will let you explore the relationships for yourself.
```{r}
log_odds <- seq(from = -5, to = 5, by = 0.25)
# use 'plogis' function to calculate exp(x)/(1 + exp(x))
y <- plogis(log_odds)
# store log_odds and y in data frame for use with ggplot
d <- data.frame(log_odds, y)
head(d, 4)
tail(d, 4)
```
Below, we plot the relationship, so you can see the pattern among the values for log-odds and associated probabilities. You might wonder what happens if you get log-odds values that are very small (e.g., -24, -147, or -2421) or very big (e.g., 14, 250, or 1250). You should use the `plogis` function on such values (no commas in your numbers, e.g., `plogis(-2421)`) to find out for yourself.
```{r}
ggplot(d, aes(x = log_odds, y = y)) +
  geom_line() +
  geom_hline(aes(yintercept = 0.5),
             colour = "gray",
             linetype = "dashed") +
  geom_vline(aes(xintercept = 0.0),
             colour = "gray",
             linetype = "dashed") +
  scale_x_continuous(breaks = seq(-5, 5, by = 1))
```