Understanding lm() output

Scott Creel
31 Aug 14

Conservation Biology
BIOE 440R & BIOE 521


Ordinary Least Squares (OLS) Regression

Regression is a tool to test for a relationship between continuous variables. We start with a straight-line relationship between two variables:

$$\hat{Y}_i = B_0 + B_1 X_i$$

$$Y_i = \hat{Y}_i + \epsilon_i$$

$$Y_i = B_0 + B_1 X_i + \epsilon_i$$
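A minimal R sketch of these three equations, using simulated data (the true intercept 2, slope 0.5 and noise sd = 3 are arbitrary assumptions, not course data). In lm() output, fitted() returns the $\hat{Y}_i$, resid() returns the $\epsilon_i$, and adding them back together recovers every $Y_i$.

```r
# Simulated data: the true intercept (2) and slope (0.5) are arbitrary choices
set.seed(1)
x <- runif(20, min = 0, max = 100)
y <- 2 + 0.5 * x + rnorm(20, sd = 3)

fit <- lm(y ~ x)

y_hat <- fitted(fit)   # Y-hat_i = B0 + B1 * X_i
e     <- resid(fit)    # epsilon_i = Y_i - Y-hat_i

# Y_i = Y-hat_i + epsilon_i holds for every observation
all.equal(unname(y_hat + e), y)

# The fitted values are exactly B0 + B1 * X_i
b <- coef(fit)
all.equal(unname(y_hat), unname(b[1] + b[2] * x))
```

Both all.equal() calls should return TRUE; with an intercept in the model the residuals also sum to (numerically) zero.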

OLS Regression

The estimated regression coefficients are

  • y-intercept = $B_0$ or $\hat{\beta}_0$,
  • slope = $B_1$ or $\hat{\beta}_1$

for the best fitting line that relates Y to X.

$\beta_0$ and $\beta_1$ are the true intercept and slope for the entire population.

$B_0$ and $B_1$ are the estimated intercept and slope from a sample of that population.
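One way to see the difference between the population values $\beta_0, \beta_1$ and the sample estimates $B_0, B_1$ is to simulate a population with known parameters (here, hypothetically, $\beta_0 = 2$ and $\beta_1 = 0.5$), draw many samples, and fit lm() to each. This is only a sketch under those assumed values: every sample gives slightly different estimates, scattered around the true ones.

```r
set.seed(42)
beta0 <- 2     # hypothetical true intercept
beta1 <- 0.5   # hypothetical true slope

# Draw one sample of size n from the population and return its estimates (B0, B1)
estimate_once <- function(n = 30) {
  x <- runif(n, 0, 100)
  y <- beta0 + beta1 * x + rnorm(n, sd = 5)
  coef(lm(y ~ x))
}

# 1000 samples -> a 1000 x 2 matrix of estimates
est <- t(replicate(1000, estimate_once()))
head(est, 3)     # each row: one sample's B0 and B1
colMeans(est)    # averages land close to beta0 and beta1
```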

OLS Regression

The least squares estimates of the regression coefficients yield

$$\min \sum_{i=1}^{n} \epsilon_i^2 = \min \sum_{i=1}^{n} \left(Y_i - B_0 - B_1 X_i\right)^2$$

That is, with the OLS estimates $\hat{\beta}_0$ and $\hat{\beta}_1$, the sum of the squared residuals, $\sum_{i=1}^{n} \epsilon_i^2$, is as small as possible.
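For a single predictor this minimisation has a standard closed-form solution, $B_1 = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2$ and $B_0 = \bar{Y} - B_1 \bar{X}$. The R sketch below (simulated data, an assumption) checks that these formulas reproduce coef(lm()), and that moving the line away from the OLS estimates only increases the sum of squared residuals.

```r
set.seed(7)
x <- runif(25, 0, 50)
y <- 10 - 0.3 * x + rnorm(25, sd = 2)

# Closed-form OLS estimates for simple regression
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

c(b0, b1)
coef(lm(y ~ x))          # same values

# Sum of squared residuals for any candidate intercept and slope
ssr <- function(intercept, slope) sum((y - (intercept + slope * x))^2)

ssr(b0, b1)              # the minimum
ssr(b0, b1 + 0.05)       # larger
ssr(b0 + 1, b1)          # larger
```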

R Exercise Three extends this basic review of regression models by considering Generalized Linear Models (GLMs). The glm() function accomplishes most of the same basic tasks as lm(), but it is more flexible.

With glm(family = gaussian) you will get exactly the same regression coefficients as with lm(). This is worth doing at least once to compare how lm() and glm() present their output.
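A minimal comparison on simulated data (the data frame dat and its values are hypothetical): the coefficients match, and for the gaussian family the residual deviance reported by glm() equals the residual sum of squares from lm().

```r
set.seed(3)
dat <- data.frame(x = rnorm(40))
dat$y <- 1 + 2 * dat$x + rnorm(40)

fit_lm  <- lm(y ~ x, data = dat)
fit_glm <- glm(y ~ x, family = gaussian, data = dat)

coef(fit_lm)
coef(fit_glm)                            # identical estimates
all.equal(coef(fit_lm), coef(fit_glm))

# Gaussian residual deviance = residual sum of squares
all.equal(deviance(fit_glm), sum(resid(fit_lm)^2))

# Compare the two presentations of output
summary(fit_lm)
summary(fit_glm)
```

The main visible difference is in the footer: summary() of an lm() fit reports R-squared and an F-statistic, while summary() of a glm() fit reports the null and residual deviance and AIC.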

The lm() function assumes that the residuals are normally distributed and that Y is related to X through an identity (linear) 'link'. glm() allows other error distributions and link functions. Three of the most important distributions (and their default link functions) are:

  • family = gaussian(link = "identity"): same as OLS regression. Appropriate for normally distributed dependent variables.
  • family = binomial(link = "logit"): appropriate for binary (or binomial) dependent variables such as survival (lived vs died) or occupancy (present vs absent).
  • family = poisson(link = "log"): appropriate for dependent variables that are counts (non-negative integers) such as group size.
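Each family object in R carries its link function ($linkfun) and the inverse link ($linkinv), which maps the linear predictor $B_0 + B_1 X$ back to the scale of Y. A quick look, with arbitrary input values:

```r
# The link maps the mean of Y onto the scale of the linear predictor
gaussian()$linkfun(5)      # identity: 5
binomial()$linkfun(0.5)    # logit: log(0.5 / (1 - 0.5)) = 0
poisson()$linkfun(1)       # log: log(1) = 0

# The inverse link maps a linear predictor back to the response scale
binomial()$linkinv(0)      # 0.5, a probability
poisson()$linkinv(log(4))  # 4, an expected count

# Inverse logit stays between 0 and 1; inverse log stays positive
binomial()$linkinv(c(-5, 0, 5))
poisson()$linkinv(c(-5, 0, 5))
```

This is why a logit model cannot predict a survival probability outside 0 to 1, and a log-link model cannot predict a negative group size.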

Brief glm() example to compare with lm()

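# Poisson GLM with log link: log(expected packsize) = B0 + B1 * vegcover
# packsize and vegcover are assumed to already exist in the workspace from the exercise
mod4 <- glm(packsize ~ vegcover, family = poisson)
summary(mod4)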

Call:
glm(formula = packsize ~ vegcover, family = poisson)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-2.064  -0.191   0.160   0.430   1.190  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.63533    0.33509    4.88  1.1e-06 ***
vegcover     0.01261    0.00501    2.52    0.012 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 15.0749  on 9  degrees of freedom
Residual deviance:  8.6606  on 8  degrees of freedom
AIC: 54.67

Number of Fisher Scoring iterations: 4
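Because the Poisson model uses a log link, these coefficients are on the log scale. The R lines below only re-derive a few quantities from the printed estimates (copied from the output above): back-transformed coefficients, the Wald z values (Estimate / Std. Error) with their two-sided p-values, and a likelihood-ratio test from the drop in deviance. Small differences from the printed columns can arise because summary() works from unrounded estimates.

```r
# Estimates and standard errors copied from the summary(mod4) output
est <- c(intercept = 1.63533, vegcover = 0.01261)
se  <- c(intercept = 0.33509, vegcover = 0.00501)

# Back-transform from the log scale
exp(est["intercept"])   # expected pack size when vegcover = 0, about 5.1
exp(est["vegcover"])    # multiplicative change in pack size per unit vegcover, about 1.013

# Wald z values and two-sided p-values (the z value and Pr(>|z|) columns)
z <- est / se
p <- 2 * pnorm(-abs(z))
round(z, 2)
signif(p, 2)

# Likelihood-ratio test: drop in deviance on 9 - 8 = 1 df
dev_drop <- 15.0749 - 8.6606
pchisq(dev_drop, df = 1, lower.tail = FALSE)
```

A residual deviance close to its degrees of freedom (here 8.66 on 8) is roughly consistent with the Poisson assumption that the dispersion parameter is 1.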