Estimating population density and identifying factors that affect it: distance sampling models using the unmarked package in R

Data from ungulates in Kafue National Park, Zambia

Introduction

Measuring population density, and understanding the causes of its spatial and temporal variation, is one of the most fundamental tasks in ecology and conservation biology. If you were allowed to do only one thing in an ecological or conservation-related study, this would probably be it.

The unmarked package in R allows one to estimate population density using distance sampling models, which account for imperfect detection: each animal present is detected with probability p and missed with probability 1-p. At the most basic level,

\[ \widehat{N} = \frac{C}{ \widehat{p} } \]

The estimate of population size \(\widehat{N}\) is equal to the number counted, divided by the estimated detection probability. There are close logical parallels with models used to estimate survival rates of marked animals (the name unmarked is a nod to the well-known program MARK).
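A toy illustration of this estimator in R, with made-up numbers:

C <- 40        # animals counted on a survey
p.hat <- 0.4   # estimated probability that an animal present was detected
N.hat <- C / p.hat
N.hat
## [1] 100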

The data to determine C and estimate \(\widehat{p}\) come from line transect sampling, in which one moves along a transect and records the perpendicular distance from the transect at which each individual (or group; see below) was detected. In practice, one actually records the radial (line-of-sight) distance and the bearing for each sighting, and then uses trigonometry to obtain the perpendicular distance.

[Figure: converting a radial sighting distance and sighting angle to perpendicular distance]

  • perpendicular distance = radial distance to the animal * sin(angle of the sighting from the transect line)

Distances and bearings are recorded at first detection to avoid bias from animals moving (usually running away) before the observer arrives at the point where they are perpendicular to the transect.
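A quick sketch of this conversion in R, using a hypothetical sighting (note that sin() in R expects radians, so degrees must be converted):

sight.dist <- 180     # hypothetical radial distance to the animal, in metres
sight.angle <- 35     # hypothetical sighting angle from the transect line, in degrees
perp.dist <- sight.dist * sin(sight.angle * pi / 180)
perp.dist
## [1] 103.2438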

Distance sampling models correct density estimates for undetected animals by using the distribution of distances, with the assumptions that:

  1. Animals on the line are detected with p = 1.
  2. Animals are distributed randomly with respect to the transect location. This second assumption is often violated by locating transects for convenience, in a manner that is not representative of the study area. The most common version of this error is to run transects along roads, which are very rarely sited in a random manner.

[Figure: frequency distribution of perpendicular detection distances, with a fitted detection function illustrating the logic of distance sampling]

If assumptions 1 and 2 are valid, then the mean detection probability p is estimated as the area under the detection function fitted to this frequency distribution (the red line in the figure), relative to the area under a horizontal line at p = 1.
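For intuition, here is a sketch of that calculation for a half-normal detection function with an assumed scale (sigma) and truncation distance (w); both values are hypothetical:

sigma <- 110   # assumed half-normal scale parameter, in metres
w <- 300       # assumed truncation distance, in metres
g <- function(x) exp(-x^2 / (2 * sigma^2))    # half-normal detection function; g(0) = 1
p.bar <- integrate(g, 0, w)$value / w         # area under g, relative to the w x 1 rectangle
p.bar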

The unmarked package in R provides a function (distsamp) to fit slightly more sophisticated distance sampling models, which allow for estimation of the effects of covariates on both the detection probability and population density. Controlling for effects of covariates on detection is usually not of ecological interest, but is necessary for accurate estimation of density. Understanding the covariates of density is often of primary interest for conservation. Doing the estimation in R allows for comparison of models using AIC scores or some other criterion, and for graphical display of results, as shown below.

Analysis: setting up the dataframe for unmarked

Clean up the workspace, set the working directory (you’ll have to adjust the setwd() command to match the directory path on your machine; recall that paths in R use /, not \ as in Windows), load the unmarked package, read in the data, and inspect it.

rm(list = ls())



setwd("C:/Users/screel/Desktop/KENYA/ungulate_transects_in_R") 

# you will need to set the working directory appropriately for your machine

# recall that there is a menu option under session in RStudio



library(unmarked)
## Loading required package: reshape

## Loading required package: lattice

## Loading required package: Rcpp
dists <- read.csv("knp.allsp.oct2012.csv", header=TRUE)

dists$trans <- as.factor(dists$trans)             #convert transects into a categorical variable



dim(dists)
## [1] 190  13
head(dists)
##        date   trans type hab    ht col burn lagoon seas.water  species

## 1 21-Oct-12  KT16.5  T    OW 1-Oct MIX    P      A          A      BUF

## 2 26-Oct-12   KT1.3    T  OB 1-Oct  BR    N      P          A BUSHBUCK

## 3 23-Oct-12 KT12B.1    S  OG 1-Oct GR     C      P          P BUSHBUCK

## 4 23-Oct-12 KT12B.1    S  OG 1-Oct GR     C      P          P BUSHBUCK

## 5 26-Oct-12   KT5.5    O  OW 1-Oct  GR    P      A          A BUSHBUCK

## 6 23-Oct-12  KT10.1    O  OW 1-Oct  BR    C      A          A   DUIKER

##   single.mix herd.size dist

## 1         S          3  154

## 2          S         2   24

## 3          S         1   48

## 4          S         2   32

## 5          S         1    8

## 6                   NA   48
tail(dists)
##          date trans type hab    ht col burn lagoon seas.water species

## 185 26-Oct-12 KT3.3    T  OW 1-Oct  BR    P      A          A        

## 186 26-Oct-12 KT5.4    O  OW 1-Oct  GR    N      A          A        

## 187 25-Oct-12 KT6.3    O  OW 1-Oct  BR    N      A          A        

## 188 24-Oct-12 KT7.1    S  OW   <10  BR    P      A          A        

## 189 24-Oct-12 KT8.2    O  OW 1-Oct GR     C      A          A        

## 190 24-Oct-12 KT8.3    O  OG    >1  BR    N      A          A        

##     single.mix herd.size dist

## 185                   NA    0

## 186                   NA    0

## 187                   NA    0

## 188                   NA    0

## 189                   NA    0

## 190                   NA    0

Subset data by species and inspect it. Here, we’ll select only the data for puku, a small obligate grazer that lives in medium-sized herds, mainly in floodplains close to permanent rivers.

[Photo: puku]

dists.sub <- subset(dists, species == 'PU')

dim(dists.sub)
## [1] 61 13
head(dists.sub)
##         date   trans type hab    ht col burn lagoon seas.water species

## 78 26-Oct-12   KT1.1   T   OW 1-Oct  BR    N      P          A      PU

## 79 26-Oct-12   KT1.1   T   OW 1-Oct  BR    N      P          A      PU

## 80 26-Oct-12   KT1.3    T  OB 1-Oct  BR    N      P          A      PU

## 81 26-Oct-12   KT1.3    T  OB 1-Oct  BR    N      P          A      PU

## 82 22-Oct-12  KT12.1    O  OW 1-Oct  BR    C      A          A      PU

## 83 23-Oct-12 KT12B.1    S  OG 1-Oct GR     C      P          P      PU

##    single.mix herd.size dist

## 78         S          2    0

## 79          S         1   10

## 80          S         6   65

## 81          S         1   69

## 82         S          6    0

## 83          S         1    0
tail(dists.sub)
##          date  trans type hab    ht col burn lagoon seas.water species

## 133 24-Oct-12  KT9.2    T  OW 1-Oct  BR    P      P          P      PU

## 134 24-Oct-12  KT9.3    T  OW 1-Oct  BR    N      P          P      PU

## 135 19-Oct-12 KT17.5    T  FG 1-Oct  GR    C      P          P      PU

## 136 19-Oct-12 KT17.5    T  FG 1-Oct  GR    C      P          P      PU

## 137 19-Oct-12 KT17.5    T  FG 1-Oct  GR    C      P          P      PU

## 138 19-Oct-12 KT17.5    T  FG 1-Oct  GR    C      P          P      PU

##     single.mix herd.size dist

## 133          M         2   83

## 134          S         2   29

## 135          S         1   59

## 136          S         9   66

## 137          M        50   71

## 138          S         2  101

Order the data and inspect it to choose a truncation distance. Truncation manages the trade-off between maximizing sample size (which yields more precise estimates of density) and including sparse, distant detections with low detection probability (which can degrade the fit of the detection function).

dists.order <- dists.sub[order(dists.sub$dist), ]        #this orders the data from small to large perp distances

dists.order
##          date   trans type hab    ht col burn lagoon seas.water species

## 78  26-Oct-12   KT1.1   T   OW 1-Oct  BR    N      P          A      PU

## 82  22-Oct-12  KT12.1    O  OW 1-Oct  BR    C      A          A      PU

## 83  23-Oct-12 KT12B.1    S  OG 1-Oct GR     C      P          P      PU

## 84  23-Oct-12 KT12B.1    S  OG 1-Oct GR     C      P          P      PU

## 97  22-Oct-12  KT13.1    O  OG 1-Oct GR     C      A          A      PU

## 110 19-Oct-12  KT17.4    T  FG 1-Oct  GR    C      P          P      PU

## 79  26-Oct-12   KT1.1   T   OW 1-Oct  BR    N      P          A      PU

## 109 19-Oct-12  KT17.3    T  FG 1-Oct  GR    P      P          A      PU

## 120 26-Oct-12   KT4.1    S  OW 1-Oct  BR    C      A          P      PU

## 123 26-Oct-12   KT5.2    O  CW   <10  BR    N      A          A      PU

## 124 26-Oct-12   KT5.2    O  CW   <10  BR    N      A          A      PU

## 94  23-Oct-12 KT12B.3    S  OW 1-Oct MIX    P      A          A      PU

## 85  23-Oct-12 KT12B.1    S  OG 1-Oct GR     C      P          P      PU

## 112 19-Oct-12  KT18.1    S  FG 1-Oct GR     N      P          P      PU

## 134 24-Oct-12   KT9.3    T  OW 1-Oct  BR    N      P          P      PU

## 121 26-Oct-12   KT4.5    O  OW 1-Oct  BR    C      A          A      PU

## 122 26-Oct-12   KT5.1    O  OW 1-Oct  BR    N      A          A      PU

## 128 25-Oct-12   KT6.2    O  OW 1-Oct  BR    C      A          A      PU

## 132 24-Oct-12   KT9.1   T   OW 1-Oct  BR    C      P          A      PU

## 135 19-Oct-12  KT17.5    T  FG 1-Oct  GR    C      P          P      PU

## 125 25-Oct-12   KT6.1    O  OW 1-Oct MIX    P      A          P      PU

## 80  26-Oct-12   KT1.3    T  OB 1-Oct  BR    N      P          A      PU

## 92  23-Oct-12 KT12B.2    S  OG 1-Oct  GR    C      A          A      PU

## 136 19-Oct-12  KT17.5    T  FG 1-Oct  GR    C      P          P      PU

## 81  26-Oct-12   KT1.3    T  OB 1-Oct  BR    N      P          A      PU

## 115 19-Oct-12  KT18.1    S  FG 1-Oct GR     N      P          P      PU

## 137 19-Oct-12  KT17.5    T  FG 1-Oct  GR    C      P          P      PU

## 89  23-Oct-12 KT12B.1    S  OG 1-Oct GR     C      P          P      PU

## 131 24-Oct-12   KT9.1   T   OW 1-Oct  BR    C      P          A      PU

## 113 19-Oct-12  KT18.1    S  FG 1-Oct GR     N      P          P      PU

## 133 24-Oct-12   KT9.2    T  OW 1-Oct  BR    P      P          P      PU

## 106 19-Oct-12  KT17.1   T   FG 1-Oct GR     P      P          P      PU

## 87  23-Oct-12 KT12B.1    S  OG 1-Oct GR     C      P          P      PU

## 116 19-Oct-12  KT18.1    S  FG 1-Oct GR     N      P          P      PU

## 114 19-Oct-12  KT18.1    S  FG 1-Oct GR     N      P          P      PU

## 90  23-Oct-12 KT12B.1    S  OG 1-Oct GR     C      P          P      PU

## 138 19-Oct-12  KT17.5    T  FG 1-Oct  GR    C      P          P      PU

## 118 19-Oct-12  KT18.2    S  FG 1-Oct  GR    N      P          P      PU

## 99  22-Oct-12  KT14.1    O  OB   <10 GR     N      A          A      PU

## 117 19-Oct-12  KT18.1    S  FG 1-Oct GR     N      P          P      PU

## 98  22-Oct-12  KT13.1    O  OG 1-Oct GR     C      A          A      PU

## 100 22-Oct-12  KT14.2    O  OG 1-Oct MIX    P      P          P      PU

## 95  23-Oct-12 KT12B.3    S  OW 1-Oct MIX    P      A          A      PU

## 129 25-Oct-12   KT6.2    O  OW 1-Oct  BR    C      A          A      PU

## 103 21-Oct-12 KT16.10    S  FG   <10  GR    C      P          P      PU

## 104 21-Oct-12 KT16.10    S  FG   <10  GR    C      P          P      PU

## 119 19-Oct-12  KT18.2    S  FG 1-Oct  GR    N      P          P      PU

## 130 25-Oct-12   KT6.2    O  OW 1-Oct  BR    C      A          A      PU

## 88  23-Oct-12 KT12B.1    S  OG 1-Oct GR     C      P          P      PU

## 126 25-Oct-12   KT6.1    O  OW 1-Oct MIX    P      A          P      PU

## 108 19-Oct-12  KT17.1   T   FG 1-Oct GR     P      P          P      PU

## 127 25-Oct-12   KT6.1    O  OW 1-Oct MIX    P      A          P      PU

## 101 21-Oct-12  KT15.1    O  OG 1-Oct GR     P      A          A      PU

## 86  23-Oct-12 KT12B.1    S  OG 1-Oct GR     C      P          P      PU

## 102 21-Oct-12 KT16.10    S  FG   <10  GR    C      P          P      PU

## 111 19-Oct-12  KT17.4    T  FG 1-Oct  GR    C      P          P      PU

## 107 19-Oct-12  KT17.1   T   FG 1-Oct GR     P      P          P      PU

## 91  23-Oct-12 KT12B.1    S  OG 1-Oct GR     C      P          P      PU

## 93  23-Oct-12 KT12B.2    S  OG 1-Oct  GR    C      A          A      PU

## 96  23-Oct-12 KT12B.3    S  OW 1-Oct MIX    P      A          A      PU

## 105 21-Oct-12 KT16.10    S  FG   <10  GR    C      P          P      PU

##     single.mix herd.size dist

## 78          S          2    0

## 82          S          6    0

## 83           S         1    0

## 84           S         4    0

## 97           S         1    0

## 110          S         1    0

## 79           S         1   10

## 109          S         1   14

## 120          M         1   15

## 123         S          1   19

## 124          S         1   21

## 94           S         3   24

## 85           S        16   27

## 112          S         5   27

## 134          S         2   29

## 121          M         1   30

## 122         S          1   44

## 128          M         1   45

## 132          S         1   47

## 135          S         1   59

## 125          S         1   63

## 80           S         6   65

## 92           M         7   66

## 136          S         9   66

## 81           S         1   69

## 115          S        14   69

## 137          M        50   71

## 89           S         3   76

## 131          S         1   80

## 113          M         4   82

## 133          M         2   83

## 106          S        19   84

## 87           S        11   85

## 116          S         2   86

## 114          M         3   90

## 90           S         1   92

## 138          S         2  101

## 118          S         5  106

## 99          S         14  117

## 117          S        11  119

## 98           S        13  121

## 100          S        27  125

## 95           S         1  127

## 129          S         2  163

## 103          S         1  175

## 104          S         7  182

## 119          M         1  198

## 130          S         3  204

## 88           M         7  227

## 126          M         1  233

## 108          S        16  244

## 127          M         6  249

## 101         S          7  263

## 86           M        17  275

## 102          M         7  301

## 111          M         2  369

## 107          S        21  412

## 91          S          1  476

## 93           S         9  498

## 96           S         8  555

## 105          S         6  614
dim(dists.order)
## [1] 61 13

300 m looks reasonable as a cutoff. Truncate the data:

dists.trunc <-  dists.order[which(dists.order$dist < 300), ]   

dim(dists.trunc)                                                
## [1] 54 13
#dists.trunc  # commented out data inspection to shorten the output
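As a cross-check on a cutoff chosen by eye, a common rule of thumb is to discard roughly the largest 5-10% of perpendicular distances; the corresponding quantiles are easy to inspect:

quantile(dists.order$dist, probs = c(0.90, 0.95))   # candidate truncation distances

Here the 300 m cutoff drops 7 of 61 detections (about 11%), in the same spirit.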

Aggregate detections into distance intervals (here, 50 m bins; other widths, such as 100 m, could also be used)

yDat <- formatDistData(dists.trunc, distCol="dist", transectNameCol="trans", dist.breaks=c(0, 50, 100, 150, 200, 250, 300))  #create distance categories (m). Could change these.

yDat
##         [0,50] (50,100] (100,150] (150,200] (200,250] (250,300]

## KT1.1        2        0         0         0         0         0

## KT1.2        0        0         0         0         0         0

## KT1.3        0        2         0         0         0         0

## KT1.4        0        0         0         0         0         0

## KT1.5        0        0         0         0         0         0

## KT10.1       0        0         0         0         0         0

## KT10.2       0        0         0         0         0         0

## KT10.3       0        0         0         0         0         0

## KT11.1       0        0         0         0         0         0

## KT11.2       0        0         0         0         0         0

## KT11.3       0        0         0         0         0         0

## KT12.1       1        0         0         0         0         0

## KT12.2       0        0         0         0         0         0

## KT12B.1      3        3         0         0         1         1

## KT12B.2      0        1         0         0         0         0

## KT12B.3      1        0         1         0         0         0

## KT12B.4      0        0         0         0         0         0

## KT13.1       1        0         1         0         0         0

## KT13.2       0        0         0         0         0         0

## KT13.3       0        0         0         0         0         0

## KT14.1       0        0         1         0         0         0

## KT14.2       0        0         1         0         0         0

## KT14.3       0        0         0         0         0         0

## KT14.4       0        0         0         0         0         0

## KT15.1       0        0         0         0         0         1

## KT15.2       0        0         0         0         0         0

## KT15.3       0        0         0         0         0         0

## KT15.4       0        0         0         0         0         0

## KT15.5       0        0         0         0         0         0

## KT16.1       0        0         0         0         0         0

## KT16.10      0        0         0         2         0         0

## KT16.2       0        0         0         0         0         0

## KT16.3       0        0         0         0         0         0

## KT16.4       0        0         0         0         0         0

## KT16.5       0        0         0         0         0         0

## KT16.6       0        0         0         0         0         0

## KT16.7       0        0         0         0         0         0

## KT16.8       0        0         0         0         0         0

## KT16.9       0        0         0         0         0         0

## KT17.1       0        1         0         0         1         0

## KT17.2       0        0         0         0         0         0

## KT17.3       1        0         0         0         0         0

## KT17.4       1        0         0         0         0         0

## KT17.5       0        3         1         0         0         0

## KT18.1       1        4         1         0         0         0

## KT18.2       0        0         1         1         0         0

## KT18.3       0        0         0         0         0         0

## KT3.1        0        0         0         0         0         0

## KT3.2        0        0         0         0         0         0

## KT3.3        0        0         0         0         0         0

## KT3.4        0        0         0         0         0         0

## KT4.1        1        0         0         0         0         0

## KT4.2        0        0         0         0         0         0

## KT4.3        0        0         0         0         0         0

## KT4.4        0        0         0         0         0         0

## KT4.5        1        0         0         0         0         0

## KT5.1        1        0         0         0         0         0

## KT5.2        2        0         0         0         0         0

## KT5.3        0        0         0         0         0         0

## KT5.4        0        0         0         0         0         0

## KT5.5        0        0         0         0         0         0

## KT6.1        0        1         0         0         2         0

## KT6.2        1        0         0         1         1         0

## KT6.3        0        0         0         0         0         0

## KT7.1        0        0         0         0         0         0

## KT7.2        0        0         0         0         0         0

## KT7.3        0        0         0         0         0         0

## KT8.1        0        0         0         0         0         0

## KT8.2        0        0         0         0         0         0

## KT8.3        0        0         0         0         0         0

## KT8.4        0        0         0         0         0         0

## KT8.5        0        0         0         0         0         0

## KT8.6        0        0         0         0         0         0

## KT9.1        1        1         0         0         0         0

## KT9.2        0        1         0         0         0         0

## KT9.3        1        0         0         0         0         0
nrow(yDat)          #Should be  76 rows for this subset of KNP data, equal to ~120 km of transects sampled on one occasion.
## [1] 76

Read in and inspect the covariates that will be used to model effects on p and density.

covs <- read.csv("C:/Users/screel/Desktop/KENYA/ungulate_transects_in_R/knp_covs_oct2012.csv", header=TRUE)

head(covs)
##        date  trans type length hab   ht col burn lagoon seas.water

## 1 26-Oct-12  KT1.1    T   2000  OW 10-1  BR    N      P          A

## 2 26-Oct-12  KT1.2    T    300  OW 10-1  BR    N      P          A

## 3 26-Oct-12  KT1.3    T   2000  OB 10-1  BR    N      P          A

## 4 26-Oct-12  KT1.4    T   2000  CW 10-1  BR    N      A          A

## 5 26-Oct-12  KT1.5    T   1700  CW 10-1  BR    N      P          A

## 6 23-Oct-12 KT10.1    O   2000  OW 10-1  BR    C      A          A
tail(covs)
##         date trans type length hab   ht col burn lagoon seas.water

## 71 25-Oct-12 KT6.1    O   2000  OW 10-1 MIX    P      A          P

## 72 24-Oct-12 KT9.2    T   1200  OW 10-1  BR    P      P          P

## 73 24-Oct-12 KT9.3    T    900  OW 10-1  BR    N      P          P

## 74 26-Oct-12 KT3.2    T   2000  OW 10-1  BR    P      A          A

## 75 26-Oct-12 KT3.3    T   2000  OW 10-1  BR    P      A          A

## 76 25-Oct-12 KT6.3    O   1400  OW 10-1  BR    N      A          A
nrow(covs)
## [1] 76

Create the unmarked dataframe. unmarked requires input as a dataframe with a specific format, and provides a convenience function to create it. You can use context-sensitive help (F1 in RStudio) to better understand the arguments of this function and their default values.

umf <- unmarkedFrameDS(y=as.matrix(yDat), siteCovs=covs, survey="line", dist.breaks=c(0, 50, 100, 150, 200, 250, 300), tlength=covs$length,  unitsIn="m")



head(umf)
## Data frame representation of unmarkedFrame object.

##        y.1 y.2 y.3 y.4 y.5 y.6      date  trans type length hab   ht col

## KT1.1    2   0   0   0   0   0 26-Oct-12  KT1.1    T   2000  OW 10-1  BR

## KT1.2    0   0   0   0   0   0 26-Oct-12  KT1.2    T    300  OW 10-1  BR

## KT1.3    0   2   0   0   0   0 26-Oct-12  KT1.3    T   2000  OB 10-1  BR

## KT1.4    0   0   0   0   0   0 26-Oct-12  KT1.4    T   2000  CW 10-1  BR

## KT1.5    0   0   0   0   0   0 26-Oct-12  KT1.5    T   1700  CW 10-1  BR

## KT10.1   0   0   0   0   0   0 23-Oct-12 KT10.1    O   2000  OW 10-1  BR

## KT10.2   0   0   0   0   0   0 23-Oct-12 KT10.2    O   2000  CW 10-1  BR

## KT10.3   0   0   0   0   0   0 23-Oct-12 KT10.3    O   1800  CW 10-1  BR

## KT11.1   0   0   0   0   0   0 23-Oct-12 KT11.1    O   2000  CW 10-1  GR

## KT11.2   0   0   0   0   0   0 23-Oct-12 KT11.2    O   2000  CW 10-1  GR

##        burn lagoon seas.water

## KT1.1     N      P          A

## KT1.2     N      P          A

## KT1.3     N      P          A

## KT1.4     N      A          A

## KT1.5     N      P          A

## KT10.1    C      A          A

## KT10.2    C      A          A

## KT10.3    C      A          A

## KT11.1    C      A          A

## KT11.2    P      A          A

View the frequency distribution of perpendicular distances as a final check.

hist(umf, xlab="distance (m)", main="", cex.lab=0.8, cex.axis=0.8)

The fact that the frequency of detections falls off sharply past 100 m strongly suggests that we would underestimate density if we did not use distance sampling to correct for a probability of detection less than one. This sort of pattern is nearly universal; inferences based on ‘naive’ density estimates that fail to address detection should be treated with caution. Such estimates are surprisingly common… for example, elk counts in Montana and YNP do not account for detection.
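For contrast, a ‘naive’ strip-count density that ignores imperfect detection divides the number of herds counted by the area searched (2 x the 300 m half-width x the ~120 km of transect noted above); a rough sketch:

naive.D <- nrow(dists.trunc) / (2 * 0.3 * 120)   # herds per km^2, ignoring detection
naive.D
## [1] 0.75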

Analysis: fitting competing distance models

distsamp(~1 ~1, ...) fits a model with just one mean value (an “intercept only” model) for each parameter.
Recall that you can use context-sensitive help (F1) to understand the arguments of the distsamp function (or any function). The first formula is for p, so the first ~1 means ‘fit an intercept-only model for p: just estimate the overall mean detection probability’. The second formula is for density, so the second ~1 means ‘fit an intercept-only model for density: just estimate one overall density’. (If you’re reading the code carefully, you’ll have noticed that unmarked estimates population density (N/area, in the units given by the unitsOut argument to distsamp) rather than population size, N.)

unmarked can fit a variety of detection functions to the histogram of distances; here we consider the half-normal, hazard-rate, and uniform key functions, running the simplest models first to check that the data are adequate for distance sampling methods.

m.half <- distsamp(~1 ~1, umf, keyfun="halfnorm", output="density", unitsOut="kmsq")

m.haz <- distsamp(~1 ~1, umf, keyfun="hazard", output="density", unitsOut="kmsq")

m.uni <- distsamp(~1 ~1, umf, keyfun="uniform", output="density", unitsOut="kmsq")



m.half
## 

## Call:

## distsamp(formula = ~1 ~ 1, data = umf, keyfun = "halfnorm", output = "density", 

##     unitsOut = "kmsq")

## 

## Density:

##  Estimate    SE    z P(>|z|)

##     0.396 0.175 2.26  0.0235

## 

## Detection:

##  Estimate    SE    z P(>|z|)

##      4.81 0.122 39.4       0

## 

## AIC: 375.2169
m.haz
## 

## Call:

## distsamp(formula = ~1 ~ 1, data = umf, keyfun = "hazard", output = "density", 

##     unitsOut = "kmsq")

## 

## Density:

##  Estimate    SE    z P(>|z|)

##     0.507 0.223 2.28  0.0228

## 

## Detection:

##  Estimate    SE    z  P(>|z|)

##       4.5 0.299 15.1 2.97e-51

## 

## Hazard-rate(scale):

##  Estimate    SE    z P(>|z|)

##     0.629 0.327 1.92  0.0547

## 

## AIC: 374.9477
m.uni
## 

## Call:

## distsamp(formula = ~1 ~ 1, data = umf, keyfun = "uniform", output = "density", 

##     unitsOut = "kmsq")

## 

## Density:

##  Estimate    SE     z P(>|z|)

##    -0.284 0.136 -2.09  0.0367

## 

## AIC: 399.4951

All three models can be fit successfully. AIC scores indicate that the uniform key function is substantially worse than the other two, which makes sense given the shape of the distribution (plotted above).

Proceed to fitting an a priori model set. Note that this set does not consider covariates for p; the density estimates still account for the overall mean value of p while testing for variables that affect density.

#half-norm:

m.half.1.type <- distsamp(~1 ~type, umf, keyfun="halfnorm", output="density", unitsOut="kmsq")

m.half.1.hab <- distsamp(~1 ~hab, umf, keyfun="halfnorm", output="density", unitsOut="kmsq")

m.half.1.ht <- distsamp(~1 ~ht, umf, keyfun="halfnorm", output="density", unitsOut="kmsq")

m.half.1.col <- distsamp(~1 ~col, umf, keyfun="halfnorm", output="density", unitsOut="kmsq")

m.half.1.burn <- distsamp(~1 ~burn, umf, keyfun="halfnorm", output="density", unitsOut="kmsq")

m.half.1.lagoon <- distsamp(~1 ~lagoon, umf, keyfun="halfnorm", output="density", unitsOut="kmsq")

m.half.1.seaswater <- distsamp(~1 ~seas.water, umf, keyfun="halfnorm", output="density", unitsOut="kmsq")







#hazard:

m.haz.1.type <- distsamp(~1 ~type, umf, keyfun="hazard", output="density", unitsOut="kmsq")

m.haz.1.hab <- distsamp(~1 ~hab, umf, keyfun="hazard", output="density", unitsOut="kmsq")

m.haz.1.ht <- distsamp(~1 ~ht, umf, keyfun="hazard", output="density", unitsOut="kmsq")

m.haz.1.col <- distsamp(~1 ~col, umf, keyfun="hazard", output="density", unitsOut="kmsq")

m.haz.1.burn <- distsamp(~1 ~burn, umf, keyfun="hazard", output="density", unitsOut="kmsq")

m.haz.1.lagoon <- distsamp(~1 ~lagoon, umf, keyfun="hazard", output="density", unitsOut="kmsq")

m.haz.1.seaswater <- distsamp(~1 ~seas.water, umf, keyfun="hazard", output="density", unitsOut="kmsq")

Fit list and model selection. This code collects the output from all of the models and compares them via AIC scores.

fmList <- fitList(m.half=m.half, m.uni=m.uni, 

                  m.half.1.type=m.half.1.type, m.half.1.hab=m.half.1.hab, m.half.1.ht=m.half.1.ht,

                  m.half.1.col=m.half.1.col, m.half.1.burn=m.half.1.burn, m.half.1.lagoon=m.half.1.lagoon,

                  m.half.1.seaswater=m.half.1.seaswater, m.haz.1.type=m.haz.1.type, m.haz.1.hab=m.haz.1.hab,

                  m.haz.1.ht=m.haz.1.ht, m.haz.1.burn=m.haz.1.burn,

                  m.haz.1.lagoon=m.haz.1.lagoon, m.haz.1.seaswater=m.haz.1.seaswater, m.haz.1.col=m.haz.1.col, m.haz=m.haz) 

modSel(fmList)
##                    nPars    AIC delta   AICwt cumltvWt

## m.haz.1.hab            7 368.50  0.00 2.7e-01     0.27

## m.half.1.hab           6 368.76  0.27 2.4e-01     0.51

## m.haz.1.ht             5 369.64  1.14 1.5e-01     0.66

## m.half.1.ht            4 369.91  1.41 1.3e-01     0.80

## m.haz.1.burn           5 372.56  4.07 3.6e-02     0.83

## m.haz.1.type           5 372.76  4.27 3.2e-02     0.87

## m.half.1.burn          4 372.83  4.34 3.1e-02     0.90

## m.half.1.type          4 373.03  4.53 2.8e-02     0.93

## m.haz.1.col            5 374.63  6.13 1.3e-02     0.94

## m.half.1.col           4 374.90  6.40 1.1e-02     0.95

## m.haz                  3 374.95  6.45 1.1e-02     0.96

## m.haz.1.seaswater      4 375.21  6.71 9.5e-03     0.97

## m.half                 2 375.22  6.72 9.5e-03     0.98

## m.half.1.seaswater     3 375.48  6.98 8.3e-03     0.99

## m.haz.1.lagoon         4 375.92  7.43 6.6e-03     0.99

## m.half.1.lagoon        3 376.19  7.70 5.8e-03     1.00

## m.uni                  1 399.50 31.00 5.1e-08     1.00

Goodness-of-fit test: fit is adequate if these tests give p > 0.05. First, refit your best-supported model (lowest AIC), then check GOF for that model.

(fm <- distsamp(~1 ~hab, keyfun="halfnorm", umf))   # starts=c(...) can be added to supply starting values if the fit fails
## 

## Call:

## distsamp(formula = ~1 ~ hab, data = umf, keyfun = "halfnorm")

## 

## Density:

##             Estimate    SE       z  P(>|z|)

## (Intercept)   -4.839 0.394 -12.293 9.85e-35

## habFG         -0.255 0.690  -0.370 7.11e-01

## habOB          0.614 0.802   0.766 4.44e-01

## habOG          1.530 0.476   3.217 1.29e-03

## habOW          0.747 0.420   1.779 7.52e-02

## 

## Detection:

##  Estimate    SE    z P(>|z|)

##      4.81 0.122 39.4       0

## 

## AIC: 368.7646
# Function returning three fit-statistics.

fitstats <- function(fm) {

  observed <- getY(fm@data)

  expected <- fitted(fm)

  resids <- residuals(fm)

  sse <- sum(resids^2)

  chisq <- sum((observed - expected)^2 / expected)

  freeTuke <- sum((sqrt(observed) - sqrt(expected))^2)

  out <- c(SSE=sse, Chisq=chisq, freemanTukey=freeTuke)

  return(out)

}



(pb <- parboot(fm, fitstats, nsim=25, report=1))
## t0 = 88.13049 892.762 73.80425 

## 61.7,492.9,71.8

## 40.6,406.3,51.1

## 64.1,460,62

## 43.8,504.8,56.8

## 48.6,476.4,58.3

## 53.2,383.3,60.3

## 45.4,300,55.7

## 55.6,380,63.7

## 61.1,474.7,70.6

## 48.8,547.1,59.9

## 54.2,409.1,63.5

## 74.2,363.6,77

## 64.6,400.8,63.9

## 57.3,334.4,65.2

## 36.5,582.4,46.6

## 52.7,456.6,63.3

## 39.9,664.5,53.5

## 64.3,381.4,66.3

## 61.1,407.5,68.4

## 59,425.5,63.9

## 50.1,429.4,58.2

## 49.4,521.7,61.8

## 72.8,416.8,79.6

## 56.6,366.8,65

## 45.8,491.2,54.3
## 

## Call: parboot(object = fm, statistic = fitstats, nsim = 25, report = 1)

## 

## Parametric Bootstrap Statistics:

##                 t0 mean(t0 - t_B) StdDev(t0 - t_B) Pr(t_B > t0)

## SSE           88.1           33.7             9.80       0.0000

## Chisq        892.8          449.7            81.44       0.0000

## freemanTukey  73.8           11.4             7.58       0.0769

## 

## t_B quantiles:

##               0% 2.5% 25% 50% 75% 97.5% 100%

## SSE           37   39  49  54  61    73   74

## Chisq        300  321 383 426 491   615  665

## freemanTukey  47   49  58  63  65    78   80

## 

## t0 = Original statistic compuated from data

## t_B = Vector of bootstrap samples

There is evidence of some GOF problems for these models.
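One common follow-up (sketched here; it was not part of the workflow above) is to estimate an overdispersion factor, c-hat, as the observed chi-square statistic divided by the mean of its bootstrap replicates. Values well above 1 indicate overdispersion, and c-hat can be used to inflate standard errors or to move to QAIC; from the bootstrap summary above it would be roughly 892.8/443.1, or about 2.

c.hat <- pb@t0["Chisq"] / mean(pb@t.star[, "Chisq"])   # parboot objects store t0 and t.star
c.hat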

Analysis: determining predicted values for each covariate level

Below, we make predictions from univariate models. Alternatively, we could use multivariate models to show predictions with the other covariates held at their means (or at a reference level, for factors); see the sketch that follows.
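For example (a sketch only; this model was not fit above), one could fit an additive model and then predict density by habitat with burn status held at a single level:

m.half.hab.burn <- distsamp(~1 ~hab + burn, umf, keyfun="halfnorm", output="density", unitsOut="kmsq")

nd <- data.frame(hab = factor(c('CW', 'FG', 'OB', 'OG', 'OW')),
                 burn = factor('N', levels = c('C', 'N', 'P')))   # burn held at 'N' for every habitat

predict(m.half.hab.burn, type="state", newdata=nd, appendData=TRUE)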

First, we obtain predictions for the density of puku in each habitat, then predictions for presence vs. absence of seasonal water, burn status, green/brown/mixed grass cover, and local presence vs. absence of dambos (seasonal lagoons).

#Predict with habitat

m.hab <- data.frame(hab=factor(c('CW', 'FG', 'OB', 'OG', 'OW')))    

hab.pred <- predict(m.half.1.hab, type="state", newdata=m.hab, appendData=TRUE)

hab.pred  
##   Predicted        SE     lower    upper hab

## 1 0.7917147 0.3116250 0.3660405 1.712412  CW

## 2 0.6133793 0.3604603 0.1938695 1.940657  FG

## 3 1.4623150 1.0465805 0.3596127 5.946300  OB

## 4 3.6569708 1.1295464 1.9962122 6.699406  OG

## 5 1.6706024 0.3559837 1.1002543 2.536606  OW
#Predict with seasonal water      

m.seaswater <- data.frame(seas.water=factor(c('P', 'A')))   # factor levels must match those in the data (P = present, A = absent)

seaswater.predict <- predict(m.haz.1.seaswater, type="state", newdata=m.seaswater, appendData=TRUE)

seaswater.predict 
##   Predicted        SE     lower    upper seas.water

## 1  1.049433 0.4668786 0.4387999 2.509821          P

## 2  1.789522 0.4077539 1.1449466 2.796975          A
#Predict with burn status

m.burn <- data.frame(burn=factor(c('C', 'N', 'P')))    # factor levels must match those in the data

burn.pred <- predict(m.half.1.burn, type="state", newdata=m.burn, appendData=TRUE)

burn.pred 
##   Predicted        SE     lower    upper burn

## 1 1.6058702 0.3851266 1.0036255 2.569503    C

## 2 2.1725212 0.5412600 1.3332034 3.540231    N

## 3 0.8935453 0.2759991 0.4877493 1.636954    P
#predict with grass color:

m.col <- data.frame(col=factor(c('BR', 'GR', 'MIX')))     

col.pred <- predict(m.half.1.col, type="state", newdata=m.col, appendData=TRUE)

col.pred 
##   Predicted        SE     lower    upper col

## 1 1.8661789 0.3841525 1.2466187 2.793656  BR

## 2 1.2906892 0.3524577 0.7557497 2.204273  GR

## 3 0.7988435 0.3678780 0.3239456 1.969932 MIX
#Predict with lagoon             

m.lagoon <- data.frame(lagoon=factor(c('P', 'A')))         # factor levels must match those in the data (P = present, A = absent)

lagoon.predict <- predict(m.half.1.lagoon, type="state", newdata=m.lagoon, appendData=TRUE)

lagoon.predict
##   Predicted        SE     lower    upper lagoon

## 1  1.135166 0.3799783 0.5890245 2.187687      P

## 2  1.598201 0.2981475 1.1087613 2.303693      A

Plotting results

You’ll typically want to show results graphically. Here is an example with mean herd densities for each habitat, with standard errors. The code above used the predict function to obtain the predicted values for each habitat from the distsamp model ‘m.half.1.hab’ (with an intercept only for p, habitat effects on density, and a half-normal distribution fit to the frequency distribution of perpendicular distances). The output was assigned to a dataframe (hab.pred) with variables named Predicted (the estimated density), SE (the standard error of this estimate), lower (lower 95% confidence limit), upper (upper 95% CL), and hab (habitat type).

hab.pred
Predicted SE lower upper hab
1 0.7917147 0.3116250 0.3660405 1.712412 CW
2 0.6133793 0.3604603 0.1938695 1.940657 FG
3 1.4623150 1.0465805 0.3596127 5.946300 OB
4 3.6569708 1.1295464 1.9962122 6.699406 OG
5 1.6706024 0.3559837 1.1002543 2.536606 OW

This code uses the package ggplot2 to produce pretty graphics. A similar plot could be made more simply with base R plotting functions (barplot, for example).

library(ggplot2)



ggplot(hab.pred, aes(x=hab, y=Predicted)) +  

    geom_bar(position=position_dodge(), stat="identity", colour = 'blue', fill = 'light blue') +

    geom_errorbar(aes(ymin=Predicted-SE, ymax=Predicted+SE), width=.2, colour = 'blue') +

    scale_x_discrete(name = expression(bold('Vegetation Type'))) +

    scale_y_continuous(name = expression(bold(paste("Herds /  ", km^bold("2"))))) +

    theme_bw()

This figure is a nice, simple way to show the effect of vegetation type on puku density, with a fairly strong tendency for density to be higher in open habitats, as expected for an obligate grazer. (Aside: puku density is rather low in flooded grassland because Kafue also holds a large population of red lechwe, a species that specializes on flooded grasslands and competitively excludes puku from those areas.)

Tests for detection bias due to herd size

We check whether herd size affects the likelihood of detection by examining the relationship between herd size and perpendicular distance. A lack of relationship (no significant slope) indicates that herd size has no detectable effect on detection.

#Regress cluster size on distance

plot(dists.trunc$dist, dists.trunc$herd.size)

cluster.det <- lm(dists.trunc$herd.size ~ dists.trunc$dist)

anova(cluster.det)
## Analysis of Variance Table

## 

## Response: dists.trunc$herd.size

##                  Df Sum Sq Mean Sq F value Pr(>F)

## dists.trunc$dist  1  142.6 142.601   2.042  0.159

## Residuals        52 3631.3  69.833
summary(cluster.det)    # if p < 0.15, use the estimated herd size at distance 0, including its SE to propagate uncertainty
## 

## Call:

## lm(formula = dists.trunc$herd.size ~ dists.trunc$dist)

## 

## Residuals:

##    Min     1Q Median     3Q    Max 

## -8.099 -4.065 -3.070  1.235 44.373 

## 

## Coefficients:

##                  Estimate Std. Error t value Pr(>|t|)  

## (Intercept)       4.10570    1.76631   2.324    0.024 *

## dists.trunc$dist  0.02143    0.01500   1.429    0.159  

## ---

## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 

## Residual standard error: 8.357 on 52 degrees of freedom

## Multiple R-squared:  0.03779,    Adjusted R-squared:  0.01928 

## F-statistic: 2.042 on 1 and 52 DF,  p-value: 0.159
#Regress log(cluster size) on distance

plot(dists.trunc$dist, log(dists.trunc$herd.size))

l.herd <- log(dists.trunc$herd.size)

cluster.det <- lm(l.herd ~ dists.trunc$dist)

summary(cluster.det) 
## 

## Call:

## lm(formula = l.herd ~ dists.trunc$dist)

## 

## Residuals:

##     Min      1Q  Median      3Q     Max 

## -1.8110 -0.7995 -0.1211  0.8118  2.8721 

## 

## Coefficients:

##                  Estimate Std. Error t value Pr(>|t|)   

## (Intercept)      0.701921   0.227956   3.079  0.00331 **

## dists.trunc$dist 0.004760   0.001935   2.460  0.01727 * 

## ---

## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 

## Residual standard error: 1.078 on 52 degrees of freedom

## Multiple R-squared:  0.1042, Adjusted R-squared:  0.08699 

## F-statistic: 6.049 on 1 and 52 DF,  p-value: 0.01727

Using this information to convert herd density to individual density

The models are fit using detections of each ‘cluster’ of one or more individuals; here, puku herds. We usually want to convert the estimate of herd density into an estimate of individual density. If the regression of herd size on distance is strong (p<0.15 is a good rule of thumb), then we can use the estimated herd size at distance=0 to convert the density of herds to the density of individuals.

First we need to backtransform the estimated intercept (herd size at distance = 0) and SE from the log-scale.

bt.est <- exp(0.701921)                 #exp of the intercept

bt.est
## [1] 2.017625
bt.se <- 0.227956 * bt.est              # delta method: the SE of the intercept on the log scale, multiplied by the back-transformed estimate

bt.se
## [1] 0.4599297

If the regression of herd size on distance is not strong (p > 0.15 by the same rule of thumb) and there are no important covariates of density in the top model, then the overall mean herd size is appropriate to convert herd density to individual density:

avg.herd <- mean(dists.trunc$herd.size)

avg.herd
## [1] 6.037037
sd.herd <- sd(dists.trunc$herd.size)

se.herd <- sd.herd/(sqrt(nrow(dists.trunc)))  

dim(dists.trunc)
## [1] 54 13

If the covariates in the best density model also affect herd size, we must determine mean herd size for each level of the relevant covariates:

library(plyr)
## 

## Attaching package: 'plyr'

## 

## The following objects are masked from 'package:reshape':

## 

##     rename, round_any
#Mean and SE of herd size by habitats:

herdsize.hab <- ddply(dists.trunc,~hab,summarise,mean=mean(herd.size),se=(sd(herd.size))/(sqrt(length(herd.size))))

herdsize.hab
##   hab     mean        se

## 1  CW 1.000000 0.0000000

## 2  FG 8.444444 2.7820882

## 3  OB 7.000000 3.7859389

## 4  OG 8.846154 2.1508679

## 5  OW 2.000000 0.3791976
#seasonal water

ddply(dists.trunc,~seas.water,summarise,mean=mean(herd.size),se=(sd(herd.size))/(sqrt(length(herd.size))))
##   seas.water     mean        se

## 1          A 3.409091 0.8287556

## 2          P 7.843750 1.7961144
#burn status

ddply(dists.trunc,~burn,summarise,mean=mean(herd.size),se=(sd(herd.size))/(sqrt(length(herd.size))))
##   burn     mean       se

## 1    C 6.461538 1.987059

## 2    N 4.352941 1.084453

## 3    P 7.636364 2.707809

So we convert from density of herds to density of individuals by multiplication:

  1. Take the density estimates for each covariate level from the predict function above (for example, the density of herds in each habitat)

  2. Multiply these by the mean herd size from the ddply function to obtain individual density estimates, corrected for the influence of variables found to affect density, herd size, or both.

hab.ind.density <- hab.pred$Predicted*herdsize.hab$mean

hab.ind.density
## [1]  0.7917147  5.1796478 10.2362051 32.3501260  3.3412047

The values in the vector hab.ind.density are the densities of puku individuals in the five major habitats (or, more correctly, vegetation types): CW (closed woodland), FG (flooded grassland), OB (open bushland), OG (open grassland), and OW (open woodland).

Habitat-specific herd densities from hab.pred$Predicted were:

Predicted SE lower upper hab
1 0.7917147 0.3116250 0.3660405 1.712412 CW
2 0.6133793 0.3604603 0.1938695 1.940657 FG
3 1.4623150 1.0465805 0.3596127 5.946300 OB
4 3.6569708 1.1295464 1.9962122 6.699406 OG
5 1.6706024 0.3559837 1.1002543 2.536606 OW

And habitat-specific mean herd sizes from herdsize.hab$mean were:

hab mean se
1 CW 1.000000 0.0000000
2 FG 8.444444 2.7820882
3 OB 7.000000 3.7859389
4 OG 8.846154 2.1508679
5 OW 2.000000 0.3791976

Make a new figure of individual density by habitat type, to compare to the pattern for herd density. First, use the cbind function to make an appropriate dataframe.

hab.ind.pred  <- cbind(hab.pred, hab.ind.density)  

hab.ind.pred
##   Predicted        SE     lower    upper hab hab.ind.density

## 1 0.7917147 0.3116250 0.3660405 1.712412  CW       0.7917147

## 2 0.6133793 0.3604603 0.1938695 1.940657  FG       5.1796478

## 3 1.4623150 1.0465805 0.3596127 5.946300  OB      10.2362051

## 4 3.6569708 1.1295464 1.9962122 6.699406  OG      32.3501260

## 5 1.6706024 0.3559837 1.1002543 2.536606  OW       3.3412047

Then make minor modifications to the code for the previous plot using the ggplot function.

ggplot(hab.ind.pred, aes(x=hab, y=hab.ind.density)) +  

    geom_bar(position=position_dodge(), stat="identity", colour = 'dark red', fill = 'red') +

#    geom_errorbar(aes(ymin=Predicted-SE, ymax=Predicted+SE), width=.2, colour = 'blue') +

    scale_x_discrete(name = expression(bold('Vegetation Type'))) +

    scale_y_continuous(name = expression(bold(paste("Puku /  ", km^bold("2"))))) +

    theme_bw()

A much stronger preference for open grassland is revealed. Note that I ‘commented out’ the error-bar code because we haven’t calculated variances for individual density yet; that code is just below. (When executing a script, R ignores comments, which are identified by a #. By putting a # at the start of a line, i.e., commenting it out, you can disable code without deleting it, if you want to keep it for later use.)

Obtaining confidence intervals for individual density estimates using the delta method

When we calculate a parameter (like individual density) that is itself a function of two or more parameters that we’ve already estimated (like herd density and herd size), we can use the delta method to combine the variances of the original parameters to yield the variance of the final parameter. Briefly, the delta method is a way of calculating the variance of a function of one or more variables. For linear functions, there is an exact solution to the problem of determining the variance of a function, using the variances of the variables in the function. For non-linear functions, there is no exact solution, but the delta method usually provides a reasonable approximation. (The delta method relies on Taylor series approximation, which may be familiar to you from calculus but is beyond the scope of this exercise.)
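For the product of two independent estimates, which is the case below (individual density = herd density x mean herd size), the first-order delta method reduces to

\[ \mathrm{Var}(\widehat{x}\,\widehat{y}) \approx \widehat{y}^{2}\,\mathrm{Var}(\widehat{x}) + \widehat{x}^{2}\,\mathrm{Var}(\widehat{y}) \]

which is exactly what the deltamethod function computes for the formula ~x1*x2 when the covariances are zero.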

A delta method example, calculating a variance for individual puku density in OW (open woodland) vegetation:

library(msm) #there is a deltamethod function in the msm package; equivalent functions are available in other packages.





N1 <- 1.67  # herd density of puku in OW, from hab.pred

N2 <- 2.00  # mean herd size of puku in OW, from herdsize.hab



s.1 <- (0.36 * sqrt(76))^2  # converting the reported SE to a variance for herd density

s.2 <- (0.38 * sqrt(76))^2  # and then for herd size

sigma <- matrix(c(s.1, 0, 0, s.2), 2, 2)  # put those variances into a 2x2 variance-covariance matrix, with covariances = 0

ind.density.se <- deltamethod(~x1*x2, c(N1, N2), sigma, ses = TRUE)  # apply the delta method via the deltamethod function in msm

ind.density.se
## [1] 8.366893

This is quite a large standard error for the habitat-specific estimate of individual density once the variances of both the estimated herd density and the estimated herd size are propagated forward. That is not too surprising, however, when you consider how little data underlie this estimate: one habitat type, one set of surveys, one species. It is a clear example of the fact that partitioning a small data set often causes uncertainties to become large.

Obtain a confidence interval

lcl.density = max(0,(N1*N2 - 1.96 * ind.density.se))  # lower confidence limit is bounded at zero

ucl.density = N1*N2 + 1.96 * ind.density.se

CI.density = c(lcl.density, ucl.density)

CI.density
## [1]  0.00000 19.73911
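Since this product-and-CI calculation is repeated for every covariate level, it can be convenient to wrap it in a small function. Here is a hypothetical helper (not from any package) that reproduces the calculation above from two independent estimates and their variances:

prod.ci <- function(est1, est2, var1, var2) {
  est <- est1 * est2
  se <- sqrt(est2^2 * var1 + est1^2 * var2)   # first-order delta method for a product
  c(estimate = est,
    se = se,
    lcl = max(0, est - 1.96 * se),   # the lower limit is bounded at zero
    ucl = est + 1.96 * se)
}

prod.ci(N1, N2, s.1, s.2)   # reproduces ind.density.se and CI.density above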