Distance software: design and analysis of distance sampling surveys for estimating population size

Journal of Applied Ecology 2010, 47, 5–14
doi: 10.1111/j.1365-2664.2009.01737.x

REVIEW

Len Thomas*¹, Stephen T. Buckland², Eric A. Rexstad¹, Jeff L. Laake³, Samantha Strindberg⁴, Sharon L. Hedley², Jon R.B. Bishop¹, Tiago A. Marques¹ and Kenneth P. Burnham⁵

¹Research Unit for Wildlife Population Assessment, Centre for Research into Ecological and Environmental Modelling, University of St. Andrews, St. Andrews KY16 9LZ, UK; ²Centre for Research into Ecological and Environmental Modelling, University of St. Andrews, St. Andrews KY16 9LZ, UK; ³National Marine Mammal Laboratory, Alaska Fisheries Science Center, National Marine Fisheries Service, 7600 Sand Point Way NE F/AKC3, Seattle, WA 98115-6349, USA; ⁴Wildlife Conservation Society, 2300 Southern Boulevard, Bronx, NY 10460, USA; and ⁵Colorado Cooperative Fish and Wildlife Research Unit, Department of Fish, Wildlife and Conservation Biology, Colorado State University, Fort Collins, CO 80523, USA

Summary

1. Distance sampling is a widely used technique for estimating the size or density of biological populations. Many distance sampling designs and most analyses use the software Distance.

2. We briefly review distance sampling and its assumptions, outline the history, structure and capabilities of Distance, and provide hints on its use.

3. Good survey design is a crucial prerequisite for obtaining reliable results. Distance has a survey design engine, with a built-in geographic information system, that allows properties of different proposed designs to be examined via simulation, and survey plans to be generated.

4. A first step in analysis of distance sampling data is modelling the probability of detection. Distance contains three increasingly sophisticated analysis engines for this: conventional distance sampling, which models detection probability as a function of distance from the transect and assumes all objects at zero distance are detected; multiple-covariate distance sampling, which allows covariates in addition to distance; and mark–recapture distance sampling, which relaxes the assumption of certain detection at zero distance.

5. All three engines allow estimation of density or abundance, stratified if required, with associated measures of precision calculated either analytically or via the bootstrap.

6. Advanced analysis topics covered include the use of multipliers to allow analysis of indirect surveys (such as dung or nest surveys), the density surface modelling analysis engine for spatial and habitat modelling, and information about accessing the analysis engines directly from other software.

7. Synthesis and applications. Distance sampling is a key method for producing abundance and density estimates in challenging field conditions. The theory underlying the methods continues to expand to cope with realistic estimation situations. In step with theoretical developments, state-of-the-art software that implements these methods is described that makes the methods accessible to practising ecologists.

Key-words: distance sampling, line transect sampling, point transect sampling, population abundance, population density, sighting surveys, survey design, wildlife surveys

*Correspondence author. E-mail: len@mcs.st-and.ac.uk

Re-use of this article is permitted in accordance with the Terms and Conditions set out at http://www3.interscience.wiley.com/authorresources/onlineopen.html

© 2009 The Authors. Journal compilation © 2009 British Ecological Society

Introduction

Distance sampling comprises a set of methods in which distances from a line or point to detections are recorded, from which the density and/or abundance of objects is estimated.
Objects are usually animals or animal groups ASSUMPTIONS (termed clusters), but may be plantsorinanimate objects. Detections are usually of animalsorclusters, butmay be We briefly summarize the key assumptions of the basic method (for a more detailed discussion, see Buckland et al. 2001:29– of cues (such as whale blows or bird songbursts) or sign 37). Many of the recent advances of distance sampling allow (such as dung or nests). Conventional distance sampling one or more of these assumptions to be relaxed. There are just (CDS) methods are described by Buckland et al. (2001), three key assumptions. and various extensions are considered in Buckland et al. (2004). An extensive distance sampling reference list, covering 1. Objects on the line or point are detected with cer- both methodological developments and practical application tainty. Most surveys are conducted with a single observer, or of the methods, is available at http://www.ruwpa.st-and. a single observation ‘platform’ consisting of multiple observers ac.uk/distancesamplingreferences/. Most distance sampling surveys are analysed, and many are but with data pooled across them. In cases where it is impor- designed, using the software Distance (http://www.ruwpa. tant to relax assumption 1, double-observer or double-plat- st-and.ac.uk/distance/). In this paper, we describe version 6 of form surveys may be conducted (Laake & Borchers 2004). In the software and its capabilities, and give guidance on how to these, observers either search independently of each other or use it to design and analyse surveys. there may be ‘one-way’ independence, with one observer being unaware of detections made by the other, but not vice versa. Such methods are quite often used for marine mammal sur- What is distance sampling? veys. The mark–recapture distance sampling (MRDS) engine of Distance can be used to analyse such double-observer data. 
TYPES OF DISTANCE SAMPLING The most widely used form of distance sampling is line transect 2. Objects do not move. Conceptually, distance sampling is a sampling. A survey region is sampled by placing a number of ‘snapshot’ method: we would like to freeze animals in position lines at random in the region or, more commonly, a series of while we conduct the survey. In practice, non-responsive systematically spaced parallel lines with a random start point. movement in line transect surveys is not problematic provided it is slow relative to the speed of the observer. Non-responsive An observer travels along each line, recording any animals movement is more problematic for point transect surveys, detected within a distance w of the line. In the standard leading to overestimation of density (Buckland et al. method,weassumeall animals on the line are detected, but 2001:173). Responsive movement before detection is problem- detection probability decreases with increasing distance from atic because animals are assumed to be located independently the line. Hence, not all animals in the strip of half-width w need of the position of the line or point (see below); implications are to be detected. In addition, the distance of each detected ani- addressed by Fewster et al. (2008). mal from the line is recorded. We use the distribution of these distances to estimate the proportion of animals in the strip that is detected, which allows us to estimate animal density and 3. Measurements are exact. Untrained observers tend to be abundance. If animals occur in well-defined clusters (e.g. flocks poor at estimating distances by eye or ear (Alldredge, Simons or herds), then detections refer to clusters rather than to indi- & Pollock 2007). Wherever possible, training and technology vidual animals. (e.g. laser rangefinders) should be used to ensure adequate A second common form, particularly for surveys of breeding accuracy. 
Provided distance measurements are approximately songbirds, is point transect sampling, where the design is based unbiased, bias in line transect estimates tends to be small in the on randomly placed points rather than lines. presence of measurement errors, but larger for estimates from Several other variations exist. In indirect surveys, animal point transect surveys (Buckland et al. 2001:264–265). In some signs are surveyed by one of the above methods, and sign den- line transect surveys, particularly shipboard surveys, direct ani- sity is converted to animal density using estimates of sign pro- mal-observer distance r is recorded together with sighting angle duction and decay rates (Marques et al. 2001). In cue count h from the transect line and perpendicular distance is then surveys (nearly) instantaneous cues are surveyed, e.g. whale calculated as r sin h. In this case, it is important to obtain blows (Hiby 1985) or bird songbursts (Buckland 2006), and accurate angles, particularly for small angles, and an angle the resulting estimate of number of cues per unit time per unit board can be used to help achieve this. In addition to exact distances, if animals occur in clusters, we assume cluster sizes area is converted to estimated animal density using an estimate are accurately recorded, at least for those close to the line or of the cue rate per animal. In trapping webs or trapping line point. We also assume species are not misidentified. transects (Lukacs, Franklin & Anderson 2004), a network of trapsisplacedaroundthe pointorline, andifananimalenters Other assumptions are made, but they are seldom of great a trap, its recorded distance is the distance of that trap from practical significance. We assume animal locations are the point or line. 
In trapping or lure point transect sampling, a independent of the positions of the lines or points, which we single trap or lure is placed at each point of the design, and the ensure if we have an adequate sample of lines or points, and probability of detecting a given animal is estimated by con- randomize their location. This assumption becomes critical if, ducting separate trials on animals with known location (Buck- land et al. 2006). for example, transects are placed along roads or tracks. We 2009 The Authors. Journal compilation  2009 British Ecological Society, Journal of Applied Ecology, 47, 5–14 Distance software for density estimation 7 also assume detections are independent events, but our analysis appropriate for the sampling used, and analysis options methods are very robust to failures of this assumption (except desired. Version 3.0 was a Microsoft Windows console appli- in the case of double-platform designs, where independence cation, but retained the command language structure. between duplicate detections of the same animal at zero dis- With funds from British research funding councils, a pro- tance is required). gramming team developed a version of Distance with fully integrated, Windows-based graphical user interface. This ver- sion, Distance 3.5, became generally available in 1998. Subse- DESIGN-BASED AND MODEL-BASED ESTIMATION quent versions saw the addition of more features: Distance 4 In the case of strip transect sampling, where all animals within (in 2002) the multiple-covariate distance sampling (MCDS) the strip of half-width w are assumed to be detected, estimation and automated survey design engines, Distance 5 (in 2005) the MRDS engine and Distance 6 (in 2009) the density surface of abundance within the survey region can be achieved using an entirely design-based framework. To do this successfully, it is modelling (DSM) engine. 
The basic methods in Distance 6 are critical to place the strips at random throughout the survey described in a third monograph (Buckland et al. 2001), which region, to ensure that we count representative strips. We can is essentially an updated version of the second one; the more then assume the density in the strips is an unbiased estimate of advanced methods are describedinanedited volume(Buck- densityin thewidersurveyregion;nomodelisneeded.Standard land et al. 2004), and in additional references given below. distance sampling also uses design-based inference to extrapo- Users downloading Distance are asked to register their email late from the sampled plots (strips for line transect sampling or address and country. Distance versions 3.5, 4 and 5 together circles for point transect sampling) to the survey region. have been registered by over 19 000 users from 135 countries. However, we do not know the true number of animals in the plots. We therefore fit a detection model, which allows us to Program structure and overview estimate this number. Standard distance sampling is thus a hybrid, blending model-based (within the plots) and design- From the users’ perspective, Distance consists of a graphical based (extrapolation from the plots) inference (Fewster & interface that allows users to enter, import and view data, Buckland2004).We couldadoptafullymodel-basedapproach. design surveys and run analyses. Users begin by creating a Dis- The simplest would be to assume that animals are uniformly tance project, which contains information about a single study. and independently distributed throughout the survey region. Wizardsare availabletohelpinsetting up aproject andenter- This leads to the same abundance estimate as for the hybrid ing data, or importing it from delimited text files. Data are approach, but estimates of precision would change. 
This strat- organized into nested layers: global (for data that relates to the egy is not usually adopted because the estimates of precision are whole study area), stratum (data relating to individual survey not robust to the failure of the model assumptions made about strata), sample (data relating to individual survey lines or the spatial distribution of animals. However, there is increasing points) and observation (data relating to single observations). interest in modelling how animal density varies spatially, and More complex nested structures are possible. Geographical fully model-based approaches that make more reasonable data, in the form of ESRI shapefiles, can also be associated assumptions are an active area of research (e.g. Hedley & with each layer. Having entered or imported data, users under- Buckland 2004; Johnson, Laake & VerHoef 2009). It is possible take one of two tasks: design of a new survey or analysis of to fit relatively simple spatial models in Distance 6 (see below). already-collected survey data (Fig. 1). A design is an algorithm for laying out samples within the study area; multiple designs can be created using the design Historical development engine in Distance, and their properties examined by simula- The Distance software evolved from two earlier software tion. A single realization of a design is called in Distance a developments. The first was the program TRANSECT survey, and this consists of the position of a set of sample lines (Laake, Burnham & Anderson 1979) for fitting Fourier series or points together with the survey methods (e.g. collection of and other models to line transect distance data. The methods perpendicular distances to clusters of animals). Line or point on which the software was based were developed in a series of positions can be readily exported from Distance and used for publications, culminating in the first monograph on distance navigation in the field. 
Results can also be viewed within sampling (Burnham, Anderson & Laake 1980). The second Distance in the form of simple maps and text output contain- development was of an algorithm for maximum likelihood fit- ing summary statistics. ting of models to line or point transect distance data, based on Analysis in Distance involves combining three elements: (i) a a parametric key function multiplied by series adjustments survey, which specifies which data layers to use and the survey (Buckland 1992). Code implementing this algorithm was methods used; (ii) a data filter, which allows subsets of the data merged with TRANSECT to create Distance (Laake et al. to be selected, truncation distances to be chosen and other pre- 1993), which provided analysis of line and point transect data. processing; and (iii) a model definition, which specifies how the The methods were comprehensively documented in a second data should be analysed. These are then run using one of the monograph (Buckland et al. 1993). Distance versions 1.0–2.2 four analysis engines available in Distance: CDS, MCDS, were DOS-based applications that were controlled using a MRDS and DSM. Each has different capabilities, as explained command language to invoke various program options below. Results are available withinDistanceinthe form of 2009 The Authors. Journal compilation  2009 British Ecological Society, Journal of Applied Ecology, 47,5–14 8 L. Thomas et al. probability. We recommend that a systematic survey design (a) Survey design Data layers GIS data with a random start be used to afford better spatial coverage Global Design and lower variance. For such a design, Fewster et al. (2009) Stratum describe methods to estimate this variance with low bias, and Survey Sample these methods are available in the Distance analysis engines. Coverage grid The layout of transects across the study region deserves careful thought. 
Parallel, equally spaced transects with a ran- (b) dom start provide designs with uniform coverage. However, Analysis GIS data Data layers if survey platform cost is high, then not collecting data while Global moving between transects can be wasteful. Hence, sawtooth Stratum Survey or zigzag designs can be employed; however, when study Sample Data filter regions are non-rectangular, these designs can produce Observation unequal coverage probability (Strindberg & Buckland 2004). Analysis Prediction grid DSM All else being equal, more, shorter transect lines yield more Model definition Engine = Optional precise estimates of the encounter rate variance than do a Fig. 1. Schematic showing (a) survey design and (b) analysis in few long lines; segmented transects are often used, where the Distance. distance between sections of survey effort along a transect is roughly equal to the separation between successive parallel diagnostic plots and summary statistics. These are readily transects (Buckland et al. 2004:204). Where there are known exported to other software. density gradients within the study region, stratification can Conceptually, Distance projects contain all the data and be used to reduce variance; alternatively (or in addition) results relating to a single study. Physically, a project comprises transect lines can be placed parallel to this gradient. In a project file and an associated data folder; the latter contains a highly complex study regions, the ratio of study area perime- data file, geographical shapefiles and a folder containing files ter to area may be quite high. Then, edge effects can cause generated by analysis engines that use the statistical software significantly lower coverage probability near the perimeter of R. 
As a project consists of many parts, Distance provides a the study area, so that sampling into a buffer zone (‘plus convenient mechanism for packing the project into a single sampling’) is advisable (Strindberg, Buckland & Thomas (zip) file to make it easy to archive and transfer. 2004:192–194, 200–201). Further discussion of design issues A full electronic user manual comes with the software, and is given by Buckland et al. (2001:228–317), Strindberg et al. there is an email-based discussion list for users (http://www. (2004) and Thomas, Williams & Sandilands (2007). jiscmail.ac.uk/distance-sampling). In all of these situations, it is advisable to employ the auto- From the programming perspective, the visual interface, mated survey design engine in Distance to examine the cover- written in Microsoft Visual Basic (Microsoft Corporation age properties of candidate survey designs prior to their 2000), is highly modular, and runs the analysis engines in sepa- implementation. For a given design, Distance can generate a rate processes to enhance stability and make use of multi-core map showing coverage probabilities estimated by simulation, hardware. The survey design engine is also written in Visual to allow users to determine whether standard analyses in Dis- Basic, using ESRI’s MapObjects library (ESRI 2004) for the tance are appropriate or whether other analysis options are GIS functionality. The CDS and MCDS analysis engines are preferable to avoid potential bias (for example, the Horvitz- written in FORTRAN (Compaq Computer Corporation 2001) Thompson estimator described below; Rexstad 2007). Other and the MRDS and DSM engines in R (R Development Core outputs include the minimum, mean and maximum number of Team 2009). For data storage, both project and data files are in survey lines or points, and distance travelled per stratum. Microsoft Access format. 
More details of the internal structure These can be useful in determining if a design is feasible, and of the software are given in appendices to the user manual. whether there is sufficient effort to produce enough sightings for reliable analysis. Once a design is selected, a realization (survey plan) can be generated, and sample coordinates Survey design exported for use in implementing the survey. As with any sampling exercise, obtaining reliable results from a distance sampling survey depends critically on good survey Estimating the detection function design. This relies upon the fundamental sampling principles of replication and randomization. Sufficient replicate lines or Version 6 of Distance has three different analysis engines for points ensure that variation in encounter rate (number of estimating the detection function. (The fourth engine is objects detected per unit survey effort) can be adequately esti- coveredinthenextsection.) mated. The lines or points should not be placed subjectively; rather a randomization scheme should be employed that gives THE CDS ENGINE all locations in the study region a known, non-zero probability of being covered by a transect (the ‘coverage probability’). The CDS engine is a FORTRAN program based on the code Standard analyses in Distance assume uniform coverage in earlier versions of Distance. CDS assumes that detection of 2009 The Authors. Journal compilation  2009 British Ecological Society, Journal of Applied Ecology, 47, 5–14 Distance software for density estimation 9 an animal on the line or point is certain. The same detection Other data analysis issues function is assumed to apply for all animals; this seems unreal- istic, but the ‘pooling robustness’ property of CDS estimators ESTIMATING ABUNDANCE ensures that moderate amounts of unmodelled heterogeneity cause little bias (Buckland et al. 2004:389–392). The CDS Consider first the case that detections are of single animals. 
engine implements the flexible semi-parametric detection func- Estimated abundance (N) may be formulated in terms of a tion modelling framework proposed by Buckland (1992), Horvitz-Thompson estimator, but with the inclusion probabil- whereaparametric keyfunctionispairedwithzeroormore ities estimated (Borchers & Burnham 2004): series adjustment terms. Four key functions are available: uni- form, half-normal, hazard-rate and negative exponential. ^ N ¼ eqn 1 Adjustments can be cosine terms, or Hermite or simple polyno- i¼1 mials. Selection of the appropriate combination can be done where P is the estimated inclusion probability for animal using standard model selection techniques (see Analysis hints, i and n is the number of observations. P has two compo- below). nents: first the probability that animal i falls within the sampled plots (the ‘coverage probability’ previously intro- THEMCDSENGINE duced) and secondly an estimate of its probability of detection, given that it is within the plots. TheMCDSengineisanextension of theCDS FORTRAN When animals occur in clusters, we can estimate abundance program that allows inclusion of covariates other than distance as ): from the line or point in the detection function (Marques & Buckland 2003, 2004). This is useful in four circumstances N ¼ eqn 2 (Marques et al. 2007): first, when we wish to estimate density i¼1 for a subset of the data (e.g. a stratum), but there are too few where s is the size of cluster i, i = 1, ..., n. Alternatively, observations to fit aseparatedetectionfunctiontoeachsubset; we can multiply estimated cluster abundance by an esti- secondly, when pooling robustness does not hold (e.g. too mate EðsÞ of mean cluster size in the population: much heterogeneity in detection probability); thirdly, because it can reduce the variance of the density estimate; and fourthly, if the covariate distribution is of interest in its own right. 
Only ^ ^ N ¼ EðsÞ eqn 3 two key functions are allowed: the half-normal and the i¼1 hazard-rate. Both of these have a scale parameter, which is modelled as a function of the covariates. The covariates may If the CDS engine is selected, the detection function is relate to the individual detections (e.g. cluster size or animal assumed to be the same for all detections, so that eqn 1 sim- ^ ^ behaviour), the observer (e.g. observer ID) or the environment plifies to N ¼ n P. For clustered populations, the CDS (e.g. habitat or weather), and can be either continuous covari- engine uses a simplification of eqn 3. The default method for ates or qualitative factors. estimating mean cluster size is the regression method of Buckland et al. (2001:73–75) in which log cluster size is regressed on estimated probability of detection. This is THEMRDSENGINE designed to remove any effect of ‘size bias’, which occurs The MRDS engine is an R package for use primarily with dou- when larger clusters are easier to detect than small ones at ble-platform line transect data, where the assumption of cer- large distances, so the simple mean of observed cluster sizes tain detection on the line can be relaxed (Laake & Borchers is a positively biased estimate of population mean cluster 2004). Double-platform methods are widely used in both aerial size. It also corrects for bias that arises when cluster size and shipboard surveys of marine mammals (e.g. Borchers et al. tends to be underestimated at large distances, so mean 2006), but are potentially useful in many situations where observed cluster size is a negatively biased estimate of popu- objects at zero distance are difficult to detect. Users wishing lation mean cluster size. to runthisengine needtohaveRinstalledinadditionto The MCDS and MRDS engines allow the detection func- Distance. 
As with the MCDS engine, covariates can be incor- tion (but not coverage probability) to vary, so that eqn 1 porated into the detection function model; however, inclusion applies when detections are of single animals. Equation 2 is of adjustment functions is not supported at present. Every used for clustered populations. attempt should be made to include all covariates that have a large effect on detectability, because unlike CDS and MCDS, ESTIMATING PRECISION estimation is not robust to the effects of unmodelled heteroge- neity at zero distance when detection on the line is not certain. For most analyses, the default method for estimating precision Single platform surveys can also be analysed using the is an analytical one. However, a nonparametric bootstrap is MRDS engine, but this is only really useful when calling the available. The default option for the bootstrap is to resample engine from R (where the CDS and MCDS engines are not lines or points. In some circumstances, the user may wish to readily available). resample strata, for example in point transect sampling, where 2009 The Authors. Journal compilation  2009 British Ecological Society, Journal of Applied Ecology, 47,5–14 10 L. Thomas et al. a grid of points is placed at each of a number of random loca- plished using Distance. The overall density would then be esti- tions (called ‘cluster sampling’), where the grid is the appropri- mated as the sum of the stratum-specific estimates. ate unit to resample. This is achieved by defining each grid of A current limitation of Distance is that it can only handle points as a stratum, and resampling strata. The user can also one level of stratification. opt to resample individual detections, although this is not rec- ommended. 
Multi-level bootstrapping is also allowed but is not recommended as resampling by line or point gives a better representation of the variability induced by the sampling process (Davison & Hinkley 1997:100–102).

For the CDS engine, the analytical variance of a density or abundance estimate is estimated by the delta method (Buckland et al. 2001:52), and comprises three components, corresponding to estimation of encounter rate, the detection function and mean cluster size in the population (for clustered populations). For details of how the three components are obtained and combined, see Buckland et al. (2001:76–79). However, the formulae for estimating encounter rate variance given by Buckland et al. (2001:78–79) are not the default option in Distance version 6, following work by Fewster et al. (2009) showing an alternative estimator gives more robust estimates of variance when there are strong spatial trends through the survey region. By default, the estimators assume lines or points were laid down at random. This leads to overestimates of variance where systematic designs are used. For systematic parallel designs, estimators based on post-stratification (Fewster et al. 2009) are available, and these produce more reliable (and usually lower) estimates for that design.

For the MCDS and MRDS engines, detection probability is allowed to depend on covariates other than distance, and a different, more integrated approach to variance estimation is required. For the MCDS engine, see Marques & Buckland (2003, 2004:38–43) and Marques et al. (2007); for the MRDS engine, see Borchers et al. (2006).

STRATIFICATION (INCLUDING POST-STRATIFICATION)

Geographical stratification can be used to improve precision of estimates by subdividing the study region into blocks that are likely to be similar in animal density. Stratification can also be used when there is management interest in estimating density in sub-sections of the study region. The overall estimate of density is obtained as the mean of the stratum-specific estimates, weighted by the respective areas of the strata.

If the same study area is surveyed repeatedly, then survey-level strata could be defined. If a study area is surveyed by say two ships, an analysis with ships as strata can be performed. In this latter case, the overall estimate of density would be the mean of stratum-specific density estimates, weighted by the effort carried out by each ship.

In some cases, strata can be defined using criteria not available during survey design. For example, it may be of scientific interest to produce sex-specific estimates of density in the study area if the animals can be identified by gender. However, if the genders mix freely within the study area, the survey cannot be designed to account for sex-specific estimation. This type of stratification is called post-stratification, and can be accom-

ANALYSIS HINTS

There are typically three phases in analysing data in Distance: exploratory data analysis, followed by model selection, and then final analysis and inference. We focus here on CDS analyses; suggestions for MCDS analyses were given by Marques et al. (2007).

Exploratory data analysis

Initially, exploratory data analysis is carried out to aid understanding of the data and identify any problems. This phase should be started while the data are being collected, as this allows any problems with data collection to be identified and rectified. If exact distances are recorded (rather than grouped or interval distance data), it is useful to plot histograms of the distances with many cutpoints.

In Fig. 2, we show examples of problematic line transect data sets. Figure 2a shows an example of 'spiked' data. For such data, different models will give very different estimates of density, so it is important to understand what has caused the spike, and to modify field procedures accordingly. A common cause in shipboard surveys is inaccurate estimation of sighting angles for detections ahead of the vessel. With inadequate training and/or aids, observers often record most detections within perhaps 10° of the line as 0°, leading to rounding of many perpendicular distances to zero.

Spiked data might also arise if animals are attracted towards the observer. It is important that detections are made before any responsive movement occurs.

Spiked data may arise even when there has been no failure of an assumption. For example, in surveys of breeding songbirds, singing males may be much more detectable than (non-singing and cryptic) females. In that case, the spike arises because females are only detectable close to the line. The simplest solution in this case is to additionally record whether the bird was singing. An analysis can then be conducted for singing birds, allowing estimation of the number of territories. If females are certain to be detected when on the line, then a separate analysis of females could be conducted, if sample size is adequate, or sex could be included as a covariate in an MCDS analysis. Similar issues apply for point transect surveys.

Figure 2b gives clear evidence that at least one assumption has failed. Aerial survey data can look like this, because it may be difficult for observers to see the line, so that animals close to the line are missed. Solutions include aircraft with bubble windows, allowing the line to be seen, or offsetting the line, with markers on window and wing strut, which, when aligned, allow the observer to record accurately which side of the line an animal is on. Animals closer to the path of the aircraft than the line are not included in the analysis.
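The effect of rounding small sighting angles, described above as a common cause of spiked data, can be illustrated with a minimal sketch. It assumes perpendicular distance is computed as r sin θ from the radial distance r and sighting angle θ; the sightings, the function names, the 10° threshold and the 5 m interval width are all hypothetical choices for illustration and are not part of Distance.

```python
import math


def perpendicular_distance(r, angle_deg):
    """Perpendicular distance from the transect line: x = r * sin(theta)."""
    return r * math.sin(math.radians(angle_deg))


def round_small_angles(angle_deg, threshold_deg=10.0):
    """Mimic an observer who records any angle below the threshold as 0 degrees."""
    return 0.0 if abs(angle_deg) < threshold_deg else angle_deg


def histogram(distances, width, truncation):
    """Count distances in equal-width intervals from 0 up to the truncation distance."""
    counts = [0] * math.ceil(truncation / width)
    for x in distances:
        if 0 <= x < truncation:
            counts[int(x // width)] += 1
    return counts


# Hypothetical sightings: (radial distance in m, sighting angle in degrees).
sightings = [(100, 2), (150, 5), (80, 8), (120, 15),
             (60, 35), (200, 4), (90, 45), (110, 9)]

# Perpendicular distances with the true angles and with small angles recorded as 0.
true_x = [perpendicular_distance(r, a) for r, a in sightings]
recorded_x = [perpendicular_distance(r, round_small_angles(a)) for r, a in sightings]

# Many narrow intervals make a spike at zero easy to see.
true_counts = histogram(true_x, width=5, truncation=70)
recorded_counts = histogram(recorded_x, width=5, truncation=70)

print("true angles:    ", true_counts)
print("rounded angles: ", recorded_counts)
```

With these made-up sightings, the first interval holds one detection when the true angles are used and five once small angles are recorded as zero, which is the kind of artificial spike that a finely binned histogram is meant to reveal.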
Another possible cause of the pattern in Fig. 2b is animal movement away from the line before detection. In this case, attempts should be made to detect animals sooner, e.g. by searching ahead instead of to the side in aerial surveys, or by searching with binoculars in shipboard surveys. For surveys of terrestrial mammals, nocturnal surveys using a thermal imager can be effective.

Figure 2c shows considerable variability in the frequency counts. In this case, high frequency counts correspond to intervals containing distances that are a multiple of 10. This is caused by rounding of estimated distances. Better observer training, together with aids to estimation (e.g. laser rangefinders for terrestrial surveys or reticles for shipboard surveys), can usually minimize this problem. Given sufficient data, rounding does not usually compromise estimation unless there is excessive rounding to distance zero (see above). However, judicious choice of cutpoints is needed for testing χ² goodness-of-fit, so that most rounded distances remain in their correct distance interval.

Figure 2d is similar to Fig. 2c, except that the large frequencies do not occur at any obvious values to which distances might be rounded. Data like these indicate over-dispersion, and may occur for example if animals occur in clusters, but are recorded as individuals. This can occur when it is not easy to locate the centre of a cluster of animals (a common problem with primates), or to detect all animals in a cluster; in such circumstances, a recommended field protocol is to record each detected animal separately. This violates the independence assumption, but estimation is remarkably robust to even gross violations of this assumption. However, model selection is more problematic, because the usual tools such as Akaike Information Criterion (AIC) and goodness-of-fit statistics are invalidated by the failure of independence. It may be better to analyse clusters for model selection, then having selected a model, fit it to data for individuals for estimating abundance. The same problem arises when analysing cue count data, as multiple cues from the same animal may all be at similar distances, especially when cue counting is conducted from points, rather than along lines (Buckland 2006).

Fig. 2. Examples of problematic line transect data sets: (a) spike at zero, (b) too few detections near zero, (c) rounding to favoured distances, (d) overdispersed data. [Four histograms; x-axis: distance from line (m); y-axis: number of detections.]

In Fig. 3a, we show a quantile–quantile (q–q) plot corresponding to the fit of a half-normal detection function model to distances from the line for a line transect survey. If the model is good, we expect to see approximately a straight line. This plot shows no systematic curvature, but has 'steps' – a clear indication that distances have been rounded. When distances are analysed as exact (as distinct from grouped), Distance generates q–q plots; these can be useful for diagnosing problems with the data (as here) or poor model fit (next section).

Fig. 3. Quantile–quantile (q–q) plots corresponding to fits of a half-normal model to line transect data. (a) The model fit seems satisfactory, although there is clear evidence of rounding in the observations. (b) These data show evidence of too few detections close to the line, relative to what would be expected under the half-normal model. [Two q–q plots; x-axis: empirical distribution function; y-axis: fitted cumulative distribution function.]

Model selection

The second phase of analysis is model selection. Included in this phase is selection of a suitable truncation distance w for the distance data. We truncate because otherwise extra adjustment terms may be needed to fit a long tail to the detection function. This reduces precision for little gain, as data a long way from the line or point contribute little to the abundance estimate (Buckland et al. 2001:103–108, 151–153). We typically truncate around 5% of distances for line transect sampling, and more for point transect sampling (for which a higher proportion of detections corresponds to the tail of the detection function, Buckland et al. 2001:151). If grouped distance data are collected, choice of w is restricted to the cutpoints defining the intervals.

Having selected w, cutpoints should be set for the distance data. If data are recorded in intervals, the cutpoints will be predetermined. If data are recorded as 'exact', but in fact are subject to substantial rounding, there may be merit in assigning the distances to intervals for analysis, where cutpoints are defined well away from favoured rounding distances, so that few observations will be recorded in the wrong interval. This is achieved by setting cutpoints in the data filter of Distance. More usually, we will wish to analyse the data as exact (even if there is rounding, provided it is not severe), but set cutpoints for presenting histograms and conducting χ² goodness-of-fit tests. This is achieved by setting cutpoints in the diagnostics section of the detection function model definition.

When selecting a suitable model, it is worth bearing in mind that it is only an approximation to the true detection function. There is little point in throwing every possible model at the data – this risks over-fitting. If the data are of high quality, many possible model and adjustment combinations will give very similar estimates. In our experience, the following combinations often perform well and there is rarely any need to try others: uniform key with cosine adjustments; half-normal key with cosine adjustments; half-normal key with Hermite polynomial adjustments; hazard-rate key with simple polynomial adjustments. We would never recommend using the negative exponential key, which is present in Distance largely for historical reasons.

Having fitted several models, visual assessment of model fit can be performed by examining histograms. For example, the hazard-rate model can fit implausible shapes for some data sets, especially for spiked data and for some point transect data sets. There may therefore be reasons to reject that model even if it fits the data well, for example because the estimated probability of detection falls off more quickly with distance than is consistent with how the observer searches. For those models that give a reasonable fit, compare the goodness-of-fit measures. Distance provides χ² goodness-of-fit tests. If exact distances are recorded, it also gives test statistics for the Kolmogorov–Smirnov and Cramér–von Mises tests and a q–q plot (Buckland et al. 2004:385–389). Figure 3b is an example of where the model (the half-normal in this case) provides a poor fit to the data, as can be seen by the departure from a straight line. These data have too few observations close to the line relative to mid-distances to be well modelled by a half-normal; a model with a flatter 'shoulder' to the detection function is needed.

The AIC provides a relative measure of fit. The model with the smallest AIC provides, in some sense, the best fit to the data. AIC values are only comparable if they are applied to exactly the same data – in Distance, this means that runs made using the same survey and data filter are comparable. For such sets, Distance provides the ΔAIC values, which are AIC values with the AIC of the best-fitting model subtracted. Thus ΔAIC = 0 for the best model. Other model selection criteria are also available.

Final analysis and inference

The third phase of analysis is to select the best model, and extract summary analyses and plots for reporting. If choice of model is uncertain and influential, an analysis in which more than one model is selected can be run, and the option to estimate the variance by bootstrap selected. For each bootstrap resample, the best model will be selected (using AIC by default), so that different models may be selected for the analysis of different resamples. Resulting variances and confidence intervals then reflect model uncertainty. An example is given by Williams & Thomas (2009).

More advanced analysis options

MULTIPLIERS

Multipliers provide a simple means of extending standard distance sampling methods. They may be added in Distance via the project set-up wizard, or later in the multipliers section of the model definition.

Indirect surveys of animal sign are often conducted, e.g. dung surveys of deer or elephants, or nest surveys of apes. Sign density is converted to animal density by dividing by an estimate of the sign production rate per animal, and an estimate of the mean time to decay of the sign. These estimates can be added as multipliers, with the divide operator option (hence they are actually 'dividers'), together with estimates of their
standard errors. If the degrees of freedom associated with the estimated standard error are known, they may also be added.

Cues are instantaneous, or at least very short-lived, signs, such as a whale blow or a songburst. Point transect methods may be used to estimate the number of cues per unit area per unit time, and this may be converted to estimated animal density by entering a divider equal to the estimated number of cues per unit of time per animal. For whale cue count surveys (Buckland et al. 2001:191–197), only a sector of the full circle is surveyed; the fraction of the circle surveyed may be entered as an additional divider, but in this case as it is a known constant no standard error would be entered.

For trapping and lure point transect sampling (Buckland et al. 2006), the detection function is estimated by setting up trials with animals at known locations. We record whether or not each trial results in detection of the animal, and use logistic regression to estimate the detection function (in general, with probability of detection at the point allowed to be less than unity). This allows the effective area covered around each point to be estimated, and counts of animals from the main survey can be converted to estimated animal density by dividing by the effective area, by setting up the appropriate multiplier in Distance. Similarly, if too few detections are made in a distance sampling survey to allow reliable estimation of the detection function, but an estimate is available from another survey that is considered appropriate, counts can be converted to estimates of animal density in the same way.

THE DSM ENGINE

If transects are not positioned according to a random design, design-based extrapolation of densities to the wider region may be unreliable. Even if a randomized design is used, we may wish to model animal density as a function of spatially indexed environmental covariates – so called 'spatial modelling' or 'habitat modelling'. This is also useful for estimating abundance in small regions of the study area, for which there is inadequate sampling effort to produce a stand-alone estimate.

The DSM analysis engine implements the 'count method' of Hedley & Buckland (2004), in which the segment counts (segments having been defined outside Distance) are modelled as a function of covariates such as habitat type, altitude or bottom depth, distance from human access, land-use type, latitude and longitude. This is commonly done using generalized additive models (GAM) (Wood 2006) with overdispersed Poisson error structure and a log link, with effective area of the segment (defined as actual area multiplied by the estimated proportion of animals counted in the segment) serving as an offset. Other modelling strategies for DSM are also available in Distance. The counts within each segment can be converted to estimates of abundance within each segment, and the area of the segment (out to truncation distance w) is the offset. Alternatively, estimated density can be used as the response variable, no offset, and the area of the segment used as a weight.

To use this engine, Distance requires that transect lines are divided into segments and that covariates to be included in the model are attached to each segment. Once a density surface model has been built, density or abundance can be estimated over any area of interest within the study area by predicting over a grid of points to which the same covariates are attached. To build this grid, Distance requires that the global data layer be associated with a shapefile.

ACCESSING ANALYSIS ENGINES FROM OTHER SOFTWARE

Sometimes analyses are required that are too complex to carry out within Distance. In this case, the graphical user interface of Distance can be circumvented. For the CDS and MCDS engines, data and descriptions of the models are passed to the FORTRAN program via a data file and a command file. Results of an analysis are placed into a statistics ('stats') file, which can be read by software written by a researcher to extract useful parameter estimates for further analysis. Bootstrapping, for example, can be accomplished by resampling the data, and rewriting the data file presented to MCDS. This process is somewhat streamlined for researchers familiar with R, using the MRDS engine. With this approach, data are read into R only once, and the resampling and accumulation of parameter estimates are all conducted within R without the use of intermediate text files. Likewise, the DSM engine can be accessed directly from within R. The command languages of all four engines are documented in an appendix to the Users' Guide (CDS and MCDS) and R help files (MRDS and DSM).

Future plans

Theoretical developments in distance sampling continue to occur, and we endeavour to incorporate these into Distance. The most recent enhancements include the DSM engine and the improved estimator of encounter rate variance of Fewster et al. (2009). In future, we hope to incorporate a simulation engine into Distance, so practitioners can more readily examine the behaviour of distance sampling estimators for their particular situation. Other enhancements we hope to make include: advances in estimating the effects of treatments (in the sense of designed experiments) that are relevant to many impact assessment studies (Buckland et al. 2009); assessment of time trends in abundance or density from repeated surveys (Thomas, Burnham & Buckland 2004); and unequal coverage estimators (Rexstad 2007).

There are some challenges associated with modelling density surfaces, including variance estimation associated with the two stages of the modelling process, autocorrelation in the counts, potential for unreasonable extrapolation of the density surface, and 'bleeding' of abundance estimates to areas spatially proximate but separated by adverse topography. Subsequent versions of Distance may incorporate the refinements developed by Wood, Bravington & Hedley (2008), which make substantial progress in tackling the latter two issues.

The field of distance sampling is dynamic and growing. Consequently, we anticipate that the software will also continue to evolve, to address more complex ecological applications and make use of further statistical developments.

Acknowledgements

We are grateful to the organizations that have funded the development of Distance, an up-to-date list of whom can be found on the software web pages. The creation and maintenance of the software is a large, collaborative project; in addition to the authors of this paper, contributions have been made by David Anderson, David Borchers, Louise Burt, Julian Derry, Rachel Fewster, Fernanda Marques, David Miller and John Pollard. We thank Stuart Newson, an anonymous reviewer and E.J. Milner-Gulland for their helpful comments on an earlier draft.

References

Alldredge, M.W., Simons, T.R. & Pollock, K.H. (2007) A field evaluation of distance measurement error in auditory avian point count surveys. Journal of Wildlife Management, 71, 2759–2766.

Borchers, D.L. & Burnham, K.P. (2004) General formulation for distance sampling. Advanced Distance Sampling (eds S.T. Buckland, D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers & L. Thomas), pp. 6–30. Oxford University Press, Oxford.

Borchers, D.L., Laake, J.L., Southwell, C. & Paxton, C.G.M. (2006) Accommodating unmodeled heterogeneity in double-observer distance sampling surveys. Biometrics, 62, 372–378.

Buckland, S.T. (1992) Fitting density functions using polynomials. Applied Statistics, 41, 63.

Buckland, S.T. (2006) Point transect surveys for songbirds: robust methodologies. The Auk, 123, 345–357.

Buckland, S.T., Anderson, D.R., Burnham, K.P. & Laake, J.L. (1993) Distance Sampling: Estimating Abundance of Biological Populations. Chapman & Hall, London.

Buckland, S.T., Anderson, D.R., Burnham, K.P., Laake, J.L., Borchers, D.L. & Thomas, L. (2001) Introduction to Distance Sampling. Oxford University Press, Oxford.

Buckland, S.T., Anderson, D.R., Burnham, K.P., Laake, J.L., Borchers, D.L. & Thomas, L. (eds) (2004) Advanced Distance Sampling. Oxford University Press, Oxford.

Buckland, S.T., Summers, R.W., Borchers, D.L. & Thomas, L. (2006) Point transect sampling with traps or lures. Journal of Applied Ecology, 43, 377–

Buckland, S.T., Russell, R.E., Dickson, B.G., Saab, V.A., Gorman, D.G. & Block, W.M. (2009) Analysing designed experiments in distance sampling. Journal of Agricultural, Biological and Environmental Statistics, DOI: 10.1198/jabes.2009.08030.

Burnham, K.P., Anderson, D.R. & Laake, J.L. (1980) Estimation of density from line transect sampling of biological populations. Ecological Monographs, 72, 1–202.

Compaq Computer Corporation (2001) Compaq Visual Fortran. Version 6.6. Compaq Computer Corporation, Houston, Texas, USA.

Davison, A.C. & Hinkley, D.V. (1997) Bootstrap Methods and Their Application. Cambridge University Press, Cambridge, UK.

ESRI, Inc. (2004) MapObjects 2.3. Environmental Systems Research Institute, Inc., Redlands, CA, USA.

Fewster, R.M. & Buckland, S.T. (2004) Assessment of distance sampling estimators. Advanced Distance Sampling (eds S.T. Buckland, D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers & L. Thomas), pp. 281–306. Oxford University Press, Oxford.

Fewster, R.M., Southwell, C., Borchers, D.L., Buckland, S.T. & Pople, A.R. (2008) The influence of animal mobility on the assumption of uniform distances in aerial line transect surveys. Wildlife Research, 35, 275–288.

Fewster, R.M., Buckland, S.T., Burnham, K.P., Borchers, D.L., Jupp, P.E., Laake, J.L. & Thomas, L. (2009) Estimating the encounter rate variance in distance sampling. Biometrics, 65, 225–236.

Hedley, S.L. & Buckland, S.T. (2004) Spatial models for line transect sampling. Journal of Agricultural, Biological and Environmental Statistics, 9, 181–199.

Hiby, A.R. (1985) An approach to estimating population densities of great whales from sighting surveys. IMA Journal of Mathematics Applied in Medicine and Biology, 2, 201–220.

Johnson, D., Laake, J. & VerHoef, J. (2009) A model-based approach for making ecological inference from distance sampling data. Biometrics, DOI: 10.1111/j.1541-0420.2009.01265.x.

Laake, J.L. & Borchers, D.L. (2004) Methods for incomplete detection at distance zero. Advanced Distance Sampling (eds S.T. Buckland, D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers & L. Thomas), pp. 108–189. Oxford University Press, Oxford.

Laake, J.L., Burnham, K.P. & Anderson, D.R. (1979) User's Manual for Program TRANSECT. Utah State University Press, Logan, UT.

Laake, J.L., Buckland, S.T., Anderson, D.R. & Burnham, K.P. (1993) DISTANCE User's Guide V2.0. Colorado Cooperative Fish and Wildlife Research Unit, Colorado State University, Fort Collins, CO, 72 pp.

Lukacs, P.M., Franklin, A.B. & Anderson, D.R. (2004) Passive approaches to detection in distance sampling. Advanced Distance Sampling (eds S.T. Buckland, D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers & L. Thomas), pp. 260–280. Oxford University Press, Oxford.

Marques, F.F.C. & Buckland, S.T. (2003) Incorporating covariates into standard line transect analyses. Biometrics, 59, 924–935.

Marques, F.F.C. & Buckland, S.T. (2004) Covariate models for the detection function. Advanced Distance Sampling (eds S.T. Buckland, D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers & L. Thomas), pp. 31–47. Oxford University Press, Oxford.

Marques, F.F.C., Buckland, S.T., Goffin, D., Dixon, C.E., Borchers, D.L., Mayle, B.A. & Peace, A.J. (2001) Estimating deer abundance from line transect surveys of dung: sika deer in southern Scotland. Journal of Applied Ecology, 38, 349–363.

Marques, T.A., Thomas, L., Fancy, S.G. & Buckland, S.T. (2007) Improving estimates of bird density using multiple covariate distance sampling. The Auk, 127, 1229–1243.

Microsoft Corporation (2000) Visual Basic 6. Microsoft Corporation, Redmond, Washington, USA.

R Development Core Team (2009) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna. ISBN 3-900051-07-0. http://www.R-project.org.

Rexstad, E. (2007) Non-Uniform Coverage Estimators for Distance Sampling. Technical Report 2007-1. Centre for Research into Ecological and Environmental Modelling, St. Andrews University. http://hdl.handle.net/10023/628/.

Strindberg, S. & Buckland, S.T. (2004) Zigzag survey designs in line transect sampling. Journal of Agricultural, Biological and Environmental Statistics, 9, 443–461.

Strindberg, S., Buckland, S.T. & Thomas, L. (2004) Design of distance sampling surveys and Geographic Information Systems. Advanced Distance Sampling (eds S.T. Buckland, D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers & L. Thomas), pp. 190–228. Oxford University Press, Oxford.

Thomas, L., Burnham, K.P. & Buckland, S.T. (2004) Temporal inferences from distance sampling surveys. Advanced Distance Sampling (eds S.T. Buckland, D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers & L. Thomas), pp. 71–107. Oxford University Press, Oxford.

Thomas, L., Williams, R. & Sandilands, D. (2007) Designing line transect surveys for complex survey regions. Journal of Cetacean Research and Management, 9, 1–13.

Williams, R. & Thomas, L. (2009) Cost-effective abundance estimation of rare marine animals: small-boat surveys for killer whales in British Columbia, Canada. Biological Conservation, 142, 1542–1547.

Wood, S.N. (2006) Generalized Additive Models: An Introduction with R. Chapman & Hall, Boca Raton, FL.

Wood, S.N., Bravington, M.V. & Hedley, S.L. (2008) Soap film smoothing. Journal of the Royal Statistical Society B, 70, 931–955.

Received 29 July 2009; accepted 21 October 2009
Handling Editor: E.J. Milner-Gulland


Publisher: Pubmed Central
Copyright: © 2009 The Authors. Journal compilation © 2009 British Ecological Society
ISSN: 0021-8901
eISSN: 1365-2664
DOI: 10.1111/j.1365-2664.2009.01737.x

Abstract

Journal of Applied Ecology 2010, 47, 5–14 doi: 10.1111/j.1365-2664.2009.01737.x REVIEW Distance software: design and analysis of distance sampling surveys for estimating population size ,1 2 1 3 Len Thomas* , Stephen T. Buckland , Eric A. Rexstad , Jeff L. Laake , 4 2 1 1 Samantha Strindberg , Sharon L. Hedley , Jon R.B. Bishop , Tiago A. Marques and Kenneth P. Burnham Research Unit for Wildlife Population Assessment, Centre for Research into Ecological and Environmental Modelling, University of St. Andrews, St. Andrews KY16 9LZ, UK; Centre for Research into Ecological and Environmental Modelling, University of St. Andrews, St. Andrews KY16 9LZ, UK; National Marine Mammal Laboratory, Alaska Fisheries Science Center, National Marine Fisheries Service, 7600 Sand Point Way NE F ⁄ AKC3, Seattle, WA 4 5 98115 6349, USA; Wildlife Conservation Society, 2300 Southern Boulevard, Bronx, NY 10460, USA; and Colorado Cooperative Fish and Wildlife Research Unit, Department of Fish, Wildlife and Conservation Biology, Colorado State University, Fort Collins, CO 80523, USA Summary 1. Distance sampling is a widely used technique for estimating the size or density of biological populations. Many distance sampling designs and most analyses use the software Distance. 2. We briefly review distance sampling and its assumptions, outline the history, structure and capabilities of Distance, and provide hints on its use. 3. Good survey design is a crucial prerequisite for obtaining reliable results. Distance has a survey design engine, with a built-in geographic information system, that allows properties of different pro- posed designs to be examined via simulation, and survey plans to be generated. 4. A first step in analysis of distance sampling data is modelling the probability of detection. 
Distance contains three increasingly sophisticated analysis engines for this: conventional distance sampling, which models detection probability as a function of distance from the transect and assumes all objects at zero distance are detected; multiple-covariate distance sampling, which allows covariates in addition to distance; and mark–recapture distance sampling, which relaxes the assumption of certain detection at zero distance. 5. All three engines allow estimation of density or abundance, stratified if required, with associated measures of precision calculated either analytically or via the bootstrap. 6. Advanced analysis topics covered include the use of multipliers to allow analysis of indirect surveys (such as dung or nest surveys), the density surface modelling analysis engine for spatial and habitat modelling, and information about accessing the analysis engines directly from other software. 7. Synthesis and applications. Distance sampling is a key method for producing abundance and density estimates in challenging field conditions. The theory underlying the methods continues to expand to cope with realistic estimation situations. In step with theoretical developments, state- of-the-art software that implements these methods is described that makes the methods accessible to practising ecologists. Key-words: distance sampling, line transect sampling, point transect sampling, population abundance, population density, sighting surveys, survey design, wildlife surveys Introduction *Correspondence author. E-mail: len@mcs.st-and.ac.uk Distance sampling comprises a set of methods in which Re-use of this article is permitted in accordance with the Terms distances from a line or point to detections are recorded, and Conditions set out at http://www3.interscience.wiley.com/ authorresources/onlineopen.html from which the density and ⁄ or abundance of objects is 2009 The Authors. Journal compilation  2009 British Ecological Society 6 L. Thomas et al. estimated. 
Objects are usually animals or animal groups ASSUMPTIONS (termed clusters), but may be plantsorinanimate objects. Detections are usually of animalsorclusters, butmay be We briefly summarize the key assumptions of the basic method (for a more detailed discussion, see Buckland et al. 2001:29– of cues (such as whale blows or bird songbursts) or sign 37). Many of the recent advances of distance sampling allow (such as dung or nests). Conventional distance sampling one or more of these assumptions to be relaxed. There are just (CDS) methods are described by Buckland et al. (2001), three key assumptions. and various extensions are considered in Buckland et al. (2004). An extensive distance sampling reference list, covering 1. Objects on the line or point are detected with cer- both methodological developments and practical application tainty. Most surveys are conducted with a single observer, or of the methods, is available at http://www.ruwpa.st-and. a single observation ‘platform’ consisting of multiple observers ac.uk/distancesamplingreferences/. Most distance sampling surveys are analysed, and many are but with data pooled across them. In cases where it is impor- designed, using the software Distance (http://www.ruwpa. tant to relax assumption 1, double-observer or double-plat- st-and.ac.uk/distance/). In this paper, we describe version 6 of form surveys may be conducted (Laake & Borchers 2004). In the software and its capabilities, and give guidance on how to these, observers either search independently of each other or use it to design and analyse surveys. there may be ‘one-way’ independence, with one observer being unaware of detections made by the other, but not vice versa. Such methods are quite often used for marine mammal sur- What is distance sampling? veys. The mark–recapture distance sampling (MRDS) engine of Distance can be used to analyse such double-observer data. 
TYPES OF DISTANCE SAMPLING The most widely used form of distance sampling is line transect 2. Objects do not move. Conceptually, distance sampling is a sampling. A survey region is sampled by placing a number of ‘snapshot’ method: we would like to freeze animals in position lines at random in the region or, more commonly, a series of while we conduct the survey. In practice, non-responsive systematically spaced parallel lines with a random start point. movement in line transect surveys is not problematic provided it is slow relative to the speed of the observer. Non-responsive An observer travels along each line, recording any animals movement is more problematic for point transect surveys, detected within a distance w of the line. In the standard leading to overestimation of density (Buckland et al. method,weassumeall animals on the line are detected, but 2001:173). Responsive movement before detection is problem- detection probability decreases with increasing distance from atic because animals are assumed to be located independently the line. Hence, not all animals in the strip of half-width w need of the position of the line or point (see below); implications are to be detected. In addition, the distance of each detected ani- addressed by Fewster et al. (2008). mal from the line is recorded. We use the distribution of these distances to estimate the proportion of animals in the strip that is detected, which allows us to estimate animal density and 3. Measurements are exact. Untrained observers tend to be abundance. If animals occur in well-defined clusters (e.g. flocks poor at estimating distances by eye or ear (Alldredge, Simons or herds), then detections refer to clusters rather than to indi- & Pollock 2007). Wherever possible, training and technology vidual animals. (e.g. laser rangefinders) should be used to ensure adequate A second common form, particularly for surveys of breeding accuracy. 
songbirds, is point transect sampling, where the design is based on randomly placed points rather than lines.

Several other variations exist. In indirect surveys, animal signs are surveyed by one of the above methods, and sign density is converted to animal density using estimates of sign production and decay rates (Marques et al. 2001). In cue count surveys (nearly) instantaneous cues are surveyed, e.g. whale blows (Hiby 1985) or bird songbursts (Buckland 2006), and the resulting estimate of number of cues per unit time per unit area is converted to estimated animal density using an estimate of the cue rate per animal. In trapping webs or trapping line transects (Lukacs, Franklin & Anderson 2004), a network of traps is placed around the point or line, and if an animal enters a trap, its recorded distance is the distance of that trap from the point or line.

Provided distance measurements are approximately unbiased, bias in line transect estimates tends to be small in the presence of measurement errors, but larger for estimates from point transect surveys (Buckland et al. 2001:264–265). In some line transect surveys, particularly shipboard surveys, direct animal-observer distance r is recorded together with sighting angle θ from the transect line and perpendicular distance is then calculated as r sin θ. In this case, it is important to obtain accurate angles, particularly for small angles, and an angle board can be used to help achieve this. In addition to exact distances, if animals occur in clusters, we assume cluster sizes are accurately recorded, at least for those close to the line or point. We also assume species are not misidentified.

Other assumptions are made, but they are seldom of great practical significance. We assume animal locations are
independent of the positions of the lines or points, which we ensure if we have an adequate sample of lines or points, and randomize their location. This assumption becomes critical if, for example, transects are placed along roads or tracks. We also assume detections are independent events, but our analysis methods are very robust to failures of this assumption (except in the case of double-platform designs, where independence between duplicate detections of the same animal at zero distance is required).

In trapping or lure point transect sampling, a single trap or lure is placed at each point of the design, and the probability of detecting a given animal is estimated by conducting separate trials on animals with known location (Buckland et al. 2006).

appropriate for the sampling used, and analysis options desired. Version 3.0 was a Microsoft Windows console application, but retained the command language structure.

With funds from British research funding councils, a programming team developed a version of Distance with fully integrated, Windows-based graphical user interface. This version, Distance 3.5, became generally available in 1998. Subsequent versions saw the addition of more features: Distance 4 (in 2002) the multiple-covariate distance sampling (MCDS) and automated survey design engines, Distance 5 (in 2005) the MRDS engine and Distance 6 (in 2009) the density surface modelling (DSM) engine.

DESIGN-BASED AND MODEL-BASED ESTIMATION

In the case of strip transect sampling, where all animals within the strip of half-width w are assumed to be detected, estimation of abundance within the survey region can be achieved using an entirely design-based framework. To do this successfully, it is
critical to place the strips at random throughout the survey region, to ensure that we count representative strips. We can then assume the density in the strips is an unbiased estimate of density in the wider survey region; no model is needed. Standard distance sampling also uses design-based inference to extrapolate from the sampled plots (strips for line transect sampling or circles for point transect sampling) to the survey region. However, we do not know the true number of animals in the plots. We therefore fit a detection model, which allows us to estimate this number. Standard distance sampling is thus a hybrid, blending model-based (within the plots) and design-based (extrapolation from the plots) inference (Fewster & Buckland 2004). We could adopt a fully model-based approach. The simplest would be to assume that animals are uniformly and independently distributed throughout the survey region. This leads to the same abundance estimate as for the hybrid approach, but estimates of precision would change.

The basic methods in Distance 6 are described in a third monograph (Buckland et al. 2001), which is essentially an updated version of the second one; the more advanced methods are described in an edited volume (Buckland et al. 2004), and in additional references given below.

Users downloading Distance are asked to register their email address and country. Distance versions 3.5, 4 and 5 together have been registered by over 19 000 users from 135 countries.

Program structure and overview

From the users’ perspective, Distance consists of a graphical interface that allows users to enter, import and view data, design surveys and run analyses. Users begin by creating a Distance project, which contains information about a single study. Wizards are available to help in setting up a project and entering data, or importing it from delimited text files. Data are
organized into nested layers: global (for data that relates to the whole study area), stratum (data relating to individual survey strata), sample (data relating to individual survey lines or points) and observation (data relating to single observations). More complex nested structures are possible. Geographical data, in the form of ESRI shapefiles, can also be associated with each layer. Having entered or imported data, users undertake one of two tasks: design of a new survey or analysis of already-collected survey data (Fig. 1).

A design is an algorithm for laying out samples within the study area; multiple designs can be created using the design engine in Distance, and their properties examined by simulation. A single realization of a design is called in Distance a survey, and this consists of the position of a set of sample lines or points together with the survey methods (e.g. collection of perpendicular distances to clusters of animals). Line or point positions can be readily exported from Distance and used for navigation in the field.

This strategy is not usually adopted because the estimates of precision are not robust to the failure of the model assumptions made about the spatial distribution of animals. However, there is increasing interest in modelling how animal density varies spatially, and fully model-based approaches that make more reasonable assumptions are an active area of research (e.g. Hedley & Buckland 2004; Johnson, Laake & VerHoef 2009). It is possible to fit relatively simple spatial models in Distance 6 (see below).

Historical development

The Distance software evolved from two earlier software developments. The first was the program TRANSECT (Laake, Burnham & Anderson 1979) for fitting Fourier series and other models to line transect distance data. The methods on which the software was based were developed in a series of publications, culminating in the first monograph on distance
sampling (Burnham, Anderson & Laake 1980). The second development was of an algorithm for maximum likelihood fitting of models to line or point transect distance data, based on a parametric key function multiplied by series adjustments (Buckland 1992). Code implementing this algorithm was merged with TRANSECT to create Distance (Laake et al. 1993), which provided analysis of line and point transect data. The methods were comprehensively documented in a second monograph (Buckland et al. 1993). Distance versions 1.0–2.2 were DOS-based applications that were controlled using a command language to invoke various program options

probability. We recommend that a systematic survey design with a random start be used to afford better spatial coverage and lower variance. For such a design, Fewster et al. (2009) describe methods to estimate this variance with low bias, and these methods are available in the Distance analysis engines.

The layout of transects across the study region deserves careful thought.

[Fig. 1, panel (a), Survey design. Diagram labels: Data layers (Global, Stratum, Sample); GIS data; Design; Survey; Coverage grid.]

Results can also be viewed within Distance in the form of simple maps and text output containing summary statistics.

Analysis in Distance involves combining three elements: (i) a survey, which specifies which data layers to use and the survey methods used; (ii) a data filter, which allows subsets of the data to be selected, truncation distances to be chosen and other pre-processing; and (iii) a model definition, which specifies how the data should be analysed. These are then run using one of the four analysis engines available in Distance: CDS, MCDS, MRDS and DSM. Each has different capabilities, as explained below. Results are available within Distance in the form of
diagnostic plots and summary statistics. These are readily exported to other software.

[Fig. 1, panel (b), Analysis. Diagram labels: GIS data; Data layers (Global, Stratum, Sample, Observation); Survey; Data filter; Model definition; Analysis; Engine; Prediction grid (DSM); legend entry marking optional elements.]

Fig. 1. Schematic showing (a) survey design and (b) analysis in Distance.

Conceptually, Distance projects contain all the data and results relating to a single study. Physically, a project comprises a project file and an associated data folder; the latter contains a data file, geographical shapefiles and a folder containing files generated by analysis engines that use the statistical software R.

Parallel, equally spaced transects with a random start provide designs with uniform coverage. However, if survey platform cost is high, then not collecting data while moving between transects can be wasteful. Hence, sawtooth or zigzag designs can be employed; however, when study regions are non-rectangular, these designs can produce unequal coverage probability (Strindberg & Buckland 2004). All else being equal, more, shorter transect lines yield more precise estimates of the encounter rate variance than do a few long lines; segmented transects are often used, where the distance between sections of survey effort along a transect is roughly equal to the separation between successive parallel transects (Buckland et al. 2004:204). Where there are known density gradients within the study region, stratification can be used to reduce variance; alternatively (or in addition) transect lines can be placed parallel to this gradient. In highly complex study regions, the ratio of study area perimeter to area may be quite high. Then, edge effects can cause significantly lower coverage probability near the perimeter of
the study area, so that sampling into a buffer zone (‘plus sampling’) is advisable (Strindberg, Buckland & Thomas 2004:192–194, 200–201). Further discussion of design issues is given by Buckland et al. (2001:228–317), Strindberg et al. (2004) and Thomas, Williams & Sandilands (2007).

In all of these situations, it is advisable to employ the automated survey design engine in Distance to examine the coverage properties of candidate survey designs prior to their implementation. For a given design, Distance can generate a map showing coverage probabilities estimated by simulation, to allow users to determine whether standard analyses in Distance are appropriate or whether other analysis options are preferable to avoid potential bias (for example, the Horvitz-Thompson estimator described below; Rexstad 2007). Other outputs include the minimum, mean and maximum number of survey lines or points, and distance travelled per stratum.

As a project consists of many parts, Distance provides a convenient mechanism for packing the project into a single (zip) file to make it easy to archive and transfer.

A full electronic user manual comes with the software, and there is an email-based discussion list for users (http://www.jiscmail.ac.uk/distance-sampling).

From the programming perspective, the visual interface, written in Microsoft Visual Basic (Microsoft Corporation 2000), is highly modular, and runs the analysis engines in separate processes to enhance stability and make use of multi-core hardware. The survey design engine is also written in Visual Basic, using ESRI’s MapObjects library (ESRI 2004) for the GIS functionality. The CDS and MCDS analysis engines are written in FORTRAN (Compaq Computer Corporation 2001) and the MRDS and DSM engines in R (R Development Core Team 2009). For data storage, both project and data files are in Microsoft Access format.
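The idea of estimating coverage probability by simulation, mentioned above for candidate survey designs, can be illustrated with a minimal sketch. This is not the algorithm used by the Distance design engine; it assumes a hypothetical rectangular region crossed by parallel lines with a random start, so that only the coordinate perpendicular to the lines matters. The function name `simulate_coverage` and all parameter values are illustrative assumptions.

```python
import random


def simulate_coverage(points_x, region_width, spacing, w, n_sims=20000, seed=1):
    """Estimate coverage probability at given x-positions by simulation.

    Hypothetical 1-D setup (an assumption for illustration): parallel lines
    cross a rectangular region of the given width, `spacing` apart, with a
    uniform random start. A point counts as covered when it lies within
    distance w of any line placed inside the region.
    """
    rng = random.Random(seed)
    hits = [0] * len(points_x)
    for _ in range(n_sims):
        # One realization of the design: random start, then equal spacing
        start = rng.uniform(0.0, spacing)
        lines = []
        x = start
        while x <= region_width:
            lines.append(x)
            x += spacing
        for j, p in enumerate(points_x):
            if any(abs(p - line) <= w for line in lines):
                hits[j] += 1
    return [h / n_sims for h in hits]


# A point on the region boundary (x = 0) and an interior point (x = 50)
coverage = simulate_coverage([0.0, 50.0], region_width=100.0, spacing=10.0, w=1.0)
print(coverage)
```

In this sketch the interior point is covered with probability close to 2w/spacing, whereas the boundary point is covered about half as often because no lines are placed outside the region. That is the kind of edge effect that sampling into a buffer zone is intended to reduce.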
More details of the internal structure of the software are given in appendices to the user manual.

Survey design

As with any sampling exercise, obtaining reliable results from a distance sampling survey depends critically on good survey design. This relies upon the fundamental sampling principles of replication and randomization. Sufficient replicate lines or points ensure that variation in encounter rate (number of objects detected per unit survey effort) can be adequately estimated. The lines or points should not be placed subjectively; rather a randomization scheme should be employed that gives all locations in the study region a known, non-zero probability of being covered by a transect (the ‘coverage probability’). Standard analyses in Distance assume uniform coverage

Other data analysis issues

ESTIMATING ABUNDANCE

Consider first the case that detections are of single animals.

These can be useful in determining if a design is feasible, and whether there is sufficient effort to produce enough sightings for reliable analysis. Once a design is selected, a realization (survey plan) can be generated, and sample coordinates exported for use in implementing the survey.

Estimating the detection function

Version 6 of Distance has three different analysis engines for estimating the detection function. (The fourth engine is covered in the next section.)

THE CDS ENGINE

The CDS engine is a FORTRAN program based on the code in earlier versions of Distance. CDS assumes that detection of an animal on the line or point is certain. The same detection function is assumed to apply for all animals; this seems unrealistic, but the ‘pooling robustness’ property of CDS estimators ensures that moderate amounts of unmodelled heterogeneity cause little bias (Buckland et al. 2004:389–392). The CDS
engine implements the flexible semi-parametric detection function modelling framework proposed by Buckland (1992), where a parametric key function is paired with zero or more series adjustment terms. Four key functions are available: uniform, half-normal, hazard-rate and negative exponential. Adjustments can be cosine terms, or Hermite or simple polynomials. Selection of the appropriate combination can be done using standard model selection techniques (see Analysis hints, below).

THE MCDS ENGINE

The MCDS engine is an extension of the CDS FORTRAN program that allows inclusion of covariates other than distance from the line or point in the detection function (Marques & Buckland 2003, 2004). This is useful in four circumstances (Marques et al. 2007): first, when we wish to estimate density for a subset of the data (e.g. a stratum), but there are too few observations to fit a separate detection function to each subset; secondly, when pooling robustness does not hold (e.g. too much heterogeneity in detection probability); thirdly, because it can reduce the variance of the density estimate; and fourthly, if the covariate distribution is of interest in its own right.

Estimated abundance ($\hat{N}$) may be formulated in terms of a Horvitz-Thompson estimator, but with the inclusion probabilities estimated (Borchers & Burnham 2004):

$$\hat{N} = \sum_{i=1}^{n} \frac{1}{\hat{P}_i} \qquad \text{(eqn 1)}$$

where $\hat{P}_i$ is the estimated inclusion probability for animal i and n is the number of observations. $\hat{P}_i$ has two components: first the probability that animal i falls within the sampled plots (the ‘coverage probability’ previously introduced) and secondly an estimate of its probability of detection, given that it is within the plots.

When animals occur in clusters, we can estimate abundance as:

$$\hat{N} = \sum_{i=1}^{n} \frac{s_i}{\hat{P}_i} \qquad \text{(eqn 2)}$$

where $s_i$ is the size of cluster i, i = 1, ..., n. Alternatively, we can multiply estimated cluster abundance by an estimate $\hat{E}(s)$ of mean cluster size in the population:
$$\hat{N} = \hat{E}(s) \sum_{i=1}^{n} \frac{1}{\hat{P}_i} \qquad \text{(eqn 3)}$$

If the CDS engine is selected, the detection function is assumed to be the same for all detections, so that eqn 1 simplifies to $\hat{N} = n/\hat{P}$. For clustered populations, the CDS engine uses a simplification of eqn 3. The default method for estimating mean cluster size is the regression method of Buckland et al. (2001:73–75) in which log cluster size is regressed on estimated probability of detection. This is designed to remove any effect of ‘size bias’, which occurs when larger clusters are easier to detect than small ones at large distances, so the simple mean of observed cluster sizes is a positively biased estimate of population mean cluster size. It also corrects for bias that arises when cluster size tends to be underestimated at large distances, so mean observed cluster size is a negatively biased estimate of population mean cluster size.

The MCDS and MRDS engines allow the detection function (but not coverage probability) to vary, so that eqn 1 applies when detections are of single animals. Equation 2 is used for clustered populations.

Only two key functions are allowed: the half-normal and the hazard-rate. Both of these have a scale parameter, which is modelled as a function of the covariates. The covariates may relate to the individual detections (e.g. cluster size or animal behaviour), the observer (e.g. observer ID) or the environment (e.g. habitat or weather), and can be either continuous covariates or qualitative factors.

THE MRDS ENGINE

The MRDS engine is an R package for use primarily with double-platform line transect data, where the assumption of certain detection on the line can be relaxed (Laake & Borchers 2004). Double-platform methods are widely used in both aerial and shipboard surveys of marine mammals (e.g. Borchers et al. 2006), but are potentially useful in many situations where objects at zero distance are difficult to detect. Users wishing to run this engine need to have R installed in addition to Distance. As with the MCDS engine, covariates can be incorporated into the detection function model; however, inclusion of adjustment functions is not supported at present. Every attempt should be made to include all covariates that have a large effect on detectability, because unlike CDS and MCDS, estimation is not robust to the effects of unmodelled heterogeneity at zero distance when detection on the line is not certain.

Single platform surveys can also be analysed using the MRDS engine, but this is only really useful when calling the engine from R (where the CDS and MCDS engines are not readily available).

plished using Distance. The overall density would then be estimated as the sum of the stratum-specific estimates.

A current limitation of Distance is that it can only handle one level of stratification.

ESTIMATING PRECISION

For most analyses, the default method for estimating precision is an analytical one. However, a nonparametric bootstrap is available. The default option for the bootstrap is to resample lines or points. In some circumstances, the user may wish to resample strata, for example in point transect sampling, where a grid of points is placed at each of a number of random locations (called ‘cluster sampling’), where the grid is the appropriate unit to resample. This is achieved by defining each grid of points as a stratum, and resampling strata. The user can also opt to resample individual detections, although this is not recommended.
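As an illustration of resampling lines, the following sketch bootstraps a simple line transect density estimate. It is not the Distance implementation. It assumes single-animal detections, a uniform coverage probability of 2wL/A (so that the simplified form of eqn 1 gives a density of n/(2wLP)), and, as a deliberate simplification, a detection probability held fixed rather than re-estimated in each resample, so only encounter rate variation is captured. The function names and data are hypothetical.

```python
import random


def line_transect_density(lines, w, p_detect):
    """Density estimate n / (2 w L P) for single-animal detections.

    lines: list of (line_length, number_of_detections) pairs
    w: truncation distance (strip half-width), same units as line length
    p_detect: estimated probability of detection within the strip
    """
    total_length = sum(length for length, _ in lines)
    n = sum(count for _, count in lines)
    return n / (2.0 * w * total_length * p_detect)


def bootstrap_se(lines, w, p_detect, n_boot=999, seed=42):
    """Nonparametric bootstrap standard error, resampling whole lines.

    Simplification (assumption): p_detect is not refitted per resample.
    """
    rng = random.Random(seed)
    k = len(lines)
    estimates = []
    for _ in range(n_boot):
        # Draw k lines with replacement; each line keeps its own detections
        resample = [lines[rng.randrange(k)] for _ in range(k)]
        estimates.append(line_transect_density(resample, w, p_detect))
    mean = sum(estimates) / n_boot
    variance = sum((e - mean) ** 2 for e in estimates) / (n_boot - 1)
    return variance ** 0.5


# Hypothetical survey: four 10 km lines, w = 0.05 km, detection probability 0.6
survey = [(10.0, 5), (10.0, 7), (10.0, 3), (10.0, 9)]
d_hat = line_transect_density(survey, w=0.05, p_detect=0.6)
se_hat = bootstrap_se(survey, w=0.05, p_detect=0.6)
print(d_hat, se_hat)
```

A fuller bootstrap would repeat the detection function fit (and any model selection) within each resample, so that the resulting variance also reflects uncertainty in the detection probability.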
Multi-level bootstrapping is also allowed but is not recommended as resampling by line or point gives a better representation of the variability induced by the sampling process (Davison & Hinkley 1997:100–102).

For the CDS engine, the analytical variance of a density or abundance estimate is estimated by the delta method (Buckland et al. 2001:52), and comprises three components, corresponding to estimation of encounter rate, the detection function and mean cluster size in the population (for clustered populations). For details of how the three components are obtained and combined, see Buckland et al. (2001:76–79). However, the formulae for estimating encounter rate variance given by Buckland et al. (2001:78–79) are not the default option in Distance version 6, following work by Fewster et al. (2009) showing an alternative estimator gives more robust estimates of variance when there are strong spatial trends through the survey region. By default, the estimators assume lines or points were laid down at random. This leads to overestimates of variance where systematic designs are used.

ANALYSIS HINTS

There are typically three phases in analysing data in Distance: exploratory data analysis, followed by model selection, and then final analysis and inference. We focus here on CDS analyses; suggestions for MCDS analyses were given by Marques et al. (2007).

Exploratory data analysis

Initially, exploratory data analysis is carried out to aid understanding of the data and identify any problems. This phase should be started while the data are being collected, as this allows any problems with data collection to be identified and rectified. If exact distances are recorded (rather than grouped or interval distance data), it is useful to plot histograms of the distances with many cutpoints.

In Fig. 2, we show examples of problematic line transect data sets. Figure 2a shows an example of ‘spiked’ data. For
such data, different models will give very different estimates of density, so it is important to understand what has caused the spike, and to modify field procedures accordingly. A common cause in shipboard surveys is inaccurate estimation of sighting angles for detections ahead of the vessel. With inadequate training and/or aids, observers often record most detections within perhaps 10° of the line as 0°, leading to rounding of many perpendicular distances to zero.

Spiked data might also arise if animals are attracted towards the observer. It is important that detections are made before any responsive movement occurs.

Spiked data may arise even when there has been no failure of an assumption. For example, in surveys of breeding songbirds, singing males may be much more detectable than (non-singing and cryptic) females. In that case, the spike arises because females are only detectable close to the line. The simplest solution in this case is to additionally record whether the bird was singing. An analysis can then be conducted for singing birds, allowing estimation of the number of territories. If females are certain to be detected when on the line, then a separate analysis of females could be conducted, if sample size is adequate, or sex could be included as a covariate in an MCDS analysis. Similar issues apply for point transect surveys.

For systematic parallel designs, estimators based on post-stratification (Fewster et al. 2009) are available, and these produce more reliable (and usually lower) estimates for that design.

For the MCDS and MRDS engines, detection probability is allowed to depend on covariates other than distance, and a different, more integrated approach to variance estimation is required. For the MCDS engine, see Marques & Buckland (2003, 2004:38–43) and Marques et al. (2007); for the MRDS engine, see Borchers et al. (2006).

STRATIFICATION (INCLUDING POST-STRATIFICATION)

Geographical stratification can be used to improve precision of estimates by subdividing the study region into blocks that are likely to be similar in animal density. Stratification can also be used when there is management interest in estimating density in sub-sections of the study region. The overall estimate of density is obtained as the mean of the stratum-specific estimates, weighted by the respective areas of the strata.

If the same study area is surveyed repeatedly, then survey-level strata could be defined. If a study area is surveyed by say two ships, an analysis with ships as strata can be performed. In this latter case, the overall estimate of density would be the mean of stratum-specific density estimates, weighted by the effort carried out by each ship.

In some cases, strata can be defined using criteria not available during survey design. For example, it may be of scientific interest to produce sex-specific estimates of density in the study area if the animals can be identified by gender. However, if the genders mix freely within the study area, the survey cannot be designed to account for sex-specific estimation. This type of stratification is called post-stratification, and can be accom-

Figure 2b gives clear evidence that at least one assumption has failed. Aerial survey data can look like this, because it may be difficult for observers to see the line, so that animals close to the line are missed. Solutions include aircraft with bubble windows, allowing the line to be seen, or offsetting the line, with markers on window and wing strut, which, when aligned, allow the observer to record accurately which side of the line an animal is on. Animals closer to the path of the aircraft than the line are not included in the analysis.
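The three ways of combining stratum-specific density estimates described in the stratification section (area-weighted mean for geographical strata, effort-weighted mean for strata such as ships, and a simple sum for post-strata such as sex) can be sketched as follows. The helper name `weighted_mean` and all numbers are hypothetical, and the sketch covers point estimates only, not their variances.

```python
def weighted_mean(estimates, weights):
    """Weighted mean of stratum-specific density estimates."""
    total_weight = sum(weights)
    return sum(e * wt for e, wt in zip(estimates, weights)) / total_weight


# Geographical strata: weight each density by its stratum area
densities = [2.0, 4.0]      # animals per km^2 in strata A and B (made-up values)
areas = [100.0, 300.0]      # km^2
overall_by_area = weighted_mean(densities, areas)

# Two ships surveying the same area: weight by the effort of each ship
ship_densities = [3.0, 5.0]
ship_effort = [400.0, 100.0]    # km of transect covered by each ship
overall_by_effort = weighted_mean(ship_densities, ship_effort)

# Post-stratification by sex: the strata share one area, so densities add
male_density, female_density = 1.5, 2.5
overall_by_sum = male_density + female_density

print(overall_by_area, overall_by_effort, overall_by_sum)
```

The choice of weights follows from what the strata partition: area when they divide the region, effort when they divide the sampling, and no weighting at all when they divide the animals within the same region.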
[Figure 2: four histograms, panels (a) to (d); horizontal axis: Distance from line (m).]

Fig. 2. Examples of problematic line transect data sets: (a) spike at zero, (b) too few detections near zero, (c) rounding to favoured distances, (d) overdispersed data.

Another possible cause of this pattern of observed distances is animal movement away from the line before detection. In this case, attempts should be made to detect animals sooner, e.g. by searching ahead instead of to the side in aerial surveys, or by searching with binoculars in shipboard surveys. For surveys of terrestrial mammals, nocturnal surveys using a thermal imager can be effective.

Figure 2c shows considerable variability in the frequency counts. In this case, high frequency counts correspond to intervals containing distances that are a multiple of 10. This is caused by rounding of estimated distances. Better observer training, together with aids to estimation (e.g. laser rangefinders for terrestrial surveys or reticles for shipboard surveys), can usually minimize this problem. Given sufficient data, rounding does not usually compromise estimation unless there is excessive rounding to distance zero (see above). However, judicious choice of cutpoints is needed for testing χ² goodness-of-fit, so that most rounded distances remain in their correct distance interval.

Figure 2d is similar to Fig. 2c, except that the large frequencies do not occur at any obvious values to which distances might be rounded. Data like these indicate over-dispersion, and may occur for example if animals occur in clusters, but are recorded as individuals. This can occur when it is not easy to locate the centre of a cluster of animals (a common problem with primates), or to detect all animals in a cluster; in such circumstances, a recommended field protocol is to record each detected animal separately. This violates the independence assumption, but estimation is remarkably robust to even gross violations of this assumption.

Information Criterion (AIC) and goodness-of-fit statistics are invalidated by the failure of independence. It may be better to analyse clusters for model selection, then having selected a model, fit it to data for individuals for estimating abundance. The same problem arises when analysing cue count data, as multiple cues from the same animal may all be at similar distances, especially when cue counting is conducted from points, rather than along lines (Buckland 2006).

In Fig. 3a, we show a quantile–quantile (q–q) plot corresponding to the fit of a half-normal detection function model to distances from the line for a line transect survey. If the model is good, we expect to see approximately a straight line. This plot shows no systematic curvature, but has ‘steps’ – a clear indication that distances have been rounded. When distances are analysed as exact (as distinct from grouped), Distance generates q–q plots; these can be useful for diagnosing problems with the data (as here) or poor model fit (next section).

Model selection

The second phase of analysis is model selection. Included in this phase is selection of a suitable truncation distance w for the distance data. We truncate because otherwise extra adjustment terms may be needed to fit a long tail to the detection function. This reduces precision for little gain, as data a long way from the line or point contribute little to the abundance estimate (Buckland et al. 2001:103–108, 151–153). We typically truncate around 5% of distances for line transect sampling, and more for point transect sampling (for which a higher
proportion of detections corresponds to the tail of the detection function, Buckland et al. 2001:151). If grouped distance

However, model selection is more problematic, because the usual tools such as Akaike

[Fig. 2 vertical axis label: Number of detections.]

Having fitted several models, visual assessment of model fit can be performed by examining histograms. For example, the hazard-rate model can fit implausible shapes for some data sets, especially for spiked data and for some point transect data sets. There may therefore be reasons to reject that model even if it fits the data well, for example because the estimated probability of detection falls off more quickly with distance than is consistent with how the observer searches. For those models that give a reasonable fit, compare the goodness-of-fit measures. Distance provides χ² goodness-of-fit tests. If exact distances are recorded, it also gives test statistics for the Kolmogorov–Smirnov and Cramér–von Mises tests and a q–q plot (Buckland et al. 2004:385–389). Figure 3b is an example of where the model (the half-normal in this case) provides a poor fit to the data, as can be seen by the departure from a straight line. These data have too few observations close to the line relative to mid-distances to be well modelled by a half-normal; a model with a flatter ‘shoulder’ to the detection function is needed.

The AIC provides a relative measure of fit. The model with the smallest AIC provides, in some sense, the best fit to the data.
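To make the AIC comparison concrete, the sketch below fits a half-normal key function without adjustment terms to simulated perpendicular distances by maximum likelihood and compares it with a uniform key without adjustments. It is an illustration under stated assumptions, not the fitting algorithm of the CDS engine: the one-dimensional ternary search, the function names and the simulated data are all hypothetical choices. AIC is computed as 2k minus twice the maximized log-likelihood, where k is the number of estimated parameters.

```python
import math
import random


def halfnormal_loglik(distances, sigma, w):
    """Log-likelihood under g(x) = exp(-x^2 / (2 sigma^2)), truncated at w."""
    # mu is the integral of g from 0 to w (the effective strip half-width)
    mu = sigma * math.sqrt(math.pi / 2.0) * math.erf(w / (sigma * math.sqrt(2.0)))
    kernel = sum(-x * x / (2.0 * sigma * sigma) for x in distances)
    return kernel - len(distances) * math.log(mu)


def fit_halfnormal(distances, w, iters=100):
    """Maximize the log-likelihood over sigma by ternary search on log(sigma).

    Assumes the log-likelihood is unimodal on the search interval.
    """
    lo, hi = math.log(w / 100.0), math.log(w * 10.0)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if halfnormal_loglik(distances, math.exp(m1), w) < halfnormal_loglik(
            distances, math.exp(m2), w
        ):
            lo = m1
        else:
            hi = m2
    sigma = math.exp((lo + hi) / 2.0)
    return sigma, halfnormal_loglik(distances, sigma, w)


def aic(loglik, n_params):
    """Akaike Information Criterion."""
    return 2.0 * n_params - 2.0 * loglik


def compare_models(distances, w):
    """Return the fitted sigma, the AIC of each model and AIC differences."""
    sigma_hat, ll_halfnormal = fit_halfnormal(distances, w)
    # Uniform key with no adjustments: g(x) = 1, so f(x) = 1/w, no parameters
    ll_uniform = -len(distances) * math.log(w)
    aics = {"half-normal": aic(ll_halfnormal, 1), "uniform": aic(ll_uniform, 0)}
    best = min(aics.values())
    delta = {name: value - best for name, value in aics.items()}
    return sigma_hat, aics, delta


# Simulated distances from a half-normal with sigma = 10, truncated at w = 40
rng = random.Random(0)
w = 40.0
data = [abs(rng.gauss(0.0, 10.0)) for _ in range(500)]
data = [x for x in data if x <= w]
sigma_hat, aics, delta = compare_models(data, w)
print(sigma_hat, aics, delta)
```

Both models here are fitted to the same distances with the same truncation, which is the condition under which their AIC values can be compared.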
AIC values are only comparable if they are applied to exactly the same data – in Distance, this means that runs made using the same survey and data filter are comparable. For such sets, Distance provides the ΔAIC values, which are AIC values with the AIC of the best-fitting model subtracted. Thus ΔAIC = 0 for the best model. Other model selection criteria are also available.

Final analysis and inference

The third phase of analysis is to select the best model, and extract summary analyses and plots for reporting. If choice of model is uncertain and influential, an analysis in which more than one model is selected can be run, and the option to estimate the variance by bootstrap selected. For each bootstrap resample, the best model will be selected (using AIC by default), so that different models may be selected for the analysis of different resamples. Resulting variances and confidence intervals then reflect model uncertainty.

[Figure 3: two q–q plots, panels (a) and (b); horizontal axis: Empirical distribution function.]

Fig. 3. Quantile–quantile (q–q) plots corresponding to fits of a half-normal model to line transect data. (a) The model fit seems satisfactory, although there is clear evidence of rounding in the observations. (b) These data show evidence of too few detections close to the line, relative to what would be expected under the half-normal model.

data are collected, choice of w is restricted to the cutpoints defining the intervals.

Having selected w, cutpoints should be set for the distance data. If data are recorded in intervals, the cutpoints will be predetermined. If data are recorded as ‘exact’, but in fact are subject to substantial rounding, there may be merit in assigning the distances to intervals for analysis, where cutpoints are defined well away from favoured rounding distances, so that few observations will be recorded in the wrong interval. This is achieved by setting cutpoints in the data filter of Distance. More usually, we will wish to analyse the data as exact (even if
An example is given there is rounding, provided it is not severe), but set cutpoints by Williams & Thomas (2009). for presenting histograms and conducting v goodness-of-fit tests. This is achieved by setting cutpoints in the diagnostics More advanced analysis options section of the detection function model definition. When selecting a suitable model, it is worth bearing in mind MULTIPLIERS that it is only an approximation to the true detection function. There is little point in throwing every possible model at the Multipliers provide a simple means of extending standard dis- data – this risks over-fitting. If the data are of high quality, tance sampling methods. They may be added in Distance via many possible model and adjustment combinations will give the project set-up wizard, or later in the multipliers section of very similar estimates. In our experience, the following combi- the model definition. nations often perform well and there is rarely any need to try Indirect surveys of animal sign are often conducted, e.g. others: uniform key with cosine adjustments; half-normal key dung surveysofdeer orelephants,ornestsurveys of apes.Sign with cosine adjustments; half-normal key with Hermite poly- densityisconverted to animal densitybydividingbyanesti- nomial adjustments; hazard-rate key with simple polynomial mate of the sign production rate per animal, and an estimate of adjustments. We would never recommend using the negative themeantimetodecay of thesign. Theseestimates canbe exponential key, which is present in Distance largely for histor- added as multipliers, with the divide operator option (hence ical reasons. they are actually ‘dividers’), together with estimates of their 2009 The Authors. 
Journal compilation  2009 British Ecological Society, Journal of Applied Ecology, 47, 5–14 Fitted cumulative distribution function Fitted cumulative distribution function 0·0 0·2 0·4 0·6 0·8 1·0 0·0 0·2 0·4 0·6 0·8 1·0 Distance software for density estimation 13 standard errors. If the degrees of freedom associated with the model are attached to each segment. Once a density surface estimated standard error are known, they may also be added. model has been built, density or abundance can be estimated Cues are instantaneous, or at least very short-lived, signs, over any area of interest within the study area by predicting such as a whale blow or a songburst. Point transect methods over a grid of points to which the same covariates are attached. may be used to estimate the number of cues per unit area per To build this grid, Distance requires that the global data layer unit time, and this may be converted to estimated animal den- be associated with a shapefile. sity by entering a divider equal to the estimated number of cues per unit of time per animal. For whale cue count surveys ACCESSING ANALYSIS ENGINES FROM OTHER (Buckland et al. 2001:191–197), only a sector of the full circle SOFTWARE is surveyed; the fraction of the circle surveyed may be entered as an additional divider, but in this case as it is a known con- Sometimes analyses are required that are too complex to stant no standard error would be entered. carry out within Distance. In this case, the graphical user For trapping and lure point transect sampling (Buckland interface of Distance can be circumvented. For the CDS and et al. 2006), the detection function is estimated by setting up MCDS engines, data and descriptions of the models are trials with animals at known locations. We record whether or passed to the FORTRAN program via a data file and a not each trial results in detection of the animal, and use logistic command file. 
Results of an analysis are placed into a statis- regression to estimate the detection function (in general, with tics (‘stats’) file, which can be read by software written by a probability of detection at the point allowed to be less than researcher to extract useful parameter estimates for further unity). This allows the effective area covered around each point analysis. Bootstrapping, for example, can be accomplished to be estimated, and counts of animals from the main survey by resampling the data, and rewriting the data file presented can be converted to estimated animal density by dividing by to MCDS. This process is somewhat streamlined for the effective area, by setting up the appropriate multiplier in researchers familiar with R, using the MRDS engine. With Distance. Similarly, if too few detections are made in a distance this approach, data are read into R only once, and the re- sampling survey to allow reliable estimation of the detection sampling and accumulation of parameter estimates are all function, but an estimate is available from another survey that conducted within R without the use of intermediate text files. is considered appropriate, counts can be converted to estimates Likewise, the DSM engine can be accessed directly from of animal density in the same way. within R. The command languages of all four engines are documented in an appendix to the Users’ Guide (CDS and MCDS) and R help files (MRDS and DSM). THE DSM ENGINE If transects are not positioned according to a random design, Future plans design-based extrapolation of densities to the wider region may be unreliable. Even if a randomized design is used, we Theoretical developments in distance sampling continue to may wish to model animal density as a function of spatially occur, and we endeavour to incorporate these into Distance. indexed environmental covariates – so called ‘spatial model- The most recent enhancements include the DSM engine and ling’ or ‘habitat modelling’. 
This is also useful for estimating the improved estimator of encounter rate variance of Fewster abundance in small regions of the study area, for which there et al. (2009). In future, we hope to incorporate a simulation is inadequate sampling effort to produce a stand-alone esti- engine into Distance, so practitioners can more readily exam- mate. ine the behaviour of distance sampling estimators for their par- The DSM analysis engine implements the ‘count method’ of ticular situation. Other enhancements we hope to make Hedley & Buckland (2004), in which the segment counts (seg- include: advances in estimating the effects of treatments (in the ments having been defined outside Distance) are modelled as a sense of designed experiments) that are relevant to many function of covariates such as habitat type, altitude or bottom impact assessment studies (Buckland et al. 2009); assessment depth, distance from human access, land-use type, latitude and of time trends in abundance or density from repeated surveys longitude. This is commonly done using generalized additive (Thomas, Burnham & Buckland 2004); and unequal coverage models (GAM) (Wood 2006) with overdispersed Poisson error estimators (Rexstad 2007). structure and a log link, with effective area of the segment There are some challenges associated with modelling density (defined as actual area multiplied by the estimated proportion surfaces, including variance estimation associated with the two of animals counted in the segment) serving as an offset. Other stages of the modelling process, autocorrelation in the counts, modelling strategies for DSM are also available in Distance. potential for unreasonable extrapolation of the density surface, The counts within each segment can be converted to estimates and ‘bleeding’ of abundance estimates to areas spatially proxi- of abundance within each segment, and the area of the segment mate but separated by adverse topography. 
Subsequent ver- (out to truncation distance w) is the offset. Alternatively, esti- sions of Distance may incorporate the refinements developed mated density can be used as the response variable, no offset, by Wood, Bravington & Hedley (2008), which makes substan- and the area of the segment used as a weight. tial progress in tackling the latter two issues. To use this engine, Distance requires that transect lines are The field of distance sampling is dynamic and growing. Con- divided into segments and that covariates to be included in the sequently, we anticipate that the software will also continue to 2009 The Authors. Journal compilation  2009 British Ecological Society, Journal of Applied Ecology, 47,5–14 14 L. Thomas et al. Hedley, S.L. & Buckland, S.T. (2004) Spatial models for line transect sampling. evolve, to address more complex ecological applications and Journal of Agricultural, Biological and Environmental Statistics, 9, 181–199. make use of further statistical developments. Hiby, A.R. (1985) An approach to estimating population densities of great whales from sighting surveys. IMA Journal of Mathematics Applied in Medi- cine and Biology, 2, 201–220. Acknowledgements Johnson, D., Laake, J. & VerHoef, J. (2009) A model-based approach for mak- ing ecological inference from distance sampling data. Biometrics,DOI: We are grateful to the organizations that have funded the development of Dis- 10.1111/j.1541-0420.2009.01265.x. tance, an up-to-date list of whom can be found on the software web pages. The Laake, J.L. & Borchers, D.L. (2004) Methods for incomplete detection at dis- creation and maintenance of the software is a large, collaborative project; in tance zero. Advanced Distance Sampling (eds S.T. Buckland, D.R. Anderson, addition to the authors of this paper, contributions have been made by David K.P. Burnham, J.L. Laake, D.L. Borchers & L. Thomas), pp. 108–189. 
Anderson, David Borchers, Louise Burt, Julian Derry, Rachel Fewster, Fer- Oxford University Press, Oxford. nanda Marques, David Miller and John Pollard. We thank Stuart Newson, an Laake, J.L., Burnham, K.P. & Anderson, D.R. (1979) User’s Manual for Pro- anonymous reviewer and E.J. Milner-Gulland for their helpful comments on gram TRANSECT. Utah State University Press, Logan, UT. an earlier draft. Laake, J.L., Buckland, S.T., Anderson, D.R. & Burnham, K.P. (1993) DIS- TANCE User’s Guide V2.0. Colorado Cooperative Fish and Wildlife Research Unit, Colorado State University, Fort Collins, CO, 72 pp. Lukacs, P.M., Franklin, A.B. & Anderson, D.R. (2004) Passive approaches to References detection in distance sampling. Advanced Distance Sampling (eds S.T. Buck- Alldredge, M.W., Simons, T.R. & Pollock, K.H. (2007) A field evaluation of land, D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers & L. Tho- distance measurement error in auditory avian point count surveys. Journal mas), pp. 260–280. Oxford University Press, Oxford. of Wildlife Management, 71, 2759–2766. Marques, F.F.C. & Buckland, S.T. (2003) Incorporating covariates into stan- Borchers, D.L. & Burnham, K.P. (2004) General formulation for distance sam- dard line transect analyses. Biometrics, 59, 924–935. pling. Advanced Distance Sampling (eds S.T. Buckland, D.R. Anderson, Marques, F.F.C. & Buckland, S.T. (2004) Covariate models for the detection K.P. Burnham, J.L. Laake, D.L. Borchers & L. Thomas), pp. 6–30. Oxford function. Advanced Distance Sampling (eds S.T. Buckland, D.R. Anderson, University Press, Oxford. K.P. Burnham, J.L. Laake, D.L. Borchers & L. Thomas), pp. 31–47. Oxford Borchers, D.L., Laake, J.L., Southwell, C. & Paxton, C.G.M. (2006) Accom- University Press, Oxford. modating unmodeled heterogeneity in double-observer distance sampling Marques, F.F.C., Buckland, S.T., Goffin, D., Dixon, C.E., Borchers, D.L., surveys. Biometrics, 62, 372–378. Mayle, B.A. & Peace, A.J. 
(2001) Estimating deer abundance from line tran- Buckland, S.T. (1992) Fitting density functions using polynomials. Applied Sta- sect surveys of dung: sika deer in southern Scotland. Journal of Applied Ecol- tistics, 41,63. ogy, 38, 349–363. Buckland, S.T. (2006) Point transect surveys for songbirds: robust methodolo- Marques, T.A., Thomas, L., Fancy, S.G. & Buckland, S.T. (2007) Improving gies. The Auk, 123, 345–357. estimates of bird density using multiple covariate distance sampling. The Buckland, S.T., Anderson, D.R., Burnham, K.P. & Laake, J.L. (1993) Distance Auk, 127, 1229–1243. Sampling: Estimating Abundance of Biological Populations. Chapman & Microsoft Corporation (2000) Visual Basic 6. Microsoft Corporation, Red- Hall, London. mond, Washington, USA. Buckland, S.T., Anderson, D.R., Burnham, K.P., Laake, J.L., Borchers, D.L. R Development Core Team (2009) R: A Language and Environment for Statisti- & Thomas, L. (2001) Introduction to Distance Sampling. Oxford University cal Computing. R Foundation for Statistical Computing, Vienna. ISBN Press, Oxford. 3-900051-07-0. http://www.R-project.org. Buckland, S.T., Anderson, D.R., Burnham K.P., Laake, J.L., Borchers, D.L. Rexstad, E. (2007) Non-Uniform Coverage Estimators for Distance Sampling. & Thomas L (eds) (2004) Advanced Distance Sampling. Oxford University Technical Report 2007-1. Centre for Research into Ecological and Environ- Press, Oxford. mental Modelling, St.AndrewsUniversity. http://hdl.handle.net/10023/628/. Buckland, S.T., Summers, R.W., Borchers, D.L. & Thomas, L. (2006) Point Strindberg, S. & Buckland, S.T. (2004) Zigzag survey designs in line transect transect sampling with traps or lures. Journal of Applied Ecology, 43,377– sampling. Journal of Agricultural, Biological and Environmental Statistics, 9, 443–461. Buckland, S.T., Russell, R.E., Dickson, B.G., Saab, V.A., Gorman, D.G. & Strindberg, S., Buckland, S.T. & Thomas, L. (2004) Design of distance sam- Block, W.M. 2009. 
Analysing designed experiments in distance sampling. pling surveys and Geographic Information Systems. Advanced Distance Journal of Agricultural, Biological and Environmental Statistics,DOI: Sampling (eds S.T. Buckland, D.R. Anderson, K.P. Burnham, J.L. Laake, 10.1198/jabes.2009.08030. D.L. Borchers & L. Thomas), pp. 190–228. Oxford University Press, Burnham, K.P., Anderson, D.R. & Laake, J.L. (1980) Estimation of density Oxford. from line transect sampling of biological populations. Ecological Mono- Thomas, L., Burnham, K.P. & Buckland, S.T. (2004) Temporal inferences from graphs, 72, 1–202. distance sampling surveys. Advanced Distance Sampling (eds S.T. Buckland, Compaq Computer Corporation (2001) Compaq Visual Fortran. Version 6.6. D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers & L. Thomas), Compaq Computer Corporation, Houston, Texas, USA. pp. 71–107. Oxford University Press, Oxford. Davison, A.C. & Hinkley, D.V. (1997) Bootstrap Methods and Their Applica- Thomas, L., Williams, R. & Sandilands, D. (2007) Designing line transect sur- tion. Cambridge University Press, Cambridge, UK. veys for complex survey regions. Journal of Cetacean Research and Manage- ESRI, Inc. (2004) MapObjects 2.3. Environmental Systems Research, Institute ment, 9, 1–13. Inc., Redlands, CA, USA. Williams, R. & Thomas, L. (2009) Cost-effective abundance estimation of rare Fewster, R.M. & Buckland, S.T. (2004) Assessment of distance sampling esti- marine animals: small-boat surveys for killer whales in British Columbia, mators. Advanced Distance Sampling (eds S.T. Buckland, D.R. Anderson, Canada. Biological Conservation, 142, 1542–1547. K.P. Burnham, J.L. Laake, D.L. Borchers & L. Thomas), pp. 281–306. Wood, S. N. (2006) Generalized Additive Models: An Introduction with R.Chap- Oxford University Press, Oxford. man & Hall, Boca Raton, FL. Fewster, R.M., Southwell, C., Borchers, D.L., Buckland, S.T. & Pople, A.R. Wood, S.N., Bravington, M.V. & Hedley, S.L. (2008) Soap film smoothing. 
(2008) The influence of animal mobility on the assumption of uniform dis- Journal of the Royal Statistical Society B, 70, 931–955. tances in aerial line transect surveys. Wildlife Research, 35, 275–288. Fewster, R.M., Buckland, S.T., Burnham, K.P., Borchers, D.L., Jupp, P.E., Received 29 July 2009; accepted 21 October 2009 Laake, J.L. & Thomas, L. (2009) Estimating the encounter rate variance in Handling Editor: E.J. Milner-Gulland distance sampling. Biometrics, 65, 225–236. 2009 The Authors. Journal compilation  2009 British Ecological Society, Journal of Applied Ecology, 47, 5–14

Journal

The Journal of Applied Ecology, PubMed Central

Published: Nov 17, 2009
