Get 20M+ Full-Text Papers For Less Than $1.50/day. Subscribe now for You or Your Team.

Learn More →

ENVIREM: an expanded set of bioclimatic and topographic variables increases flexibility and improves performance of ecological niche modeling

ENVIREM: an expanded set of bioclimatic and topographic variables increases flexibility and... The ability to model a species’ geographic distribution, given occurrence records and environmental information, is based on the assumption that abiotic factors directly or indirectly control species distributions (Austin ). Species distribution modeling (SDM) has led to a surge in research on topics such as species’ potential invasiveness (Thuiller et al. ), the impacts of climate change on species distributions (Thuiller , Hijmans and Graham , Morin and Thuiller ), the relative importance of various predictors in determining species range boundaries (Glor and Warren ), historical reconstructions of species distributions (Svenning et al. ), conservation applications such as the identification of suitable habitats for undiscovered populations or reintroductions (Martínez‐Meyer et al. ), analysis of broad‐scale patterns of species richness (Pineda and Lobo ), and spatially‐explicit demographic simulations (Chan and Brown , He et al. ). The ability to conduct such analyses at increasingly broad taxonomic and spatial scales has largely been facilitated by successful efforts to digitize museum specimen records, georeference associated localities (Guralnick et al. , Ellwood et al. ) and provide this information in a standardized format through easily accessible data portals (Constable et al. , Wieczorek et al. ). While progress has been made in these efforts to make high quality occurrence records widely available (e.g. Global Biodiversity Information Facility, < www.gbif.org>), additional progress is still needed in providing and exploring the utility of different environmental datasets for modeling geographic distributions. In particular, it is unknown if currently available and widely used environmental datasets are sufficient and optimal for modeling distributions of terrestrial species.The generation and projection of species distribution models requires data layers of environmental information that provide discriminatory power regarding presence and absence of species. As we typically do not know the true distribution of a species, it can be challenging to determine when an appropriate set of environmental variables has been chosen. Ideally, these variables should have direct relevance to ecological or physiological processes determining species distributions, but for many species this information is not generally available (Alvarado‐Serrano and Knowles ). Correlative niche modeling approaches that rely on statistical associations between species occurrences and environmental variables are frequently used (Peterson et al. , Alvarado‐Serrano and Knowles ), in which the environmental determinants of habitat suitability are not known a priori. The 19 bioclimatic variables from WorldClim (Hijmans et al. ) are perhaps the most broadly employed set of environmental data layers for this purpose, on account of their high resolution, global coverage, and availability for both historical and future climate scenarios. However, the biological suitability of these bioclimatic variables and other such environmental datasets for modeling the distribution of the species in question is often not thoroughly assessed.In the absence of specific knowledge about the environmental variables most likely to determine species distributions, it may be tempting to construct models using a large number of predictor variables, but such models run the risk of poor performance. For example, models built with several highly collinear variables are at an increased risk of overfitting and overparameterization (Dormann et al. , Wright et al. ), and may behave unexpectedly when projected to other time periods or geographic regions where they may encounter combinations of variables that have no analog in model training (Dormann et al. , Owens et al. , Warren et al. ). Additionally, whether large sets of environmental variables or smaller subsets of environmental data are used can greatly impact model predictions (Rödder et al. , Synes and Osborne , Braunisch et al. ). Variable reduction approaches can reduce model overfitting and improve model transferability (Warren et al. , Wright et al. ), yet the relative merits of various approaches are poorly characterized and continue to be explored (Araújo and Guisan , Braunisch et al. ). In general, variables may be reduced either statistically, or by selecting variables from ecological theory that are likely to be important given the physiology of the organism in question (Kearney et al. , Doswald et al. , Rödder et al. , Synes and Osborne ).Given the recognized importance of variable selection in constructing ecological niche models (Synes and Osborne , Braunisch et al. ), increasing the availability of easily accessible datasets of environmental variables that may be ecologically and physiologically important to a variety of organisms should be a priority for improving flexibility and performance of SDM. Several environmental datasets are already available with which to perform SDM (e.g. WorldClim (Hijmans et al. ), PRISM (< www.prism.oregonstate.edu >; Daly et al. ), ClimateNA (Wang et al. , Hamann et al. , Wang et al. )), but not all of these datasets are transferable among time periods or geographic regions or easily integrated with other variables. Additional environmental data layers that conceptually complement and are formatted for easy use alongside the 19 bioclimatic variables from WorldClim (Hijmans et al. ) – one of the most widely used environmental datasets for SDM – would broaden the options available for selection of environmental variables (whether based on ecological theory or through statistical variable reduction) and may lead to improved model performance for some species. Despite the description in the literature of formulae for many such variables that could be computed for particular regions or time periods (see Synes and Osborne as an example), the use of such variables is limited to those researchers with the GIS skills necessary to generate these datasets and the desire to assemble them from several disparate sources.To help satisfy this need, we introduce the ENVIREM dataset (ENVIronmental Rasters for Ecological Modeling): specifically, we provide a set of biologically relevant climatic and topographic variables (all of which have previously been described in the literature) at multiple resolutions and time periods. The variables we include were selected in particular because we hypothesize they are likely to have direct relevance to ecological or physiological processes determining distributions of many species. They should therefore facilitate ecologically‐informed variable selection, and may also result in improved model performance using statistical variable‐thinning approaches. As these variables are intended to complement the existing WorldClim dataset (Hijmans et al. ), we provide the ENVIREM dataset at the same extents and resolutions as WorldClim, for the present, mid‐Holocene, and Last Glacial Maximum (LGM). We also provide an R package (R Core Team) that will enable users to generate these variables from primary sources for any resolution, geographic area, or time period, including for future time periods of interest (for which we have not provided static rasters due to the large number of climate change models in existence that are continually updated as climate‐change projections improve). Finally, through several case studies, we show that the ENVIREM variables can improve model performance and be valuable additions to the set of variables that are currently widely used in species distribution modeling.MethodsWe compiled a list of biologically relevant climatic variables (Table ) that could be derived from monthly temperature and precipitation data (WorldClim ver. 1.4, Hijmans et al. ) and monthly extraterrestrial solar radiation (available from < www.cgiar‐csi.org >). These variables are described by Thornthwaite (), Daget (), Hargreaves and Hargreaves (), Willmott and Feddema (), Vörösmarty et al. (), Zomer et al. (, ), Rivas‐Martínez and Rivas‐Sáenz (), Sayre et al. () and Metzger et al. (). We additionally produced two elevation‐derived topographic variables, terrain roughness index (Wilson et al. ) and topographic wetness index (Boehner et al. , Conrad et al. ), generated from a global 30 arc‐second elevation and bathymetry digital elevation model (Becker et al. ). All variables were produced at the same resolutions as the bioclimatic variables that are currently available through WorldClim: 30 arc‐seconds, and 2.5, 5 and 10 arc‐minutes. Topographic variables were produced at a 30 arc‐second resolution, and subsequently coarsened to match the lower resolutions, rather than constructed directly from lower‐resolution elevation data. As such, the topographic variables of large grid cells at coarser scales represent the average fine‐scale (i.e. 30 arc‐second) values within each grid cell. Calculating the topographic variables in this manner was particularly important to avoid loss of information regarding terrain roughness index when scaling up to coarser resolutions. For the two climate variables related to growing degree‐days (GDD), we note that GDD are accumulated on a daily basis, whereas our estimates are approximations based on mean monthly temperature (Table ).Summary of the variables in the ENVIREM dataset. Citations for variable sources are as follows: A: Zomer et al. (, ); B: Hargreaves and Hargreaves (); C: Thornthwaite (); D: Willmott and Feddema (); E: Vörösmarty et al. (); F: Sayre et al. (); G: Rivas‐Martínez and Rivas‐Sáenz (); H: Daget (); I: Metzger et al. (); J: Wilson et al. (); K: Boehner et al. (); L: Conrad et al. ()Variable abbreviationBrief descriptionUnitsSourceannualPETannual potential evapotranspiration: a measure of the ability of the atmosphere to remove water through evapotranspiration processes, given unlimited moisturemm yr–1A, BaridityIndexThornthwaiteThornthwaite aridity index: index of the degree of water deficit below water need–CclimaticMoistureIndexa metric of relative wetness and aridity–D, Econtinentalityaverage temp. of warmest month – average temp. of coldest month°CF, GembergerQEmbergers pluviothermic quotient: a metric that was designed to differentiate among Mediterranean type climates–HgrowingDegDays0sum of mean monthly temperature for months with mean temperature greater than 0°C multiplied by number of days–IgrowingDegDays5sum of mean monthly temperature for months with mean temperature greater than 5°C multiplied by number of days–ImaxTempColdestMonthmax. temp. of the coldest month°C × 10IminTempWarmestMonthmin. temp. of the coldest month°C × 10ImonthCountByTemp10count of the number of months with mean temp greater than 10°CmonthsIPETColdestQuartermean monthly PET of coldest quartermm month–1IPETDriestQuartermean monthly PET of driest quartermm month–1IPETseasonalitymonthly variability in potential evapotranspirationmm month–1IPETWarmestQuartermean monthly PET of warmest quartermm month–1IPETWettestQuartermean monthly PET of wettest quartermm month–1IthermIndcompensated thermicity index: sum of mean annual temp., min. temp. of coldest month, max. temp. of the coldest month, × 10, with compensations for better comparability across the globe°CF, Gtriterrain roughness index–JtopoWetSAGA‐GIS topographic wetness index–K, LWe generated rasters for all variables at multiple spatial resolutions for current climatic conditions, the mid‐Holocene (approximately 6000 yr ago) and the Last Glacial Maximum (LGM, approximately 22 000 yr ago). For the paleoclimate datasets, we generated variables from three global general circulation models (GCMs): the Community Climate System Model ver. 4 (CCSM4, Collins et al. ), the Model for Interdisciplinary Research On Climate (MIROC‐ESM, Hasumi and Emori ), and the model of the Max Planck Inst. for Meteorology (MPI‐ESM‐P, Stevens et al. ). Fine‐scale monthly rasters for these paleoclimate scenarios were generated from coarse‐resolution GCM output using the delta downscaling method (Ramirez‐Villegas and Jarvis , and < www.worldclim.org/downscaling >) and are available with the WorldClim dataset. As the formulae for some ENVIREM variables require mean monthly temperature, which is available from the WorldClim dataset in the present but not for other time periods, we calculated mean monthly temperature in all time periods as the mean of the maximum and minimum temperatures. In the present, this calculation is highly correlated with the available mean monthly temperatures (Pearson correlation coefficient > 0.99). All raster manipulation and variable creation was carried out in R with the raster package 2.5‐2 (Hijmans et al. ).Additional variables derived from and complementing the 19 bioclimatic variables from WorldClim (Hijmans et al. ) will only be of value in SDM applications if they represent information not currently contained in the 19 bioclimatic variables. To assess the degree of novelty of these new variables, we calculated the Pearson correlation coefficient between each of the ENVIREM variables and the 19 bioclimatic variables from WorldClim, at a global scale (10 arc‐minute resolution), and also by biogeographic realm (Olson et al. , Table , Supplementary material Appendix 1 Table A2), for both the present and the past (CCSM4 global circulation model). Similarly, we also calculated correlation coefficients between terrain roughness index and topographic wetness index with elevation (Table ) to explore whether these variables contain topographic information not captured by elevation alone.Pearson correlation coefficients between each of the climatic ENVIREM variables and the WorldClim bioclimatic variable with which the ENVIREM variable is most strongly correlated (Supplementary material Appendix 1 Table A1), globally and in separate biogeographic realms. For each variable and realm, the bottom‐left triangle contains the correlation coefficient in the present, and the top‐right triangle contains the correlation coefficient in the LGM (CCSM4) for the same bioclimatic variable. Grey shading indicates that the absolute value of the correlation is ≤ 0.85Pearson correlation coefficients between ENVIREM topographic variables and elevation, at a global scale as well as in different biogeographic realmsNeotropicalPalearcticNearcticIndo‐malayAfrotropicOceaniaAustralasiaGlobalTerrain roughness0.650.580.480.830.410.190.650.46Topographic wetness–0.59–0.45–0.42–0.67–0.37–0.49–0.53–0.39Case studiesTo investigate how the inclusion of the ENVIREM variables could affect the performance and predictions of species distribution models, we generated species distribution models with Maxent ver. 3.3.3k (Phillips et al. ) for 20 North American terrestrial vertebrate species, using the curated occurrence dataset from Waltari et al. (). Specifically, we generated niche models using three different sets of initial environmental predictor variables. Firstly, we generated models using only the 19 bioclimatic variables from WorldClim (referred to hereafter as the bioclim model). Secondly, we built models using the 19 bioclimatic variables plus 14 of the climatic ENVIREM variables (hereafter referred to as the bioclim + envirem‐clim model). Finally, we generated niche models with the 19 bioclimatic variables and 16 ENVIREM variables, including 14 climatic variables and the two topographic variables (the bioclim + envirem‐all model). Note that none of the models, including bioclim + envirem‐all, included elevation as a predictor variable. We chose not to include two variables, aridityIndexThornthwaite as it was conceptually redundant with the climaticMoistureIndex, and monthCountByTemp10 because it is a categorical variable that would not have been amenable to the variable selection procedure that we applied. Finally, we did not generate any models using only the ENVIREM variables without the 19 bioclimatic WorldClim variables, as the ENVIREM variables are intended to supplement, not replace, the bioclimatic variables. All distribution modeling was performed in the dismo package ver. 1.0‐15 in R (Hijmans ) from rasters at a 2.5 arc‐minute resolution. This resolution is likely a reasonable match to the locational accuracy of the species occurrences, as these come primarily from museum collections, and is the resolution used for SDM in the original study (Waltari et al. ).To construct each model, we first spatially thinned the occurrence records, retaining only occurrences that were greater than ten kilometers in proximity to one another, using the spThin package in R (Aiello‐Lammens et al. ). For each species individually, we defined the model‐training region by adding a 1000 km buffer around all occurrence records (Supplementary material Appendix 1 Fig. A1). All occurrence data and rasters were transformed and projected to the North America Albers Equal Area Conic projection, as it has been shown that a failure to account for changing grid‐cell area across latitudes can negatively impact SDM results (Budic et al. ). We statistically thinned variables to include in each model for each species using the ‘corSelect’ function in the fuzzySim package ver. 1.6.3 in R (Barbosa ) where each pair of variables that is correlated above a set threshold is tested against the response variable (species presence and absence) with a bivariate model. The variable with a better fit as measured with AIC is selected while the other is dropped, and the procedure is repeated until all pairwise correlations are below the threshold. We applied a correlation threshold of 0.75, and generated pseudo‐absences from 10 000 randomly sampled points throughout the training region (excluding grid cells with known occurrence records) because there were no true absence records in our data.For each species, we measured SDM performance for the bioclim, the bioclim + envirem‐clim and the bioclim + envirem‐all models (with reduced sets of variables via statistical thinning as described above, Table ) using three threshold‐independent evaluation metrics: AUCTEST, AUCDIFF, and the size‐corrected Akaike information criterion (AICc). AUCTEST is a metric that measures the discriminatory ability of the species distribution model at test localities withheld during model construction, and thus represents the ability of the model to predict species presence (Peterson et al. ). AUCDIFF is the difference between the AUC calculated from training localities and AUCTEST, and is a measure of model overfitting, with higher values of AUCDIFF representing more overfit models (Warren and Seifert ). AICc is an information theoretic metric that balances model fit against degrees of freedom from parameterization (i.e. model complexity), such that lower values of AICc correspond to models with better goodness‐of‐fit accounting for model complexity (Burnham and Anderson , Warren and Seifert ). For AUC metrics, we partitioned calibration and evaluation data via the masked geographically‐structured partitioning scheme described by Radosavljevic and Anderson (), implemented in the R package ENMeval ver. 0.2.1 (Muscarella et al. ), which leads to more realistic and less biased estimates of SDM performance than the more traditionally used random k‐fold partitioning scheme. This partitioning scheme divides occurrence records into four geographic regions with an equal number of occurrence records, and calculates AUC metrics as the average of those metrics calculated individually using each of the four possible partitions of geographic regions into one region of evaluation data and three regions of calibration data. AICc was calculated from the full, non‐partitioned models.ENVIREM and WorldClim variables included in the bioclim, bioclim + envirem‐clim and bioclim + envirem‐all models, for four case study species. Variables included in each model were selected using a statistical variable selection approach (see Methods section for additional details)Spotted salamander | bioclimSpotted salamander | bioclim + envirem‐climSpotted salamander | bioclim + envirem‐allBlue grouse | bioclimBlue grouse | bioclim + envirem‐climBlue grouse | bioclim + envirem‐allCalifornia gnatcatcher | bioclimCalifornia gnatcatcher | bioclim + envirem‐climCalifornia gnatcatcher | bioclim + envirem‐allMountain chickadee | bioclimMountain chickadee | bioclim + envirem‐climMountain chickadee | bioclim + envirem‐allAnnual mean temp [bio1]Mean diurnal temp range [bio2]+++++++++Isothermality [bio3]+Temp seasonality [bio4]+Max temp warmest month [bio5]Min temp coldest month [bio6]Temp annual range [bio7]++++++Mean temp of wettest quarter [bio8]++++++Mean temp of driest quarter [bio9]+++Mean temp of warmest quarter [bio10]++++Mean temp of coldest quarter [bio11]Annual precip [bio12]+Precip of wettest month [bio13]Precip of driest month [bio14]++++++Precip seasonality [bio15]+++++++++Precip of wettest quarter [bio16]+++Precip of driest quarter [bio17]+Precip of warmest quarter [bio18]+++++++++Precip of coldest quarter [bio19]+++++++annualPETclimaticMoistureIndex++++++continentality++embergerQgrowingDegDays0growingDegDays5maxTempColdestminTempWarmest++++PETColdestQuarterPETDriestQuarter++PETseasonality++++++PETWarmestQuarter++PETWettestQuarter++++++thermicityIndextopoRoughnesstopoWetness++++The complexity of SDMs built with Maxent can be adjusted with the regularization multiplier, increased values of which lead to less parameterized models, as well as with the inclusion of additional feature classes (i.e. transformations of the original predictor variables) that allow for increasingly complex models. We evaluated distribution models across different sets of permissible feature classes, and for each of these, across a range of regularization multiplier values. The evaluation metrics described above were used to determine optimal feature class and model complexity for each model individually (Muscarella et al. ).After selecting optimal feature class and model complexity for each model, we also compared performance of the optimal models across each of the three variable sets (i.e. bioclim, bioclim + envirem‐clim, and bioclim + envirem‐all) using the same evaluation metrics. The AUC metrics describe absolute performance of the models (ranging from 0 to 1). AICc, however, describes relative performance of candidate models. For this metric, we define a model as having substantial support over another if it has a difference in AICc greater than or equal to four, as models with AICc values more similar than this are generally considered to have equivalent support (Burnham and Anderson ). Although we present results for all evaluation metrics, we ultimately favor AICc for selecting the optimal model and variable set for each species, as the focus of our case studies is on model comparison, and AICc has been shown to perform better than AUC metrics according to a range of criteria, including the selection of optimal levels of model complexity, model transferability in space and time, and the relative ranking of variable importance (Warren and Seifert , Warren et al. , Moreno‐Amat et al. ).The impact of using different environmental variables in niche modeling may not be apparent if two sets of variables lead to similar projected distributions in the present. However, if the degree of correlation between two different sets of variables differs in the past compared to in the present, then variable choice might have a greater effect on SDM projections to other time periods. To explore this possibility, we calculated niche similarity in the present and in the LGM using Schoener's D (Schoener , Warren et al. ), a metric that quantifies the degree of niche overlap in geographic space. Values of D range from 0 (completely different niches across geographic space) to 1 (identical niches over geographic space). Overlap was quantified with the fuzzySim package in R (Barbosa ). For each case‐study species we focused the niche overlap calculation on the geographic regions of the model projections where comparisons among models are most meaningful, rather than across broad regions of the continent where all models predict low habitat suitability and are thus very similar. In particular, we calculated niche overlap statistics only over the geographic region predicted to contain suitable habitat in at least one of the models. To define this region, we first reduced the geographic extents of interest for both the projected bioclim and bioclim + envirem‐clim models individually using a habitat suitability threshold that preserved 95% of the training presences. We further excluded areas outside the model training region, except for a few species where the majority of the predicted LGM distribution lay outside the training region. Finally, we combined these regions for both the bioclim and bioclim + envirem‐clim models and calculated niche overlap from (non‐thresholded) model projections within this combined region.We did not project the bioclim + envirem‐all model to the LGM, because topographic variables are difficult to interpret for the LGM in glaciated regions of North America. These regions have experienced substantial changes in topography since the LGM due to glacial erosion (Bell and Laine ). However, we note that models using topographic variables could be projected to the LGM in particular regions of interest where topographic variables can be assumed to have remained static since the LGM (e.g. unglaciated regions of California, Bemmels et al. ).Data depositionThe ENVIREM dataset has been deposited through the Univ. of Michigan Deep Blue Data repository < http://dx.doi.org/doi:10.7302/Z2BR8Q40> (Title and Bemmels ), and can be accessed through the project website at < www.envirem.github.io >. The ‘envirem’ R package is available on CRAN.ResultsThe ENVIREM dataset comprises variables that were generated for three time periods (present, mid‐Holocene and the LGM), using several different general circulation models (CCSM4, MIROC‐ESM, MPI‐ESM‐P) at multiple resolutions, so as to facilitate integration with rasters from WorldClim (Hijmans et al. ). All rasters are available for download at envirem.github.io. To enable users to generate these variables from other circulation models or time periods, we have provided all code in an R package ‘envirem’, available from CRAN.At a global scale, most new climatic variables were highly correlated with at least one of the 19 bioclimatic variables from WorldClim (Table ). The aridity‐related variables (i.e. climatic moisture index and Thornthwaite's aridity index) and some of the PET‐related variables were the least redundant at the global scale. However, many of the new variables were less highly correlated with the 19 bioclimatic variables within specific biogeographic realms. Oceania and the Afrotropics were the realms with the greatest number of new variables with lower maximum correlation coefficients (≤ 0.85), indicating that niche models of species from those regions may benefit most from the inclusion of these new variables. More often than not, correlations were lower during the mid‐Holocene and LGM than in the present (Supplementary material Appendix 1 Table A2, Table ), which indicates that even if specific sets of variables are redundant in the present, they may not necessarily be redundant in other time periods and variable choice could have greater impacts on model projections to other time periods. All new climatic variables had a maximum correlation of ≤ 0.85 in at least one biogeographic realm during at least one time period, with the exception of continentality, thermicity index, maximum temperature of the coldest month and minimum temperature of the warmest month. Some new variables were consistently most highly correlated with the same bioclimatic variable from WorldClim across regions, while other new variables were most highly correlated with different bioclimatic variables across different regions (Supplementary material Appendix 1 Table A1).In terms of topographic variables derived from elevation, terrain roughness index was not highly correlated with elevation globally or in any biogeographic region (Table ). Topographic wetness index was also not highly correlated with elevation (Table ), even though higher values of topographic wetness are conceptually associated with lower elevations at a local scale (i.e. within a given watershed; Boehner et al. ).Case studiesStatistical thinning of the sets of variables prior to ecological niche modeling substantially reduced the number of variables, with three to 11 variables retained in each model (Table , Supplementary material Appendix 1 Table A3). For all species, at least one ENVIREM variable was retained in the bioclim + envirem‐clim models. For the bioclim + envirem‐all models, at least one topographic variable was retained for 19 of 20 species. For most species, one or more bioclimatic variables that were retained in the bioclim model were dropped from the bioclim + envirem‐clim and bioclim + envirem‐all models and were replaced by one or more of the ENVIREM variables, indicating that these variables are more strongly predictive of the presence and absence of the species than the dropped bioclim variables (Supplementary material Appendix 1 Table A3). The impact of including ENVIREM variables on model performance varied among species, but models containing ENVIREM variables performed substantially better (according to the AICc metric) than the bioclim model in 13 of 20 species.In Fig. , we highlight results for four species that show particularly distinct improvement with the ENVIREM variables: the spotted salamander Ambystoma maculatum, the blue grouse Dendragapus obscurus, the California gnatcatcher Polioptila californica and the mountain chickadee Poecile gambeli. In these four species, inclusion of ENVIREM variables led to improvements in all metrics of model performance, although differences in AICc values were more substantial than differences in AUC metrics for these species. Across the 16 other case study species (Supplementary material Appendix 1 Fig. A2), an improvement in performance when including ENVIREM variables was found for ten species according to greater AUCTEST values (Arborimus longicaudus, Chamaea fasciata, Desmognathus wrighti, Dicamptodon tenebrosus, Elaphe obsoleta, Glaucomys sabrinus, Glaucomys volans, Lampropeltis zonata, Martes americana and Myodes gapperi). However, substantial improvements in model performance (improvement by more than four AICc units) were found for all but seven species according to AICc values, with no substantial difference for Dicamptodon tenebrosus, Elaphe obsoleta and Lepus arcticus, and a substantial decrease in performance for four species (Crotalus atrox, Dicrostonyx groenlandicus, Glaucomys volans and Myodes gapperi). Inclusion of ENVIREM topographic variables specifically led to especially notable improvements in AICc scores for Poecile gambeli (Fig. ), Eumeces fasciatus, Blarina brevicauda and Plethodon idahoensis (Supplementary material Appendix 1 Fig. A2).Ecological niche model performance with and without the ENVIREM variables for four selected case study species. Each line represents the set of feature classes that led to the best performance according to either AUCTEST (top and middle panels) or AICc (bottom panel), with performance evaluated across a range of regularization multiplier values (Supplementary material Appendix 1 Table A4). AUCDIFF is a measure of model overfitting for the model selected by maximizing AUCTEST. In the AUC plots, the dotted line represents the value for the best‐performing model. In the AICc plots, the grey shading represents a ΔAICc of 4 from the best (lowest) AICc score. Performance of models within the grey polygon is not considered to be substantially different (Burnham and Anderson ).The optimal Maxent parameters identified by the model evaluation metrics were typically not concordant across the bioclim, bioclim + envirem‐clim, and bioclim + envirem‐all models (Supplementary material Appendix 1 Table A4). Similarly, as the different metrics evaluate the niche models using conceptually different criteria, AUC‐based evaluations did not identify the same Maxent parameters as AICc‐based evaluations (Supplementary material Appendix 1 Table A4). As the focus of our case studies is on the choice of variables employed, an in‐depth examination of the differences between AUC and AICc‐based optimization of Maxent is beyond the scope of our study. We therefore focus the rest of our results and discussion on comparing predictions of models that were optimized based on AICc (see Methods).Projections of the AICc‐optimized species distribution models constructed with and without the ENVIREM variables generally did not differ greatly at continental scales for the current time period, but regional‐scale differences in habitat suitability were observed. For the four case‐study species showing greatest improvement in all evaluation metrics, the overall suitable ranges are very similar, though not identical, at the continental scale (Fig. ). In finer‐scale maps focusing on a particular region of interest, however, there are more substantial differences in suitability across the landscape at a regional scale (Fig. ). For example, suitability of the California Central Valley for Polioptila californica is much higher in the bioclim model than in the bioclim + envirem‐clim model. Similarly, regions of the California coast and northwestern Great Basin for Dendragapus obscurus are also considerably different across models, as well as large areas of the interior range of Poecile gambeli. Niche overlap (Schoener's D) between the two models averaged 0.9 for these four species and 0.91 across all modelled species (Supplementary material Appendix 1 Fig. A3, Table A5).Predicted habitat suitability during the current time period for four case study species, from Maxent models optimized in terms of feature class and regularization parameter according to the AICc metric, for models constructed with and without the ENVIREM variables. Suitability scores range from 0 (blue) to 1 (red). The central, continental‐scale maps show habitat suitability within the training region only (see text for explanation), with predicted habitat suitability below a 95% training presence threshold considered to be unsuitable (grey). The outer maps show detail from the region within the box on the continental maps, selected to highlight local‐scale differences between the models. Occurrence records are shown as black points. Schoener's D niche overlap is calculated between the bioclim and the bioclim + envirem‐clim models, exclusively within the thresholded training regions (Supplementary material Appendix 1 Fig. A1; see the Methods section for additional details).Differences between the predictions of the AICc‐optimized bioclim and bioclim + envirem‐clim models become more pronounced when projected to the LGM (Fig. , Supplementary material Appendix 1 Table A5). In particular, Schoener's D niche overlap scores are much lower in the LGM (mean = 0.71, 0.71 and 0.72 for GCM CCSM4, MPI‐ESM‐P and MIROC‐ESM, respectively) compared to the present, and for many species there are considerable differences between models in predicted distribution in the LGM (Fig. ). For Ambystoma maculatum, habitat suitability in the bioclim model was highest on exposed continental shelf off the coast of North Carolina, whereas in the bioclim + envirem‐clim model the highest habitat suitability was in the Lower Mississippi River Valley. For Dendragapus obscurus, connectivity between regions was greater in the bioclim + envirem‐clim model, and areas of high habitat suitability included the Columbia Plateau and northern Cascades. Both models for this species also showed marginally to moderately suitable habitat in western Canada and Alaska, although this may be an overprediction as at least part of this region was covered by the Cordilleran ice sheet during the LGM (Dyke et al. ). For Polioptila californica, the bioclim model predicted large regions of California to be suitable, including California's Central Valley, whereas in the bioclim + envirem‐clim model, higher suitability was primarily restricted to Baja California and coastal regions of southern California. For Poecile gambeli, visual differences between model projections were even greater, with high habitat suitability in the Rocky Mountains in the bioclim + envirem‐clim model only, and much higher habitat suitability throughout most of the species’ range overall, and the Great Basin in particular.Predicted habitat suitability during the Last Glacial Maximum for four case study species, for models constructed with and without the ENVIREM variables. Suitability scores range from 0 (blue) to 1 (red). Optimization of model parameters and thresholding are as in Fig. . Schoener's D niche overlap is calculated between the bioclim and the bioclim +envirem‐clim models, exclusively within the thresholded training regions (Supplementary material Appendix 1 Fig. A1; see the Methods section for additional details). Habitat suitability is shown within the training region only, with predicted habitat suitability below a 95% training presence threshold considered to be unsuitable (grey).DiscussionWe have generated 18 climatic and topographic variables that will be valuable in a broad array of applications for species distribution modelling, and have made these variables easily available and complementary to an existing widely‐used environmental dataset. Although they are largely derived from the same underlying dataset as the bioclimatic variables from WorldClim, we have demonstrated that including the ENVIREM variables in SDM can lead to notable improvements in performance and differences in projections of species distribution models. Inclusion of these new variables led to substantial improvement in SDM performance (AICc metric) in 13 out of 20 species, and substantially worse performance in only four species. Although inclusion of the ENVIREM variables did not always lead to significantly improved performance, the fact that they were beneficial to many species indicates that they are generally worth consideration when constructing species distribution models. The species‐specific nature of our results also highlight the importance of following best practices for variable selection and parameter optimization, as we have done here. The importance of particular variables in SDM will be a function of the species under study, its distribution in geographic and climatic space, the time period and geographic region of interest, and the ultimate question being addressed. Nonetheless, the links to ecological and physiological processes represented in many of the ENVIREM variables mean that they will likely be particularly useful for a wide variety of applications.Potential applicationsAs we have showcased here, the ENVIREM dataset will be of immediate value in SDM applications and will potentially lead to the generation of better species distribution models. If variable selection is done via statistical approaches, then inclusion of these variables will allow researchers to start with a larger pool of biologically relevant options, thereby increasing the odds that variables that are highly informative regarding the presence and absence of a species will be discovered. If the goal is to select variables a priori based on the ecology and natural history of the organism, then the ENVIREM variables will provide valuable options, as they are likely to be ecologically relevant to certain species and may have specific ties to biological processes for many species. SDM has been employed as a tool in a large variety of studies, and the inclusion of new variables has the potential to impact their conclusions. Identifying better sets of predictor variables for certain species could, among other things, potentially alter projections of species’ invasiveness for particular regions (Peterson and Nakazawa ), alter our understanding of potentially suitable habitat for species introductions (Martínez‐Meyer et al. ), lead to identification of new areas of high habitat suitability for conservation interest, affect predictions of shifts in habitat suitability in response to future climate change (Thuiller , Hijmans and Graham , Morin and Thuiller ), lead to new phylogeographic hypotheses about where species may have been distributed in the past (Chan and Brown , He et al. , Bemmels et al. ), and impact our understanding of the evolution of climatic tolerances across related species (Title and Burns , Kozak and Wiens ).With these additional variables, ecologists and evolutionary biologists will also be able to craft more specific hypotheses that are informed by the ecology of the organisms under study. For example, in an integrative distributional, demographic and coalescent (iDDC) framework (Knowles and Alvarado‐Serrano , Brown and Knowles , He et al. ), these variables will allow for the specification of competing hypotheses pertaining to the relative importance of different climatic and topographic variables in constraining the distribution of species over time (Bemmels et al. ), giving researchers greater flexibility than currently exists in modeling spatial and genetic patterns over time. Another example would be the inclusion of these additional variables in the spatial mapping of the velocity of climate change, which can tell us how organisms must move to track their current climatic conditions (Hamann et al. ).To our knowledge, this is the only existing multi‐variable dataset that is truly complementary to WorldClim in its geographic breadth, application and accessibility. The Climond dataset (Kriticos et al. ) provides an extended suite of bioclimatic variables only at 10 and 30 arc‐minutes for current and future climate scenarios, while the Ecoclimate dataset (Lima‐Ribeiro et al. ) provides only the standard 19 bioclimatic variables for multiple past, present and future time periods at 30 arc‐minutes. Other variables potentially useful for biodiversity modeling have been released, such as habitat heterogeneity (Tuanmu and Jetz ), global cloud cover (Wilson and Jetz ) and region‐specific variables, such as ClimateNA (Wang et al. , Hamann et al. , Wang et al. ), but these variables are either not transferrable to other time periods, not available globally or not available at finer spatial resolutions. In contrast, the ENVIREM dataset includes additional variables (some of which overlap with the Climond dataset) at all of the resolutions currently available from WorldClim, for past and current time periods. If researchers wish to perform SDM using occurrences that have high spatial precision in areas where region‐specific datasets for all desired time periods are available, then alternatives to the ENVIREM dataset may prove most useful (e.g. ClimateNA; Wang et al. ). However, such a situation is likely to represent only a small minority of SDM applications, making the ENVIREM dataset more generally applicable. In addition, the envirem R package makes it possible to generate these variables for other time periods, or from alternative input datasets (for example PRISM; Daly et al. ), allowing users to easily customize their use of these variables.Biological relevance of ENVIREM variablesAlthough the potential applications of these variables to SDM are vast, one unique benefit of the ENVIREM variables is their potential for improving our ability to construct niche models informed by ecological knowledge and natural history. Biologically informed niche models may be constructed for species for which the conceptual relationships between particular variables and biological processes relevant to determining a species’ distribution are known a priori (Kearney et al. , Doswald et al. , Rödder et al. , Synes and Osborne ), or may be constructed with the intention of exploring and testing different hypotheses about these relationships (Bemmels et al. ).The potential mechanisms by which the ENVIREM variables may determine distributions are numerous and will be specific to the species of interest. In general, subsets of the ENVIREM variables may directly control species distributions, or (more commonly) may impact other processes that in turn determine distributions (Austin ). The particular variables included in the ENVIREM dataset were selected because of their clear conceptual links to particular ecological processes and indices. For example, growing degree‐days are predictive of plant phenology and growth rate (McMaster and Wilhelm ), processes which impact species range limits (Morin et al. ) and drive local adaptation (Howe et al. ). Evapotranspiration not only describes climate generally, but is also physiologically linked to plant growth potential due to its impact on gas exchange with the atmosphere and temperature regulation (Thornthwaite , Katul et al. ). The more complex climatic indices included in the ENVIREM variables (e.g. thermicity, aridity, moisture, Emberger's pluviothermic quotient) may characterize environmental conditions that are more directly physiologically relevant to given species than simple descriptors of climate such as temperature or precipitation alone (Daget ). Finally, the topographic ENVIREM variables could conceivably be important predictors of habitat types associated with local‐ to regional‐scale relief that may be key predictors of species distributions at these spatial scales (Lassueur et al. , Austin and Van Niel ). We have provided just a few examples of potential links to biological factors that could determine species distributions, but the ecological relevance of any of the ENVIREM variables is likely to be species‐specific and different species’ distributions may be associated with environmental variables because of different mechanisms. Nonetheless, it is this type of conceptual relevance and these potential links to physiological and ecological processes that will make the ENVIREM variables particularly useful for many SDM applications.Incorporating ENVIREM variables into SDM best practicesIdeally, the choice of variables for niche modeling should be informed by knowledge of the natural history and ecology of the organism under study, as this approach has been shown to produce more realistic niche models (Rödder et al. , Saupe et al. ). However, it is most often the case that such information is not readily known (Alvarado‐Serrano and Knowles ). How one should go about choosing bioclimatic variables is still an open question, the impact of which can be considerable (Peterson and Nakazawa , Synes and Osborne , Braunisch et al. ). It is generally not considered best practice to include all bioclimatic variables, as they exhibit a high degree of collinearity. This collinearity tends to lead to overly complex, overfit models (Rodda et al. ). Additionally, the nature of the correlation between bioclimatic variables may differ across time periods, potentially leading to unexpected behavior in SDM projections (Rodda et al. , Synes and Osborne , Dormann et al. , Warren et al. ). While we expect that many researchers will find the ENVIREM variables extremely useful for a variety of applications, we recommend that the merits of including all or some of the ENVIREM variables should be carefully considered relative to the specific application, and that variable thinning, model optimization, and other best practices in ecological niche modeling should be followed (Merow et al. , Alvarado‐Serrano and Knowles ). For example, as we do not have in‐depth ecological information about the species whose ecological niches were modeled in our case studies, we employed a statistical approach to variable thinning in order to reduce the number of correlated variables, while retaining the variables with the greatest explanatory power.An important finding of our case studies was that the difference between the bioclim and bioclim + envirem‐clim models, as measured with Schoener's D, was small in the present, but greater in the LGM. Choice of predictor variables has previously been shown to have large impacts on model projections to other time periods or geographic regions (Peterson and Nakazawa , Synes and Osborne , Braunisch et al. ). The impact of variable selection points both to the utility of additional variables for developing and testing hypotheses about shifts in species distributions across different time periods and in novel spatial contexts, but also to the need for caution when making modeling decisions. Ideally, models could be evaluated in past time periods with independent fossil occurrences (Davis et al. , Gavin et al. , Moreno‐Amat et al. ), but their availability will depend on the taxon under study.In addition to the question of which environmental variables to use, a growing number of studies have demonstrated that species‐specific tuning of virtually all steps in the niche modeling pipeline can lead to improved results, and that Maxent's default behavior is often not sufficient to achieve optimal performance (Anderson and Gonzalez , Warren and Seifert , Merow et al. , Radosavljevic and Anderson , Moreno‐Amat et al. ). Although we could have held all aspects save the predictor variables constant in the generation of niche models in order to be able to compare the results directly, generating models in this way is considered poor practice. Instead, we chose to independently generate the best possible models, given current best practices. We found that Maxent's default parameters were rarely optimal (Supplementary material Appendix 1 Table A4), which echoes the findings of others that parameter tuning is an important step toward generating less overfit and more transferable species distribution models (Anderson and Gonzalez , Warren and Seifert , Merow et al. , Radosavljevic and Anderson , Moreno‐Amat et al. ). Different evaluation metrics most often did not lead to the selection of the same optimized parameters (Supplementary material Appendix 1 Table A4). This is expected, as AICc is intended to minimize the number of necessary parameters, while AUC metrics are not. Regardless of the environmental variables selected for SDM, the optimization of model parameters should always be considered, as model parameters can have a large impact on model performance and predictions (Fig. , Supplementary material Appendix 1 Fig. A2).Utility of topographic variables in SDMIn addition to climatic variables, we also generated two topographic indices: topographic roughness and topographic wetness. These variables offer novel information as they are not redundant with elevation (Table ), an environmental variable which is already broadly available for SDM. The use of elevation in SDM has been controversial (Hof et al. ), and may be particularly problematic when projecting to other time periods or geographic contexts where relationships between elevation and the climatic factors determining a species’ niche may be different than the relationships in the context in which the model was built. However, the topographic roughness and topographic wetness indices are less likely to suffer from this complication because they are less causally linked than elevation to regional‐scale climate, and they contain topographic information that may be useful for determining species distributions independent of climate. In particular, topographic roughness index may be a reasonable surrogate for habitat heterogeneity and microsite availability that could be relevant to determining geographic distributions of some species, and topographic wetness index may help distinguish between areas that experience similar regional climate but differ markedly in microhabitat due to relative drainage position within a watershed.However, it is important to consider whether topographic variables are available at an appropriate geographic scale for predicting species distributions. Variation in topographic features associated with microhabitats may occur at a much finer scale than that at which topographic variables are assessed, which could reduce their utility for SDM (Lassueur et al. , Austin and Van Niel , Pradervand et al. ). Since all topographic ENVIREM variables at all resolutions are ultimately averaged from values calculated from the finest‐scale (30 arc‐second) elevational model (see Methods), we have minimized concerns about the potential mismatch between the scale at which the indices were generated and at which topography is relevant to a species. However, it is still important to consider whether variation in topographic roughness and wetness at the 30 arc‐second scale (approximately 926 m at the equator) is likely to be meaningful for the species in question for the particular geographic region of interest and intended modeling application.Nonetheless, our case studies revealed that including topographic variables led to distinct improvement in SDM performance for several species, in some cases significantly exceeding the improvement gained by adding only the climatic ENVIREM variables (Fig. , Supplementary material Appendix 1 Fig. A2). These results once again emphasize the species‐specific nature of the degree of utility of any new variable. Topographic variables are likely to be particularly useful for exploring competing hypotheses regarding whether local‐ to regional‐scale factors such as microsite availability are important in determining species’ distributions (Bemmels et al. ).Beyond general considerations about whether or not topographic variables are important for modeling a species’ distribution, care should also be taken in assessing whether or not static variables (i.e. variables that do not change over time) are appropriate to use for a given SDM application. The topographic variables we derive can be assumed to be largely static through time (especially in unglaciated regions, with the exception of changes in coastline reflecting sea‐level changes). Stanton et al. () explored the inclusion of static variables in SDM and found that including such variables when projecting to future climate‐change scenarios typically improved, and rarely hindered, SDM performance when the variables were known to influence species distributions. Nonetheless, we recommend particular caution when projecting to contexts where topography may have changed substantially over the timescale of interest, for example due to Pleistocene glacial erosion in North America (Bell and Laine ).ConclusionsThe ENVIREM variables constitute a valuable dataset for species distribution modeling for a variety of applications. Although they are complementary to and largely derived from the WorldClim database that is already widely in use, they contain novel information not captured by this database. In particular, the ENVIREM variables include conceptually novel climatic variables that may more closely reflect specific ecological and physiological processes, as well as topographic variables distinct from elevation that may represent non‐climatic local‐ to regional‐scale aspects of a species’ niche. In our exploration of case studies for 20 North American vertebrate species, the impact of including the ENVIREM variables was species‐specific: in 13 out of 20 cases model performance substantially improved compared to a model using only WorldClim variables, particularly when topographic ENVIREM variables were included; in seven cases model performance was not substantially different or declined. In general, models built with and without the ENVIREM variables produced habitat suitability predictions differing only modestly and at local scales in the current time period, but sometimes resulted in dramatic regional‐scale differences in predicted habitat suitability when projected to a different time period. Overall, our results highlight how the ENVIREM variables often improve model performance, even when biological information about the variables that are most relevant to determining habitat suitability for a given species is not known a priori. Furthermore, when knowledge about the determinants of species distributions is available from ecological theory, the ENVIREM variables may be particularly useful for developing and testing the predictions of species‐specific hypotheses. The significant improvements in model performance we observed for many species when following best practices in species distribution modeling suggest that the ENVIREM variables are worth general consideration for SDM, as their main benefit is providing a more comprehensive set of environmental variables to choose from, whether through statistical variable thinning or variable selection informed by ecological knowledge.Acknowledgements – We would like to thank L. L. Knowles for her guidance in developing this project. This manuscript greatly benefited from comments from G. Costa.Funding – Provided for graduate student support by an NSF GRFP fellowship (JBB) and a Univ. of Michigan Dept of Ecology and Evolutionary Biology Edwin H. Edwards Scholarship in Biology (JBB).ReferencesAiello‐Lammens, M. E. et al. 2015. spThin: an R package for spatial thinning of species occurrence records for use in ecological niche models. – Ecography 38: 1–5.Alvarado‐Serrano, D. F. and Knowles, L. L. 2014. Ecological niche models in phylogeographic studies: applications, advances and precautions. – Mol. Ecol. Resour. 14: 233–248.Anderson, R. P. and Gonzalez, I. Jr 2011. Species‐specific tuning increases robustness to sampling bias in models of species distributions: an implementation with Maxent. – Ecol. Model. 222: 2796–2811.Araújo, M. B. and Guisan, A. 2006. Five (or so) challenges for species distribution modelling. – J. Biogeogr. 33: 1677–1688.Austin, M. P. 2002. Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. – Ecol. Model. 157: 101–118.Austin, M. P. and Van Niel, K. P. 2011. Improving species distribution models for climate change studies: variable selection and scale. – J. Biogeogr. 38: 1–8.Barbosa, A. M. 2015. fuzzySim: applying fuzzy logic to binary similarity indices in ecology. – Methods Ecol. Evol. 6: 853–858.Becker, J. J. et al. 2009. Global bathymetry and elevation data at 30 arc seconds resolution: SRTM30_PLUS. – Mar. Geodesy 32: 355–371.Bell, M. and Laine, E. P. 1985. Erosion of the Laurentide region of North America by glacial and glaciofluvial processes. – Quat. Res. 23: 154–174.Bemmels, J. B. et al. 2016. Tests of species‐specific models reveal the importance of drought in postglacial range shifts of a Mediterranean‐climate tree: insights from integrative distributional, demographic and coalescent modelling and ABC model selection. – Mol. Ecol. 25: 4889–4906.Boehner, J. et al. 2002. Soil regionalization by means of terrain analysis and process parameterization. – In: Micheli, E. et al. (eds), Soil classification 2001. European Soil Bureau, Research Report no. 7, Luxembourg, pp. 213–222.Braunisch, V. et al. 2013. Selecting from correlated climate variables: a major source of uncertainty for predicting species distributions under climate change. – Ecography 36: 971–983.Brown, J. L. and Knowles, L. L. 2012. Spatially explicit models of dynamic histories: examination of the genetic consequences of Pleistocene glaciation and recent climate change on the American pika. – Mol. Ecol. 21: 3757–3775.Budic, L. et al. 2015. Squares of different sizes: effect of geographical projection on model parameter estimates in species distribution modeling. – Ecol. Evol. 6: 202–211.Burnham, K. P. and Anderson, D. R. 2004. Multimodel inference: understanding AIC and BIC in model selection. – Soc. Method Res. 33: 261–304.Chan, L. M. and Brown, J. L. 2011. Integrating statistical genetic and geospatial methods brings new power to phylogeography. – Mol. Phylogenet. Evol. 59: 523–537.Collins, W. D. et al. 2006. The community climate system model version 3 (CCSM3). – J. Clim. 19: 2122–2143.Conrad, O. et al. 2015. System for automated geoscientific analyses (SAGA) v. 2.1.4. – Geosci. Model Develop. 8: 1991–2007.Constable, H. et al. 2010. VertNet: a new model for biodiversity data sharing. – PLoS Biol. 8: e1000309.Daget, P. 1977. Le bioclimat méditerranéen: analyse des formes climatiques par le système d'Emberger. – Vegetatio 34: 87–103.Daly, C. et al. 2002. A knowledge‐based approach to the statistical mapping of climate. – Clim. Res. 22: 99–113.Davis, E. B. et al. 2014. Ecological niche models of mammalian glacial refugia show consistent bias. – Ecography 37: 1133–1138.Dormann, C. F. et al. 2012. Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. – Ecography 36: 27–46.Doswald, N. et al. 2009. Potential impacts of climatic change on the breeding and non‐breeding ranges and migration distance of European Sylvia warblers. – J. Biogeogr. 36: 1194–1208.Dyke, A. S. et al. 2002. The Laurentide and Innuitian ice sheets during the last glacial maximum. – Quat. Sci. Rev. 21: 9–31.Ellwood, E. R. et al. 2015. Accelerating the digitization of biodiversity research specimens through online public participation. – BioScience 65: 383–396.Gavin, D. G. et al. 2014. Climate refugia: joint inference from fossil records, species distribution models and phylogeography. – New Phytol. 204: 37–54.Glor, R. E. and Warren, D. L. 2011. Testing ecological explanations for biogeographic boundaries. – Evolution 65: 673–683.Guralnick, R. P. et al. 2006. BioGeomancer: automated georeferencing to map the worldfs biodiversity data. – PLoS Biol. 4: e381.Hamann, A. et al. 2013. A comprehensive, high‐resolution database of historical and projected climate surfaces for western North America. – Bull. Am. Meteorol. Soc. 94: 1307–1309.Hamann, A. et al. 2015. Velocity of climate change algorithms for guiding conservation and management. – Global Change Biol. 21: 997–1004.Hargreaves, G. L. and Hargreaves, G. H. 1985. Irrigation water requirements for Senegal River basin. – J. Irrig. Drain Eng. 111: 265–275.Hasumi, H. and Emori, S. 2004. K‐1 coupled gcm (miroc) description. – Center for Climate System Research, Univ. of Tokyo, Tokyo.He, Q. et al. 2013. Integrative testing of how environments from the past to the present shape genetic structure across landscapes. – Evolution 67: 3386–3402.Hijmans, R. J. 2016. raster: geographic data analysis and modeling. – R package ver. 2.5‐8, < https://CRAN.R‐project.org/package = raster>.Hijmans, R. J. and Graham, C. H. 2006. The ability of climate envelope models to predict the effect of climate change on species distributions. – Global Change Biol. 12: 2272–2281.Hijmans, R. J. et al. 2005. Very high resolution interpolated climate surfaces for global land areas. – Int. J. Climatol. 25: 1965–1978.Hijmans, R. J. et al. 2016. dismo: species distribution modeling. – R package ver. 1.0‐15, < https://CRAN.R‐project.org/package = dismo>.Hof, A. R. et al. 2012. The usefulness of elevation as a predictor variable in species distribution modelling. – Ecol. Model. 246: 86–90.Howe, G. T. et al. 2003. From genotype to phenotype: unraveling the complexities of cold adaptation in forest trees. – Can. J. Bot. 81: 1247–1266.Katul, G. G. et al. 2012. Evapotranspiration: a process driving mass transport and energy exchange in the soil–plant–atmosphere–climate system. – Rev. Geophys. 50: 1–25.Kearney, M. et al. 2008. Modelling species distributions without using species distributions: the cane toad in Australia under current and future climates. – Ecography 31: 423–434.Knowles, L. L. and Alvarado‐Serrano, D. F. 2010. Exploring the population genetic consequences of the colonization process with spatio‐temporally explicit models: insights from coupled ecological, demographic and genetic models in montane grasshoppers. – Mol. Ecol. 19: 3727–3745.Kozak, K. H. and Wiens, J. J. 2016. What explains patterns of species richness? The relative importance of climatic‐niche evolution, morphological evolution, and ecological limits in salamanders. – Ecol. Evol. 6: 5940–5949.Kriticos, D. J. et al. 2011. CliMond: global high‐resolution historical and future scenario climate surfaces for bioclimatic modelling. – Methods Ecol. Evol. 3: 53–64.Lassueur, T. et al. 2006. Very high resolution digital elevation models: do they improve models of plant species distribution? – Ecol. Model. 198: 139–153.Lima‐Ribeiro, M. S. et al. 2015. Ecoclimate: a database of climate data from multiple models for past, present, and future for macroecologists and biogeographers. – Biodivers. Inform. 10: 1–21.Martínez‐Meyer, E. et al. 2006. Ecological niche modelling and prioritizing areas for species reintroductions. – Oryx 40: 411–418.McMaster, G. S. and Wilhelm, W. W. 1997. Growing degree‐days: one equation, two interpretations. – Agric. For. Meteorol. 87: 291–300.Merow, C. et al. 2013. A practical guide to MaxEnt for modeling species’ distributions: what it does, and why inputs and settings matter. – Ecography 36: 1058–1069.Metzger, M. J. et al. 2013. A high‐resolution bioclimate map of the world: a unifying framework for global biodiversity research and monitoring. – Global Ecol. Biogeogr. 22: 630–638.Moreno‐Amat, E. et al. 2015. Impact of model complexity on cross‐temporal transferability in Maxent species distribution models: an assessment using paleobotanical data. – Ecol. Model. 312: 308–317.Morin, X. and Thuiller, W. 2009. Comparing niche‐ and process‐based models to reduce prediction uncertainty in species range shifts under climate change. – Ecology 90: 1301–1313.Morin, X. et al. 2007. Process‐based modeling of species “distributions: what limits temperate tree species” range boundaries? – Ecology 88: 2280–2291.Muscarella, R. et al. 2014. ENMeval: an R package for conducting spatially independent evaluations and estimating optimal model complexity for Maxentecological niche models. – Methods Ecol. Evol. 5: 1198–1205.Olson, D. M. et al. 2001. Terrestrial ecoregions of the world: a new map of life on earth. – BioScience 51: 933–938.Owens, H. L. et al. 2013. Constraints on interpretation of ecological niche models by limited environmental ranges on calibration areas. – Ecol. Model. 263: 10–18.Peterson, A. T. and Nakazawa, Y. 2008. Environmental data sets matter in ecological niche modelling: an example with Solenopsis invicta and Solenopsis richteri. – Global Ecol. Biogeogr. 17: 135–144.Peterson, A. T. et al. 2011. Ecological niches and geographic distributions. – Princeton Univ. Press.Phillips, S. J. et al. 2006. Maximum entropy modeling of species geographic distributions. – Ecol. Model. 190: 231–259.Pineda, E. and Lobo, J. M. 2009. Assessing the accuracy of species distribution models to predict amphibian species richness patterns. – J. Anim. Ecol. 78: 182–190.Pradervand, J. N. et al. 2014. Very high resolution environmental predictors in species distribution models: moving beyond topography? – Prog. Phys. Geogr. 38: 79–96.Radosavljevic, A. and Anderson, R. P. 2014. Making better Maxent models of species distributions: complexity, overfitting and evaluation. – J. Biogeogr. 41: 629–643.Ramirez‐Villegas, J. and Jarvis, A. 2010. Downscaling global circulation model outputs: the Delta method decision and policy analysis working paper no. 1. – Cent. Trop. Agric. Colomb.Rivas‐Martínez, S. and Rivas‐Sáenz, S. 2009. Synoptical worldwide bioclimatic classification system. – < www.globalbioclimatics.org/ >, accessed 15 February 2016.Rodda, G. H. et al. 2011. Challenges in identifying sites climatically matched to the native ranges of animal invaders. – PLoS One 6: e14670.Rödder, D. et al. 2009. Alien invasive slider turtle in unpredicted habitat: a matter of niche shift or of predictors studied? – PLoS One 4: e7843.Saupe, E. E. et al. 2012. Variation in niche and distribution model performance: the need for a priori assessment of key causal factors. – Ecol. Model. 237–238: 11–22.Sayre, R. et al. 2009. A new map of standardized terrestrial ecosystems of the conterminous United States. – US Geological Survey Professional Paper 1768, Reston, VA.Schoener, T. W. 1968. The anolis lizards of Bimini: resource partitioning in a complex fauna. – Ecology 49: 704–726.Stanton, J. C. et al. 2012. Combining static and dynamic variables in species distribution models under climate change. – Methods Ecol. Evol. 3: 349–357.Stevens, B. et al. 2013. Atmospheric component of the MPI‐M Earth System Model: ECHAM6. – J. Adv. Model. Earth Syst. 5: 146–172.Svenning, J.‐C. et al. 2011. Applications of species distribution modeling to paleobiology. – Quat. Sci. Rev. 30: 2930–2947.Synes, N. W. and Osborne, P. E. 2011. Choice of predictor variables as a source of uncertainty in continental‐scale species distribution modelling under climate change. – Global Ecol. Biogeogr. 20: 904–914.Thornthwaite, C. W. 1948. An approach toward a rational classification of climate. – Geogr. Rev. 38: 55–94.Thuiller, W. 2004. Patterns and uncertainties of species’ range shifts under climate change. – Global Change Biol. 10: 2020–2027.Thuiller, W. et al. 2005. Niche-based modelling as a tool for predicting the risk of alien plant invasions at a global scale. – Global Change Biol. 11: 2234–2250.Title, P. O. and Burns, K. J. 2015. Rates of climatic niche evolution are correlated with species richness in a large and ecologically diverse radiation of songbirds. – Ecol. Lett. 18: 433–440.Title, P. O. and Bemmels, J. B. 2017. Data from: ENVIREM: an expanded set of bioclimatic and topographic variables increases flexibility and improves performance of ecological niche modeling. – Deep Blue Data repository < http://dx.doi.org/doi:10.7302/Z2BR8Q40>.Tuanmu, M.‐N. and Jetz, W. 2015. A global, remote sensing‐based characterization of terrestrial habitat heterogeneity for biodiversity and ecosystem modelling. – Global Ecol. Biogeogr. 24: 1329–1339.Vörösmarty, C. J. et al. 2005. Geospatial indicators of emerging water stress: an application to Africa. – Ambio 34: 230–236.Waltari, E. et al. 2007. Locating Pleistocene refugia: comparing phylogeographic and ecological niche model predictions. – PLoS One 2: e563.Wang, T. et al. 2012. ClimateWNA – high‐resolution spatial climate data for western North America. – J. Appl. Meteorol. Climatol. 51: 16–29.Wang, T. et al. 2016. Locally downscaled and spatially customizable climate data for historical and future periods for North America. – PLoS One 11: e0156720.Warren, D. L. and Seifert, S. 2011. Ecological niche modeling in Maxent: the importance of model complexity and the performance of model selection criteria. – Ecol. Appl. 21: 335–342.Warren, D. L. et al. 2008. Environmental niche equivalency versus conservatism: quantitative approaches to niche evolution. – Evolution 62: 2868–2883.Warren, D. L. et al. 2014. Incorporating model complexity and spatial sampling bias into ecological niche models of climate change risks faced by 90 California vertebrate species of concern. – Divers. Distrib. 20: 334–343.Wieczorek, J. et al. 2012. Darwin core: an evolving community‐developed biodiversity data standard. – PLoS One 7: e29715.Willmott, C. and Feddema, J. 1992. A more rational climatic moisture index. – Prof. Geogr. 44: 84–88.Wilson, A. M. and Jetz, W. 2016. Remotely sensed high‐resolution global cloud dynamics for predicting ecosystem and biodiversity distributions. – PLoS Biol. 14: e1002415–20.Wilson, M. F. J. et al. 2007. Multiscale terrain analysis of multibeam bathymetry data for habitat mapping on the continental slope. – Mar. Geodesy 30: 3–35.Wright, A. N. et al. 2014. Multiple sources of uncertainty affect metrics for ranking conservation risk under climate change. – Divers. Distrib. 21: 111–122.Zomer, R. J. et al. 2006. Carbon, land and water: a global analysis of the hydrologic dimensions of climate change mitigation through afforestation/reforestation. – Colombo, Sri Lanka.Zomer, R. J. et al. 2008. Climate change mitigation: a spatial analysis of global land suitability for clean development mechanism afforestation and reforestation. – Agric. Ecosyst. Environ. 126: 67–80.Supplementary material (Appendix ECOG‐02880 at < www.ecography.org/appendix/ecog‐02880 >). Appendix 1. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Ecography Wiley

ENVIREM: an expanded set of bioclimatic and topographic variables increases flexibility and improves performance of ecological niche modeling

Ecography , Volume 41 (2) – Jan 1, 2018

Loading next page...
 
/lp/wiley/envirem-an-expanded-set-of-bioclimatic-and-topographic-variables-jzhFySFyfI

References (109)

Publisher
Wiley
Copyright
Ecography © 2017 Nordic Society Oikos
ISSN
0906-7590
eISSN
1600-0587
DOI
10.1111/ecog.02880
Publisher site
See Article on Publisher Site

Abstract

The ability to model a species’ geographic distribution, given occurrence records and environmental information, is based on the assumption that abiotic factors directly or indirectly control species distributions (Austin ). Species distribution modeling (SDM) has led to a surge in research on topics such as species’ potential invasiveness (Thuiller et al. ), the impacts of climate change on species distributions (Thuiller , Hijmans and Graham , Morin and Thuiller ), the relative importance of various predictors in determining species range boundaries (Glor and Warren ), historical reconstructions of species distributions (Svenning et al. ), conservation applications such as the identification of suitable habitats for undiscovered populations or reintroductions (Martínez‐Meyer et al. ), analysis of broad‐scale patterns of species richness (Pineda and Lobo ), and spatially‐explicit demographic simulations (Chan and Brown , He et al. ). The ability to conduct such analyses at increasingly broad taxonomic and spatial scales has largely been facilitated by successful efforts to digitize museum specimen records, georeference associated localities (Guralnick et al. , Ellwood et al. ) and provide this information in a standardized format through easily accessible data portals (Constable et al. , Wieczorek et al. ). While progress has been made in these efforts to make high quality occurrence records widely available (e.g. Global Biodiversity Information Facility, < www.gbif.org>), additional progress is still needed in providing and exploring the utility of different environmental datasets for modeling geographic distributions. In particular, it is unknown if currently available and widely used environmental datasets are sufficient and optimal for modeling distributions of terrestrial species.The generation and projection of species distribution models requires data layers of environmental information that provide discriminatory power regarding presence and absence of species. As we typically do not know the true distribution of a species, it can be challenging to determine when an appropriate set of environmental variables has been chosen. Ideally, these variables should have direct relevance to ecological or physiological processes determining species distributions, but for many species this information is not generally available (Alvarado‐Serrano and Knowles ). Correlative niche modeling approaches that rely on statistical associations between species occurrences and environmental variables are frequently used (Peterson et al. , Alvarado‐Serrano and Knowles ), in which the environmental determinants of habitat suitability are not known a priori. The 19 bioclimatic variables from WorldClim (Hijmans et al. ) are perhaps the most broadly employed set of environmental data layers for this purpose, on account of their high resolution, global coverage, and availability for both historical and future climate scenarios. However, the biological suitability of these bioclimatic variables and other such environmental datasets for modeling the distribution of the species in question is often not thoroughly assessed.In the absence of specific knowledge about the environmental variables most likely to determine species distributions, it may be tempting to construct models using a large number of predictor variables, but such models run the risk of poor performance. For example, models built with several highly collinear variables are at an increased risk of overfitting and overparameterization (Dormann et al. , Wright et al. ), and may behave unexpectedly when projected to other time periods or geographic regions where they may encounter combinations of variables that have no analog in model training (Dormann et al. , Owens et al. , Warren et al. ). Additionally, whether large sets of environmental variables or smaller subsets of environmental data are used can greatly impact model predictions (Rödder et al. , Synes and Osborne , Braunisch et al. ). Variable reduction approaches can reduce model overfitting and improve model transferability (Warren et al. , Wright et al. ), yet the relative merits of various approaches are poorly characterized and continue to be explored (Araújo and Guisan , Braunisch et al. ). In general, variables may be reduced either statistically, or by selecting variables from ecological theory that are likely to be important given the physiology of the organism in question (Kearney et al. , Doswald et al. , Rödder et al. , Synes and Osborne ).Given the recognized importance of variable selection in constructing ecological niche models (Synes and Osborne , Braunisch et al. ), increasing the availability of easily accessible datasets of environmental variables that may be ecologically and physiologically important to a variety of organisms should be a priority for improving flexibility and performance of SDM. Several environmental datasets are already available with which to perform SDM (e.g. WorldClim (Hijmans et al. ), PRISM (< www.prism.oregonstate.edu >; Daly et al. ), ClimateNA (Wang et al. , Hamann et al. , Wang et al. )), but not all of these datasets are transferable among time periods or geographic regions or easily integrated with other variables. Additional environmental data layers that conceptually complement and are formatted for easy use alongside the 19 bioclimatic variables from WorldClim (Hijmans et al. ) – one of the most widely used environmental datasets for SDM – would broaden the options available for selection of environmental variables (whether based on ecological theory or through statistical variable reduction) and may lead to improved model performance for some species. Despite the description in the literature of formulae for many such variables that could be computed for particular regions or time periods (see Synes and Osborne as an example), the use of such variables is limited to those researchers with the GIS skills necessary to generate these datasets and the desire to assemble them from several disparate sources.To help satisfy this need, we introduce the ENVIREM dataset (ENVIronmental Rasters for Ecological Modeling): specifically, we provide a set of biologically relevant climatic and topographic variables (all of which have previously been described in the literature) at multiple resolutions and time periods. The variables we include were selected in particular because we hypothesize they are likely to have direct relevance to ecological or physiological processes determining distributions of many species. They should therefore facilitate ecologically‐informed variable selection, and may also result in improved model performance using statistical variable‐thinning approaches. As these variables are intended to complement the existing WorldClim dataset (Hijmans et al. ), we provide the ENVIREM dataset at the same extents and resolutions as WorldClim, for the present, mid‐Holocene, and Last Glacial Maximum (LGM). We also provide an R package (R Core Team) that will enable users to generate these variables from primary sources for any resolution, geographic area, or time period, including for future time periods of interest (for which we have not provided static rasters due to the large number of climate change models in existence that are continually updated as climate‐change projections improve). Finally, through several case studies, we show that the ENVIREM variables can improve model performance and be valuable additions to the set of variables that are currently widely used in species distribution modeling.MethodsWe compiled a list of biologically relevant climatic variables (Table ) that could be derived from monthly temperature and precipitation data (WorldClim ver. 1.4, Hijmans et al. ) and monthly extraterrestrial solar radiation (available from < www.cgiar‐csi.org >). These variables are described by Thornthwaite (), Daget (), Hargreaves and Hargreaves (), Willmott and Feddema (), Vörösmarty et al. (), Zomer et al. (, ), Rivas‐Martínez and Rivas‐Sáenz (), Sayre et al. () and Metzger et al. (). We additionally produced two elevation‐derived topographic variables, terrain roughness index (Wilson et al. ) and topographic wetness index (Boehner et al. , Conrad et al. ), generated from a global 30 arc‐second elevation and bathymetry digital elevation model (Becker et al. ). All variables were produced at the same resolutions as the bioclimatic variables that are currently available through WorldClim: 30 arc‐seconds, and 2.5, 5 and 10 arc‐minutes. Topographic variables were produced at a 30 arc‐second resolution, and subsequently coarsened to match the lower resolutions, rather than constructed directly from lower‐resolution elevation data. As such, the topographic variables of large grid cells at coarser scales represent the average fine‐scale (i.e. 30 arc‐second) values within each grid cell. Calculating the topographic variables in this manner was particularly important to avoid loss of information regarding terrain roughness index when scaling up to coarser resolutions. For the two climate variables related to growing degree‐days (GDD), we note that GDD are accumulated on a daily basis, whereas our estimates are approximations based on mean monthly temperature (Table ).Summary of the variables in the ENVIREM dataset. Citations for variable sources are as follows: A: Zomer et al. (, ); B: Hargreaves and Hargreaves (); C: Thornthwaite (); D: Willmott and Feddema (); E: Vörösmarty et al. (); F: Sayre et al. (); G: Rivas‐Martínez and Rivas‐Sáenz (); H: Daget (); I: Metzger et al. (); J: Wilson et al. (); K: Boehner et al. (); L: Conrad et al. ()Variable abbreviationBrief descriptionUnitsSourceannualPETannual potential evapotranspiration: a measure of the ability of the atmosphere to remove water through evapotranspiration processes, given unlimited moisturemm yr–1A, BaridityIndexThornthwaiteThornthwaite aridity index: index of the degree of water deficit below water need–CclimaticMoistureIndexa metric of relative wetness and aridity–D, Econtinentalityaverage temp. of warmest month – average temp. of coldest month°CF, GembergerQEmbergers pluviothermic quotient: a metric that was designed to differentiate among Mediterranean type climates–HgrowingDegDays0sum of mean monthly temperature for months with mean temperature greater than 0°C multiplied by number of days–IgrowingDegDays5sum of mean monthly temperature for months with mean temperature greater than 5°C multiplied by number of days–ImaxTempColdestMonthmax. temp. of the coldest month°C × 10IminTempWarmestMonthmin. temp. of the coldest month°C × 10ImonthCountByTemp10count of the number of months with mean temp greater than 10°CmonthsIPETColdestQuartermean monthly PET of coldest quartermm month–1IPETDriestQuartermean monthly PET of driest quartermm month–1IPETseasonalitymonthly variability in potential evapotranspirationmm month–1IPETWarmestQuartermean monthly PET of warmest quartermm month–1IPETWettestQuartermean monthly PET of wettest quartermm month–1IthermIndcompensated thermicity index: sum of mean annual temp., min. temp. of coldest month, max. temp. of the coldest month, × 10, with compensations for better comparability across the globe°CF, Gtriterrain roughness index–JtopoWetSAGA‐GIS topographic wetness index–K, LWe generated rasters for all variables at multiple spatial resolutions for current climatic conditions, the mid‐Holocene (approximately 6000 yr ago) and the Last Glacial Maximum (LGM, approximately 22 000 yr ago). For the paleoclimate datasets, we generated variables from three global general circulation models (GCMs): the Community Climate System Model ver. 4 (CCSM4, Collins et al. ), the Model for Interdisciplinary Research On Climate (MIROC‐ESM, Hasumi and Emori ), and the model of the Max Planck Inst. for Meteorology (MPI‐ESM‐P, Stevens et al. ). Fine‐scale monthly rasters for these paleoclimate scenarios were generated from coarse‐resolution GCM output using the delta downscaling method (Ramirez‐Villegas and Jarvis , and < www.worldclim.org/downscaling >) and are available with the WorldClim dataset. As the formulae for some ENVIREM variables require mean monthly temperature, which is available from the WorldClim dataset in the present but not for other time periods, we calculated mean monthly temperature in all time periods as the mean of the maximum and minimum temperatures. In the present, this calculation is highly correlated with the available mean monthly temperatures (Pearson correlation coefficient > 0.99). All raster manipulation and variable creation was carried out in R with the raster package 2.5‐2 (Hijmans et al. ).Additional variables derived from and complementing the 19 bioclimatic variables from WorldClim (Hijmans et al. ) will only be of value in SDM applications if they represent information not currently contained in the 19 bioclimatic variables. To assess the degree of novelty of these new variables, we calculated the Pearson correlation coefficient between each of the ENVIREM variables and the 19 bioclimatic variables from WorldClim, at a global scale (10 arc‐minute resolution), and also by biogeographic realm (Olson et al. , Table , Supplementary material Appendix 1 Table A2), for both the present and the past (CCSM4 global circulation model). Similarly, we also calculated correlation coefficients between terrain roughness index and topographic wetness index with elevation (Table ) to explore whether these variables contain topographic information not captured by elevation alone.Pearson correlation coefficients between each of the climatic ENVIREM variables and the WorldClim bioclimatic variable with which the ENVIREM variable is most strongly correlated (Supplementary material Appendix 1 Table A1), globally and in separate biogeographic realms. For each variable and realm, the bottom‐left triangle contains the correlation coefficient in the present, and the top‐right triangle contains the correlation coefficient in the LGM (CCSM4) for the same bioclimatic variable. Grey shading indicates that the absolute value of the correlation is ≤ 0.85Pearson correlation coefficients between ENVIREM topographic variables and elevation, at a global scale as well as in different biogeographic realmsNeotropicalPalearcticNearcticIndo‐malayAfrotropicOceaniaAustralasiaGlobalTerrain roughness0.650.580.480.830.410.190.650.46Topographic wetness–0.59–0.45–0.42–0.67–0.37–0.49–0.53–0.39Case studiesTo investigate how the inclusion of the ENVIREM variables could affect the performance and predictions of species distribution models, we generated species distribution models with Maxent ver. 3.3.3k (Phillips et al. ) for 20 North American terrestrial vertebrate species, using the curated occurrence dataset from Waltari et al. (). Specifically, we generated niche models using three different sets of initial environmental predictor variables. Firstly, we generated models using only the 19 bioclimatic variables from WorldClim (referred to hereafter as the bioclim model). Secondly, we built models using the 19 bioclimatic variables plus 14 of the climatic ENVIREM variables (hereafter referred to as the bioclim + envirem‐clim model). Finally, we generated niche models with the 19 bioclimatic variables and 16 ENVIREM variables, including 14 climatic variables and the two topographic variables (the bioclim + envirem‐all model). Note that none of the models, including bioclim + envirem‐all, included elevation as a predictor variable. We chose not to include two variables, aridityIndexThornthwaite as it was conceptually redundant with the climaticMoistureIndex, and monthCountByTemp10 because it is a categorical variable that would not have been amenable to the variable selection procedure that we applied. Finally, we did not generate any models using only the ENVIREM variables without the 19 bioclimatic WorldClim variables, as the ENVIREM variables are intended to supplement, not replace, the bioclimatic variables. All distribution modeling was performed in the dismo package ver. 1.0‐15 in R (Hijmans ) from rasters at a 2.5 arc‐minute resolution. This resolution is likely a reasonable match to the locational accuracy of the species occurrences, as these come primarily from museum collections, and is the resolution used for SDM in the original study (Waltari et al. ).To construct each model, we first spatially thinned the occurrence records, retaining only occurrences that were greater than ten kilometers in proximity to one another, using the spThin package in R (Aiello‐Lammens et al. ). For each species individually, we defined the model‐training region by adding a 1000 km buffer around all occurrence records (Supplementary material Appendix 1 Fig. A1). All occurrence data and rasters were transformed and projected to the North America Albers Equal Area Conic projection, as it has been shown that a failure to account for changing grid‐cell area across latitudes can negatively impact SDM results (Budic et al. ). We statistically thinned variables to include in each model for each species using the ‘corSelect’ function in the fuzzySim package ver. 1.6.3 in R (Barbosa ) where each pair of variables that is correlated above a set threshold is tested against the response variable (species presence and absence) with a bivariate model. The variable with a better fit as measured with AIC is selected while the other is dropped, and the procedure is repeated until all pairwise correlations are below the threshold. We applied a correlation threshold of 0.75, and generated pseudo‐absences from 10 000 randomly sampled points throughout the training region (excluding grid cells with known occurrence records) because there were no true absence records in our data.For each species, we measured SDM performance for the bioclim, the bioclim + envirem‐clim and the bioclim + envirem‐all models (with reduced sets of variables via statistical thinning as described above, Table ) using three threshold‐independent evaluation metrics: AUCTEST, AUCDIFF, and the size‐corrected Akaike information criterion (AICc). AUCTEST is a metric that measures the discriminatory ability of the species distribution model at test localities withheld during model construction, and thus represents the ability of the model to predict species presence (Peterson et al. ). AUCDIFF is the difference between the AUC calculated from training localities and AUCTEST, and is a measure of model overfitting, with higher values of AUCDIFF representing more overfit models (Warren and Seifert ). AICc is an information theoretic metric that balances model fit against degrees of freedom from parameterization (i.e. model complexity), such that lower values of AICc correspond to models with better goodness‐of‐fit accounting for model complexity (Burnham and Anderson , Warren and Seifert ). For AUC metrics, we partitioned calibration and evaluation data via the masked geographically‐structured partitioning scheme described by Radosavljevic and Anderson (), implemented in the R package ENMeval ver. 0.2.1 (Muscarella et al. ), which leads to more realistic and less biased estimates of SDM performance than the more traditionally used random k‐fold partitioning scheme. This partitioning scheme divides occurrence records into four geographic regions with an equal number of occurrence records, and calculates AUC metrics as the average of those metrics calculated individually using each of the four possible partitions of geographic regions into one region of evaluation data and three regions of calibration data. AICc was calculated from the full, non‐partitioned models.ENVIREM and WorldClim variables included in the bioclim, bioclim + envirem‐clim and bioclim + envirem‐all models, for four case study species. Variables included in each model were selected using a statistical variable selection approach (see Methods section for additional details)Spotted salamander | bioclimSpotted salamander | bioclim + envirem‐climSpotted salamander | bioclim + envirem‐allBlue grouse | bioclimBlue grouse | bioclim + envirem‐climBlue grouse | bioclim + envirem‐allCalifornia gnatcatcher | bioclimCalifornia gnatcatcher | bioclim + envirem‐climCalifornia gnatcatcher | bioclim + envirem‐allMountain chickadee | bioclimMountain chickadee | bioclim + envirem‐climMountain chickadee | bioclim + envirem‐allAnnual mean temp [bio1]Mean diurnal temp range [bio2]+++++++++Isothermality [bio3]+Temp seasonality [bio4]+Max temp warmest month [bio5]Min temp coldest month [bio6]Temp annual range [bio7]++++++Mean temp of wettest quarter [bio8]++++++Mean temp of driest quarter [bio9]+++Mean temp of warmest quarter [bio10]++++Mean temp of coldest quarter [bio11]Annual precip [bio12]+Precip of wettest month [bio13]Precip of driest month [bio14]++++++Precip seasonality [bio15]+++++++++Precip of wettest quarter [bio16]+++Precip of driest quarter [bio17]+Precip of warmest quarter [bio18]+++++++++Precip of coldest quarter [bio19]+++++++annualPETclimaticMoistureIndex++++++continentality++embergerQgrowingDegDays0growingDegDays5maxTempColdestminTempWarmest++++PETColdestQuarterPETDriestQuarter++PETseasonality++++++PETWarmestQuarter++PETWettestQuarter++++++thermicityIndextopoRoughnesstopoWetness++++The complexity of SDMs built with Maxent can be adjusted with the regularization multiplier, increased values of which lead to less parameterized models, as well as with the inclusion of additional feature classes (i.e. transformations of the original predictor variables) that allow for increasingly complex models. We evaluated distribution models across different sets of permissible feature classes, and for each of these, across a range of regularization multiplier values. The evaluation metrics described above were used to determine optimal feature class and model complexity for each model individually (Muscarella et al. ).After selecting optimal feature class and model complexity for each model, we also compared performance of the optimal models across each of the three variable sets (i.e. bioclim, bioclim + envirem‐clim, and bioclim + envirem‐all) using the same evaluation metrics. The AUC metrics describe absolute performance of the models (ranging from 0 to 1). AICc, however, describes relative performance of candidate models. For this metric, we define a model as having substantial support over another if it has a difference in AICc greater than or equal to four, as models with AICc values more similar than this are generally considered to have equivalent support (Burnham and Anderson ). Although we present results for all evaluation metrics, we ultimately favor AICc for selecting the optimal model and variable set for each species, as the focus of our case studies is on model comparison, and AICc has been shown to perform better than AUC metrics according to a range of criteria, including the selection of optimal levels of model complexity, model transferability in space and time, and the relative ranking of variable importance (Warren and Seifert , Warren et al. , Moreno‐Amat et al. ).The impact of using different environmental variables in niche modeling may not be apparent if two sets of variables lead to similar projected distributions in the present. However, if the degree of correlation between two different sets of variables differs in the past compared to in the present, then variable choice might have a greater effect on SDM projections to other time periods. To explore this possibility, we calculated niche similarity in the present and in the LGM using Schoener's D (Schoener , Warren et al. ), a metric that quantifies the degree of niche overlap in geographic space. Values of D range from 0 (completely different niches across geographic space) to 1 (identical niches over geographic space). Overlap was quantified with the fuzzySim package in R (Barbosa ). For each case‐study species we focused the niche overlap calculation on the geographic regions of the model projections where comparisons among models are most meaningful, rather than across broad regions of the continent where all models predict low habitat suitability and are thus very similar. In particular, we calculated niche overlap statistics only over the geographic region predicted to contain suitable habitat in at least one of the models. To define this region, we first reduced the geographic extents of interest for both the projected bioclim and bioclim + envirem‐clim models individually using a habitat suitability threshold that preserved 95% of the training presences. We further excluded areas outside the model training region, except for a few species where the majority of the predicted LGM distribution lay outside the training region. Finally, we combined these regions for both the bioclim and bioclim + envirem‐clim models and calculated niche overlap from (non‐thresholded) model projections within this combined region.We did not project the bioclim + envirem‐all model to the LGM, because topographic variables are difficult to interpret for the LGM in glaciated regions of North America. These regions have experienced substantial changes in topography since the LGM due to glacial erosion (Bell and Laine ). However, we note that models using topographic variables could be projected to the LGM in particular regions of interest where topographic variables can be assumed to have remained static since the LGM (e.g. unglaciated regions of California, Bemmels et al. ).Data depositionThe ENVIREM dataset has been deposited through the Univ. of Michigan Deep Blue Data repository < http://dx.doi.org/doi:10.7302/Z2BR8Q40> (Title and Bemmels ), and can be accessed through the project website at < www.envirem.github.io >. The ‘envirem’ R package is available on CRAN.ResultsThe ENVIREM dataset comprises variables that were generated for three time periods (present, mid‐Holocene and the LGM), using several different general circulation models (CCSM4, MIROC‐ESM, MPI‐ESM‐P) at multiple resolutions, so as to facilitate integration with rasters from WorldClim (Hijmans et al. ). All rasters are available for download at envirem.github.io. To enable users to generate these variables from other circulation models or time periods, we have provided all code in an R package ‘envirem’, available from CRAN.At a global scale, most new climatic variables were highly correlated with at least one of the 19 bioclimatic variables from WorldClim (Table ). The aridity‐related variables (i.e. climatic moisture index and Thornthwaite's aridity index) and some of the PET‐related variables were the least redundant at the global scale. However, many of the new variables were less highly correlated with the 19 bioclimatic variables within specific biogeographic realms. Oceania and the Afrotropics were the realms with the greatest number of new variables with lower maximum correlation coefficients (≤ 0.85), indicating that niche models of species from those regions may benefit most from the inclusion of these new variables. More often than not, correlations were lower during the mid‐Holocene and LGM than in the present (Supplementary material Appendix 1 Table A2, Table ), which indicates that even if specific sets of variables are redundant in the present, they may not necessarily be redundant in other time periods and variable choice could have greater impacts on model projections to other time periods. All new climatic variables had a maximum correlation of ≤ 0.85 in at least one biogeographic realm during at least one time period, with the exception of continentality, thermicity index, maximum temperature of the coldest month and minimum temperature of the warmest month. Some new variables were consistently most highly correlated with the same bioclimatic variable from WorldClim across regions, while other new variables were most highly correlated with different bioclimatic variables across different regions (Supplementary material Appendix 1 Table A1).In terms of topographic variables derived from elevation, terrain roughness index was not highly correlated with elevation globally or in any biogeographic region (Table ). Topographic wetness index was also not highly correlated with elevation (Table ), even though higher values of topographic wetness are conceptually associated with lower elevations at a local scale (i.e. within a given watershed; Boehner et al. ).Case studiesStatistical thinning of the sets of variables prior to ecological niche modeling substantially reduced the number of variables, with three to 11 variables retained in each model (Table , Supplementary material Appendix 1 Table A3). For all species, at least one ENVIREM variable was retained in the bioclim + envirem‐clim models. For the bioclim + envirem‐all models, at least one topographic variable was retained for 19 of 20 species. For most species, one or more bioclimatic variables that were retained in the bioclim model were dropped from the bioclim + envirem‐clim and bioclim + envirem‐all models and were replaced by one or more of the ENVIREM variables, indicating that these variables are more strongly predictive of the presence and absence of the species than the dropped bioclim variables (Supplementary material Appendix 1 Table A3). The impact of including ENVIREM variables on model performance varied among species, but models containing ENVIREM variables performed substantially better (according to the AICc metric) than the bioclim model in 13 of 20 species.In Fig. , we highlight results for four species that show particularly distinct improvement with the ENVIREM variables: the spotted salamander Ambystoma maculatum, the blue grouse Dendragapus obscurus, the California gnatcatcher Polioptila californica and the mountain chickadee Poecile gambeli. In these four species, inclusion of ENVIREM variables led to improvements in all metrics of model performance, although differences in AICc values were more substantial than differences in AUC metrics for these species. Across the 16 other case study species (Supplementary material Appendix 1 Fig. A2), an improvement in performance when including ENVIREM variables was found for ten species according to greater AUCTEST values (Arborimus longicaudus, Chamaea fasciata, Desmognathus wrighti, Dicamptodon tenebrosus, Elaphe obsoleta, Glaucomys sabrinus, Glaucomys volans, Lampropeltis zonata, Martes americana and Myodes gapperi). However, substantial improvements in model performance (improvement by more than four AICc units) were found for all but seven species according to AICc values, with no substantial difference for Dicamptodon tenebrosus, Elaphe obsoleta and Lepus arcticus, and a substantial decrease in performance for four species (Crotalus atrox, Dicrostonyx groenlandicus, Glaucomys volans and Myodes gapperi). Inclusion of ENVIREM topographic variables specifically led to especially notable improvements in AICc scores for Poecile gambeli (Fig. ), Eumeces fasciatus, Blarina brevicauda and Plethodon idahoensis (Supplementary material Appendix 1 Fig. A2).Ecological niche model performance with and without the ENVIREM variables for four selected case study species. Each line represents the set of feature classes that led to the best performance according to either AUCTEST (top and middle panels) or AICc (bottom panel), with performance evaluated across a range of regularization multiplier values (Supplementary material Appendix 1 Table A4). AUCDIFF is a measure of model overfitting for the model selected by maximizing AUCTEST. In the AUC plots, the dotted line represents the value for the best‐performing model. In the AICc plots, the grey shading represents a ΔAICc of 4 from the best (lowest) AICc score. Performance of models within the grey polygon is not considered to be substantially different (Burnham and Anderson ).The optimal Maxent parameters identified by the model evaluation metrics were typically not concordant across the bioclim, bioclim + envirem‐clim, and bioclim + envirem‐all models (Supplementary material Appendix 1 Table A4). Similarly, as the different metrics evaluate the niche models using conceptually different criteria, AUC‐based evaluations did not identify the same Maxent parameters as AICc‐based evaluations (Supplementary material Appendix 1 Table A4). As the focus of our case studies is on the choice of variables employed, an in‐depth examination of the differences between AUC and AICc‐based optimization of Maxent is beyond the scope of our study. We therefore focus the rest of our results and discussion on comparing predictions of models that were optimized based on AICc (see Methods).Projections of the AICc‐optimized species distribution models constructed with and without the ENVIREM variables generally did not differ greatly at continental scales for the current time period, but regional‐scale differences in habitat suitability were observed. For the four case‐study species showing greatest improvement in all evaluation metrics, the overall suitable ranges are very similar, though not identical, at the continental scale (Fig. ). In finer‐scale maps focusing on a particular region of interest, however, there are more substantial differences in suitability across the landscape at a regional scale (Fig. ). For example, suitability of the California Central Valley for Polioptila californica is much higher in the bioclim model than in the bioclim + envirem‐clim model. Similarly, regions of the California coast and northwestern Great Basin for Dendragapus obscurus are also considerably different across models, as well as large areas of the interior range of Poecile gambeli. Niche overlap (Schoener's D) between the two models averaged 0.9 for these four species and 0.91 across all modelled species (Supplementary material Appendix 1 Fig. A3, Table A5).Predicted habitat suitability during the current time period for four case study species, from Maxent models optimized in terms of feature class and regularization parameter according to the AICc metric, for models constructed with and without the ENVIREM variables. Suitability scores range from 0 (blue) to 1 (red). The central, continental‐scale maps show habitat suitability within the training region only (see text for explanation), with predicted habitat suitability below a 95% training presence threshold considered to be unsuitable (grey). The outer maps show detail from the region within the box on the continental maps, selected to highlight local‐scale differences between the models. Occurrence records are shown as black points. Schoener's D niche overlap is calculated between the bioclim and the bioclim + envirem‐clim models, exclusively within the thresholded training regions (Supplementary material Appendix 1 Fig. A1; see the Methods section for additional details).Differences between the predictions of the AICc‐optimized bioclim and bioclim + envirem‐clim models become more pronounced when projected to the LGM (Fig. , Supplementary material Appendix 1 Table A5). In particular, Schoener's D niche overlap scores are much lower in the LGM (mean = 0.71, 0.71 and 0.72 for GCM CCSM4, MPI‐ESM‐P and MIROC‐ESM, respectively) compared to the present, and for many species there are considerable differences between models in predicted distribution in the LGM (Fig. ). For Ambystoma maculatum, habitat suitability in the bioclim model was highest on exposed continental shelf off the coast of North Carolina, whereas in the bioclim + envirem‐clim model the highest habitat suitability was in the Lower Mississippi River Valley. For Dendragapus obscurus, connectivity between regions was greater in the bioclim + envirem‐clim model, and areas of high habitat suitability included the Columbia Plateau and northern Cascades. Both models for this species also showed marginally to moderately suitable habitat in western Canada and Alaska, although this may be an overprediction as at least part of this region was covered by the Cordilleran ice sheet during the LGM (Dyke et al. ). For Polioptila californica, the bioclim model predicted large regions of California to be suitable, including California's Central Valley, whereas in the bioclim + envirem‐clim model, higher suitability was primarily restricted to Baja California and coastal regions of southern California. For Poecile gambeli, visual differences between model projections were even greater, with high habitat suitability in the Rocky Mountains in the bioclim + envirem‐clim model only, and much higher habitat suitability throughout most of the species’ range overall, and the Great Basin in particular.Predicted habitat suitability during the Last Glacial Maximum for four case study species, for models constructed with and without the ENVIREM variables. Suitability scores range from 0 (blue) to 1 (red). Optimization of model parameters and thresholding are as in Fig. . Schoener's D niche overlap is calculated between the bioclim and the bioclim +envirem‐clim models, exclusively within the thresholded training regions (Supplementary material Appendix 1 Fig. A1; see the Methods section for additional details). Habitat suitability is shown within the training region only, with predicted habitat suitability below a 95% training presence threshold considered to be unsuitable (grey).DiscussionWe have generated 18 climatic and topographic variables that will be valuable in a broad array of applications for species distribution modelling, and have made these variables easily available and complementary to an existing widely‐used environmental dataset. Although they are largely derived from the same underlying dataset as the bioclimatic variables from WorldClim, we have demonstrated that including the ENVIREM variables in SDM can lead to notable improvements in performance and differences in projections of species distribution models. Inclusion of these new variables led to substantial improvement in SDM performance (AICc metric) in 13 out of 20 species, and substantially worse performance in only four species. Although inclusion of the ENVIREM variables did not always lead to significantly improved performance, the fact that they were beneficial to many species indicates that they are generally worth consideration when constructing species distribution models. The species‐specific nature of our results also highlight the importance of following best practices for variable selection and parameter optimization, as we have done here. The importance of particular variables in SDM will be a function of the species under study, its distribution in geographic and climatic space, the time period and geographic region of interest, and the ultimate question being addressed. Nonetheless, the links to ecological and physiological processes represented in many of the ENVIREM variables mean that they will likely be particularly useful for a wide variety of applications.Potential applicationsAs we have showcased here, the ENVIREM dataset will be of immediate value in SDM applications and will potentially lead to the generation of better species distribution models. If variable selection is done via statistical approaches, then inclusion of these variables will allow researchers to start with a larger pool of biologically relevant options, thereby increasing the odds that variables that are highly informative regarding the presence and absence of a species will be discovered. If the goal is to select variables a priori based on the ecology and natural history of the organism, then the ENVIREM variables will provide valuable options, as they are likely to be ecologically relevant to certain species and may have specific ties to biological processes for many species. SDM has been employed as a tool in a large variety of studies, and the inclusion of new variables has the potential to impact their conclusions. Identifying better sets of predictor variables for certain species could, among other things, potentially alter projections of species’ invasiveness for particular regions (Peterson and Nakazawa ), alter our understanding of potentially suitable habitat for species introductions (Martínez‐Meyer et al. ), lead to identification of new areas of high habitat suitability for conservation interest, affect predictions of shifts in habitat suitability in response to future climate change (Thuiller , Hijmans and Graham , Morin and Thuiller ), lead to new phylogeographic hypotheses about where species may have been distributed in the past (Chan and Brown , He et al. , Bemmels et al. ), and impact our understanding of the evolution of climatic tolerances across related species (Title and Burns , Kozak and Wiens ).With these additional variables, ecologists and evolutionary biologists will also be able to craft more specific hypotheses that are informed by the ecology of the organisms under study. For example, in an integrative distributional, demographic and coalescent (iDDC) framework (Knowles and Alvarado‐Serrano , Brown and Knowles , He et al. ), these variables will allow for the specification of competing hypotheses pertaining to the relative importance of different climatic and topographic variables in constraining the distribution of species over time (Bemmels et al. ), giving researchers greater flexibility than currently exists in modeling spatial and genetic patterns over time. Another example would be the inclusion of these additional variables in the spatial mapping of the velocity of climate change, which can tell us how organisms must move to track their current climatic conditions (Hamann et al. ).To our knowledge, this is the only existing multi‐variable dataset that is truly complementary to WorldClim in its geographic breadth, application and accessibility. The Climond dataset (Kriticos et al. ) provides an extended suite of bioclimatic variables only at 10 and 30 arc‐minutes for current and future climate scenarios, while the Ecoclimate dataset (Lima‐Ribeiro et al. ) provides only the standard 19 bioclimatic variables for multiple past, present and future time periods at 30 arc‐minutes. Other variables potentially useful for biodiversity modeling have been released, such as habitat heterogeneity (Tuanmu and Jetz ), global cloud cover (Wilson and Jetz ) and region‐specific variables, such as ClimateNA (Wang et al. , Hamann et al. , Wang et al. ), but these variables are either not transferrable to other time periods, not available globally or not available at finer spatial resolutions. In contrast, the ENVIREM dataset includes additional variables (some of which overlap with the Climond dataset) at all of the resolutions currently available from WorldClim, for past and current time periods. If researchers wish to perform SDM using occurrences that have high spatial precision in areas where region‐specific datasets for all desired time periods are available, then alternatives to the ENVIREM dataset may prove most useful (e.g. ClimateNA; Wang et al. ). However, such a situation is likely to represent only a small minority of SDM applications, making the ENVIREM dataset more generally applicable. In addition, the envirem R package makes it possible to generate these variables for other time periods, or from alternative input datasets (for example PRISM; Daly et al. ), allowing users to easily customize their use of these variables.Biological relevance of ENVIREM variablesAlthough the potential applications of these variables to SDM are vast, one unique benefit of the ENVIREM variables is their potential for improving our ability to construct niche models informed by ecological knowledge and natural history. Biologically informed niche models may be constructed for species for which the conceptual relationships between particular variables and biological processes relevant to determining a species’ distribution are known a priori (Kearney et al. , Doswald et al. , Rödder et al. , Synes and Osborne ), or may be constructed with the intention of exploring and testing different hypotheses about these relationships (Bemmels et al. ).The potential mechanisms by which the ENVIREM variables may determine distributions are numerous and will be specific to the species of interest. In general, subsets of the ENVIREM variables may directly control species distributions, or (more commonly) may impact other processes that in turn determine distributions (Austin ). The particular variables included in the ENVIREM dataset were selected because of their clear conceptual links to particular ecological processes and indices. For example, growing degree‐days are predictive of plant phenology and growth rate (McMaster and Wilhelm ), processes which impact species range limits (Morin et al. ) and drive local adaptation (Howe et al. ). Evapotranspiration not only describes climate generally, but is also physiologically linked to plant growth potential due to its impact on gas exchange with the atmosphere and temperature regulation (Thornthwaite , Katul et al. ). The more complex climatic indices included in the ENVIREM variables (e.g. thermicity, aridity, moisture, Emberger's pluviothermic quotient) may characterize environmental conditions that are more directly physiologically relevant to given species than simple descriptors of climate such as temperature or precipitation alone (Daget ). Finally, the topographic ENVIREM variables could conceivably be important predictors of habitat types associated with local‐ to regional‐scale relief that may be key predictors of species distributions at these spatial scales (Lassueur et al. , Austin and Van Niel ). We have provided just a few examples of potential links to biological factors that could determine species distributions, but the ecological relevance of any of the ENVIREM variables is likely to be species‐specific and different species’ distributions may be associated with environmental variables because of different mechanisms. Nonetheless, it is this type of conceptual relevance and these potential links to physiological and ecological processes that will make the ENVIREM variables particularly useful for many SDM applications.Incorporating ENVIREM variables into SDM best practicesIdeally, the choice of variables for niche modeling should be informed by knowledge of the natural history and ecology of the organism under study, as this approach has been shown to produce more realistic niche models (Rödder et al. , Saupe et al. ). However, it is most often the case that such information is not readily known (Alvarado‐Serrano and Knowles ). How one should go about choosing bioclimatic variables is still an open question, the impact of which can be considerable (Peterson and Nakazawa , Synes and Osborne , Braunisch et al. ). It is generally not considered best practice to include all bioclimatic variables, as they exhibit a high degree of collinearity. This collinearity tends to lead to overly complex, overfit models (Rodda et al. ). Additionally, the nature of the correlation between bioclimatic variables may differ across time periods, potentially leading to unexpected behavior in SDM projections (Rodda et al. , Synes and Osborne , Dormann et al. , Warren et al. ). While we expect that many researchers will find the ENVIREM variables extremely useful for a variety of applications, we recommend that the merits of including all or some of the ENVIREM variables should be carefully considered relative to the specific application, and that variable thinning, model optimization, and other best practices in ecological niche modeling should be followed (Merow et al. , Alvarado‐Serrano and Knowles ). For example, as we do not have in‐depth ecological information about the species whose ecological niches were modeled in our case studies, we employed a statistical approach to variable thinning in order to reduce the number of correlated variables, while retaining the variables with the greatest explanatory power.An important finding of our case studies was that the difference between the bioclim and bioclim + envirem‐clim models, as measured with Schoener's D, was small in the present, but greater in the LGM. Choice of predictor variables has previously been shown to have large impacts on model projections to other time periods or geographic regions (Peterson and Nakazawa , Synes and Osborne , Braunisch et al. ). The impact of variable selection points both to the utility of additional variables for developing and testing hypotheses about shifts in species distributions across different time periods and in novel spatial contexts, but also to the need for caution when making modeling decisions. Ideally, models could be evaluated in past time periods with independent fossil occurrences (Davis et al. , Gavin et al. , Moreno‐Amat et al. ), but their availability will depend on the taxon under study.In addition to the question of which environmental variables to use, a growing number of studies have demonstrated that species‐specific tuning of virtually all steps in the niche modeling pipeline can lead to improved results, and that Maxent's default behavior is often not sufficient to achieve optimal performance (Anderson and Gonzalez , Warren and Seifert , Merow et al. , Radosavljevic and Anderson , Moreno‐Amat et al. ). Although we could have held all aspects save the predictor variables constant in the generation of niche models in order to be able to compare the results directly, generating models in this way is considered poor practice. Instead, we chose to independently generate the best possible models, given current best practices. We found that Maxent's default parameters were rarely optimal (Supplementary material Appendix 1 Table A4), which echoes the findings of others that parameter tuning is an important step toward generating less overfit and more transferable species distribution models (Anderson and Gonzalez , Warren and Seifert , Merow et al. , Radosavljevic and Anderson , Moreno‐Amat et al. ). Different evaluation metrics most often did not lead to the selection of the same optimized parameters (Supplementary material Appendix 1 Table A4). This is expected, as AICc is intended to minimize the number of necessary parameters, while AUC metrics are not. Regardless of the environmental variables selected for SDM, the optimization of model parameters should always be considered, as model parameters can have a large impact on model performance and predictions (Fig. , Supplementary material Appendix 1 Fig. A2).Utility of topographic variables in SDMIn addition to climatic variables, we also generated two topographic indices: topographic roughness and topographic wetness. These variables offer novel information as they are not redundant with elevation (Table ), an environmental variable which is already broadly available for SDM. The use of elevation in SDM has been controversial (Hof et al. ), and may be particularly problematic when projecting to other time periods or geographic contexts where relationships between elevation and the climatic factors determining a species’ niche may be different than the relationships in the context in which the model was built. However, the topographic roughness and topographic wetness indices are less likely to suffer from this complication because they are less causally linked than elevation to regional‐scale climate, and they contain topographic information that may be useful for determining species distributions independent of climate. In particular, topographic roughness index may be a reasonable surrogate for habitat heterogeneity and microsite availability that could be relevant to determining geographic distributions of some species, and topographic wetness index may help distinguish between areas that experience similar regional climate but differ markedly in microhabitat due to relative drainage position within a watershed.However, it is important to consider whether topographic variables are available at an appropriate geographic scale for predicting species distributions. Variation in topographic features associated with microhabitats may occur at a much finer scale than that at which topographic variables are assessed, which could reduce their utility for SDM (Lassueur et al. , Austin and Van Niel , Pradervand et al. ). Since all topographic ENVIREM variables at all resolutions are ultimately averaged from values calculated from the finest‐scale (30 arc‐second) elevational model (see Methods), we have minimized concerns about the potential mismatch between the scale at which the indices were generated and at which topography is relevant to a species. However, it is still important to consider whether variation in topographic roughness and wetness at the 30 arc‐second scale (approximately 926 m at the equator) is likely to be meaningful for the species in question for the particular geographic region of interest and intended modeling application.Nonetheless, our case studies revealed that including topographic variables led to distinct improvement in SDM performance for several species, in some cases significantly exceeding the improvement gained by adding only the climatic ENVIREM variables (Fig. , Supplementary material Appendix 1 Fig. A2). These results once again emphasize the species‐specific nature of the degree of utility of any new variable. Topographic variables are likely to be particularly useful for exploring competing hypotheses regarding whether local‐ to regional‐scale factors such as microsite availability are important in determining species’ distributions (Bemmels et al. ).Beyond general considerations about whether or not topographic variables are important for modeling a species’ distribution, care should also be taken in assessing whether or not static variables (i.e. variables that do not change over time) are appropriate to use for a given SDM application. The topographic variables we derive can be assumed to be largely static through time (especially in unglaciated regions, with the exception of changes in coastline reflecting sea‐level changes). Stanton et al. () explored the inclusion of static variables in SDM and found that including such variables when projecting to future climate‐change scenarios typically improved, and rarely hindered, SDM performance when the variables were known to influence species distributions. Nonetheless, we recommend particular caution when projecting to contexts where topography may have changed substantially over the timescale of interest, for example due to Pleistocene glacial erosion in North America (Bell and Laine ).ConclusionsThe ENVIREM variables constitute a valuable dataset for species distribution modeling for a variety of applications. Although they are complementary to and largely derived from the WorldClim database that is already widely in use, they contain novel information not captured by this database. In particular, the ENVIREM variables include conceptually novel climatic variables that may more closely reflect specific ecological and physiological processes, as well as topographic variables distinct from elevation that may represent non‐climatic local‐ to regional‐scale aspects of a species’ niche. In our exploration of case studies for 20 North American vertebrate species, the impact of including the ENVIREM variables was species‐specific: in 13 out of 20 cases model performance substantially improved compared to a model using only WorldClim variables, particularly when topographic ENVIREM variables were included; in seven cases model performance was not substantially different or declined. In general, models built with and without the ENVIREM variables produced habitat suitability predictions differing only modestly and at local scales in the current time period, but sometimes resulted in dramatic regional‐scale differences in predicted habitat suitability when projected to a different time period. Overall, our results highlight how the ENVIREM variables often improve model performance, even when biological information about the variables that are most relevant to determining habitat suitability for a given species is not known a priori. Furthermore, when knowledge about the determinants of species distributions is available from ecological theory, the ENVIREM variables may be particularly useful for developing and testing the predictions of species‐specific hypotheses. The significant improvements in model performance we observed for many species when following best practices in species distribution modeling suggest that the ENVIREM variables are worth general consideration for SDM, as their main benefit is providing a more comprehensive set of environmental variables to choose from, whether through statistical variable thinning or variable selection informed by ecological knowledge.Acknowledgements – We would like to thank L. L. Knowles for her guidance in developing this project. This manuscript greatly benefited from comments from G. Costa.Funding – Provided for graduate student support by an NSF GRFP fellowship (JBB) and a Univ. of Michigan Dept of Ecology and Evolutionary Biology Edwin H. Edwards Scholarship in Biology (JBB).ReferencesAiello‐Lammens, M. E. et al. 2015. spThin: an R package for spatial thinning of species occurrence records for use in ecological niche models. – Ecography 38: 1–5.Alvarado‐Serrano, D. F. and Knowles, L. L. 2014. Ecological niche models in phylogeographic studies: applications, advances and precautions. – Mol. Ecol. Resour. 14: 233–248.Anderson, R. P. and Gonzalez, I. Jr 2011. Species‐specific tuning increases robustness to sampling bias in models of species distributions: an implementation with Maxent. – Ecol. Model. 222: 2796–2811.Araújo, M. B. and Guisan, A. 2006. Five (or so) challenges for species distribution modelling. – J. Biogeogr. 33: 1677–1688.Austin, M. P. 2002. Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. – Ecol. Model. 157: 101–118.Austin, M. P. and Van Niel, K. P. 2011. Improving species distribution models for climate change studies: variable selection and scale. – J. Biogeogr. 38: 1–8.Barbosa, A. M. 2015. fuzzySim: applying fuzzy logic to binary similarity indices in ecology. – Methods Ecol. Evol. 6: 853–858.Becker, J. J. et al. 2009. Global bathymetry and elevation data at 30 arc seconds resolution: SRTM30_PLUS. – Mar. Geodesy 32: 355–371.Bell, M. and Laine, E. P. 1985. Erosion of the Laurentide region of North America by glacial and glaciofluvial processes. – Quat. Res. 23: 154–174.Bemmels, J. B. et al. 2016. Tests of species‐specific models reveal the importance of drought in postglacial range shifts of a Mediterranean‐climate tree: insights from integrative distributional, demographic and coalescent modelling and ABC model selection. – Mol. Ecol. 25: 4889–4906.Boehner, J. et al. 2002. Soil regionalization by means of terrain analysis and process parameterization. – In: Micheli, E. et al. (eds), Soil classification 2001. European Soil Bureau, Research Report no. 7, Luxembourg, pp. 213–222.Braunisch, V. et al. 2013. Selecting from correlated climate variables: a major source of uncertainty for predicting species distributions under climate change. – Ecography 36: 971–983.Brown, J. L. and Knowles, L. L. 2012. Spatially explicit models of dynamic histories: examination of the genetic consequences of Pleistocene glaciation and recent climate change on the American pika. – Mol. Ecol. 21: 3757–3775.Budic, L. et al. 2015. Squares of different sizes: effect of geographical projection on model parameter estimates in species distribution modeling. – Ecol. Evol. 6: 202–211.Burnham, K. P. and Anderson, D. R. 2004. Multimodel inference: understanding AIC and BIC in model selection. – Soc. Method Res. 33: 261–304.Chan, L. M. and Brown, J. L. 2011. Integrating statistical genetic and geospatial methods brings new power to phylogeography. – Mol. Phylogenet. Evol. 59: 523–537.Collins, W. D. et al. 2006. The community climate system model version 3 (CCSM3). – J. Clim. 19: 2122–2143.Conrad, O. et al. 2015. System for automated geoscientific analyses (SAGA) v. 2.1.4. – Geosci. Model Develop. 8: 1991–2007.Constable, H. et al. 2010. VertNet: a new model for biodiversity data sharing. – PLoS Biol. 8: e1000309.Daget, P. 1977. Le bioclimat méditerranéen: analyse des formes climatiques par le système d'Emberger. – Vegetatio 34: 87–103.Daly, C. et al. 2002. A knowledge‐based approach to the statistical mapping of climate. – Clim. Res. 22: 99–113.Davis, E. B. et al. 2014. Ecological niche models of mammalian glacial refugia show consistent bias. – Ecography 37: 1133–1138.Dormann, C. F. et al. 2012. Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. – Ecography 36: 27–46.Doswald, N. et al. 2009. Potential impacts of climatic change on the breeding and non‐breeding ranges and migration distance of European Sylvia warblers. – J. Biogeogr. 36: 1194–1208.Dyke, A. S. et al. 2002. The Laurentide and Innuitian ice sheets during the last glacial maximum. – Quat. Sci. Rev. 21: 9–31.Ellwood, E. R. et al. 2015. Accelerating the digitization of biodiversity research specimens through online public participation. – BioScience 65: 383–396.Gavin, D. G. et al. 2014. Climate refugia: joint inference from fossil records, species distribution models and phylogeography. – New Phytol. 204: 37–54.Glor, R. E. and Warren, D. L. 2011. Testing ecological explanations for biogeographic boundaries. – Evolution 65: 673–683.Guralnick, R. P. et al. 2006. BioGeomancer: automated georeferencing to map the worldfs biodiversity data. – PLoS Biol. 4: e381.Hamann, A. et al. 2013. A comprehensive, high‐resolution database of historical and projected climate surfaces for western North America. – Bull. Am. Meteorol. Soc. 94: 1307–1309.Hamann, A. et al. 2015. Velocity of climate change algorithms for guiding conservation and management. – Global Change Biol. 21: 997–1004.Hargreaves, G. L. and Hargreaves, G. H. 1985. Irrigation water requirements for Senegal River basin. – J. Irrig. Drain Eng. 111: 265–275.Hasumi, H. and Emori, S. 2004. K‐1 coupled gcm (miroc) description. – Center for Climate System Research, Univ. of Tokyo, Tokyo.He, Q. et al. 2013. Integrative testing of how environments from the past to the present shape genetic structure across landscapes. – Evolution 67: 3386–3402.Hijmans, R. J. 2016. raster: geographic data analysis and modeling. – R package ver. 2.5‐8, < https://CRAN.R‐project.org/package = raster>.Hijmans, R. J. and Graham, C. H. 2006. The ability of climate envelope models to predict the effect of climate change on species distributions. – Global Change Biol. 12: 2272–2281.Hijmans, R. J. et al. 2005. Very high resolution interpolated climate surfaces for global land areas. – Int. J. Climatol. 25: 1965–1978.Hijmans, R. J. et al. 2016. dismo: species distribution modeling. – R package ver. 1.0‐15, < https://CRAN.R‐project.org/package = dismo>.Hof, A. R. et al. 2012. The usefulness of elevation as a predictor variable in species distribution modelling. – Ecol. Model. 246: 86–90.Howe, G. T. et al. 2003. From genotype to phenotype: unraveling the complexities of cold adaptation in forest trees. – Can. J. Bot. 81: 1247–1266.Katul, G. G. et al. 2012. Evapotranspiration: a process driving mass transport and energy exchange in the soil–plant–atmosphere–climate system. – Rev. Geophys. 50: 1–25.Kearney, M. et al. 2008. Modelling species distributions without using species distributions: the cane toad in Australia under current and future climates. – Ecography 31: 423–434.Knowles, L. L. and Alvarado‐Serrano, D. F. 2010. Exploring the population genetic consequences of the colonization process with spatio‐temporally explicit models: insights from coupled ecological, demographic and genetic models in montane grasshoppers. – Mol. Ecol. 19: 3727–3745.Kozak, K. H. and Wiens, J. J. 2016. What explains patterns of species richness? The relative importance of climatic‐niche evolution, morphological evolution, and ecological limits in salamanders. – Ecol. Evol. 6: 5940–5949.Kriticos, D. J. et al. 2011. CliMond: global high‐resolution historical and future scenario climate surfaces for bioclimatic modelling. – Methods Ecol. Evol. 3: 53–64.Lassueur, T. et al. 2006. Very high resolution digital elevation models: do they improve models of plant species distribution? – Ecol. Model. 198: 139–153.Lima‐Ribeiro, M. S. et al. 2015. Ecoclimate: a database of climate data from multiple models for past, present, and future for macroecologists and biogeographers. – Biodivers. Inform. 10: 1–21.Martínez‐Meyer, E. et al. 2006. Ecological niche modelling and prioritizing areas for species reintroductions. – Oryx 40: 411–418.McMaster, G. S. and Wilhelm, W. W. 1997. Growing degree‐days: one equation, two interpretations. – Agric. For. Meteorol. 87: 291–300.Merow, C. et al. 2013. A practical guide to MaxEnt for modeling species’ distributions: what it does, and why inputs and settings matter. – Ecography 36: 1058–1069.Metzger, M. J. et al. 2013. A high‐resolution bioclimate map of the world: a unifying framework for global biodiversity research and monitoring. – Global Ecol. Biogeogr. 22: 630–638.Moreno‐Amat, E. et al. 2015. Impact of model complexity on cross‐temporal transferability in Maxent species distribution models: an assessment using paleobotanical data. – Ecol. Model. 312: 308–317.Morin, X. and Thuiller, W. 2009. Comparing niche‐ and process‐based models to reduce prediction uncertainty in species range shifts under climate change. – Ecology 90: 1301–1313.Morin, X. et al. 2007. Process‐based modeling of species “distributions: what limits temperate tree species” range boundaries? – Ecology 88: 2280–2291.Muscarella, R. et al. 2014. ENMeval: an R package for conducting spatially independent evaluations and estimating optimal model complexity for Maxentecological niche models. – Methods Ecol. Evol. 5: 1198–1205.Olson, D. M. et al. 2001. Terrestrial ecoregions of the world: a new map of life on earth. – BioScience 51: 933–938.Owens, H. L. et al. 2013. Constraints on interpretation of ecological niche models by limited environmental ranges on calibration areas. – Ecol. Model. 263: 10–18.Peterson, A. T. and Nakazawa, Y. 2008. Environmental data sets matter in ecological niche modelling: an example with Solenopsis invicta and Solenopsis richteri. – Global Ecol. Biogeogr. 17: 135–144.Peterson, A. T. et al. 2011. Ecological niches and geographic distributions. – Princeton Univ. Press.Phillips, S. J. et al. 2006. Maximum entropy modeling of species geographic distributions. – Ecol. Model. 190: 231–259.Pineda, E. and Lobo, J. M. 2009. Assessing the accuracy of species distribution models to predict amphibian species richness patterns. – J. Anim. Ecol. 78: 182–190.Pradervand, J. N. et al. 2014. Very high resolution environmental predictors in species distribution models: moving beyond topography? – Prog. Phys. Geogr. 38: 79–96.Radosavljevic, A. and Anderson, R. P. 2014. Making better Maxent models of species distributions: complexity, overfitting and evaluation. – J. Biogeogr. 41: 629–643.Ramirez‐Villegas, J. and Jarvis, A. 2010. Downscaling global circulation model outputs: the Delta method decision and policy analysis working paper no. 1. – Cent. Trop. Agric. Colomb.Rivas‐Martínez, S. and Rivas‐Sáenz, S. 2009. Synoptical worldwide bioclimatic classification system. – < www.globalbioclimatics.org/ >, accessed 15 February 2016.Rodda, G. H. et al. 2011. Challenges in identifying sites climatically matched to the native ranges of animal invaders. – PLoS One 6: e14670.Rödder, D. et al. 2009. Alien invasive slider turtle in unpredicted habitat: a matter of niche shift or of predictors studied? – PLoS One 4: e7843.Saupe, E. E. et al. 2012. Variation in niche and distribution model performance: the need for a priori assessment of key causal factors. – Ecol. Model. 237–238: 11–22.Sayre, R. et al. 2009. A new map of standardized terrestrial ecosystems of the conterminous United States. – US Geological Survey Professional Paper 1768, Reston, VA.Schoener, T. W. 1968. The anolis lizards of Bimini: resource partitioning in a complex fauna. – Ecology 49: 704–726.Stanton, J. C. et al. 2012. Combining static and dynamic variables in species distribution models under climate change. – Methods Ecol. Evol. 3: 349–357.Stevens, B. et al. 2013. Atmospheric component of the MPI‐M Earth System Model: ECHAM6. – J. Adv. Model. Earth Syst. 5: 146–172.Svenning, J.‐C. et al. 2011. Applications of species distribution modeling to paleobiology. – Quat. Sci. Rev. 30: 2930–2947.Synes, N. W. and Osborne, P. E. 2011. Choice of predictor variables as a source of uncertainty in continental‐scale species distribution modelling under climate change. – Global Ecol. Biogeogr. 20: 904–914.Thornthwaite, C. W. 1948. An approach toward a rational classification of climate. – Geogr. Rev. 38: 55–94.Thuiller, W. 2004. Patterns and uncertainties of species’ range shifts under climate change. – Global Change Biol. 10: 2020–2027.Thuiller, W. et al. 2005. Niche-based modelling as a tool for predicting the risk of alien plant invasions at a global scale. – Global Change Biol. 11: 2234–2250.Title, P. O. and Burns, K. J. 2015. Rates of climatic niche evolution are correlated with species richness in a large and ecologically diverse radiation of songbirds. – Ecol. Lett. 18: 433–440.Title, P. O. and Bemmels, J. B. 2017. Data from: ENVIREM: an expanded set of bioclimatic and topographic variables increases flexibility and improves performance of ecological niche modeling. – Deep Blue Data repository < http://dx.doi.org/doi:10.7302/Z2BR8Q40>.Tuanmu, M.‐N. and Jetz, W. 2015. A global, remote sensing‐based characterization of terrestrial habitat heterogeneity for biodiversity and ecosystem modelling. – Global Ecol. Biogeogr. 24: 1329–1339.Vörösmarty, C. J. et al. 2005. Geospatial indicators of emerging water stress: an application to Africa. – Ambio 34: 230–236.Waltari, E. et al. 2007. Locating Pleistocene refugia: comparing phylogeographic and ecological niche model predictions. – PLoS One 2: e563.Wang, T. et al. 2012. ClimateWNA – high‐resolution spatial climate data for western North America. – J. Appl. Meteorol. Climatol. 51: 16–29.Wang, T. et al. 2016. Locally downscaled and spatially customizable climate data for historical and future periods for North America. – PLoS One 11: e0156720.Warren, D. L. and Seifert, S. 2011. Ecological niche modeling in Maxent: the importance of model complexity and the performance of model selection criteria. – Ecol. Appl. 21: 335–342.Warren, D. L. et al. 2008. Environmental niche equivalency versus conservatism: quantitative approaches to niche evolution. – Evolution 62: 2868–2883.Warren, D. L. et al. 2014. Incorporating model complexity and spatial sampling bias into ecological niche models of climate change risks faced by 90 California vertebrate species of concern. – Divers. Distrib. 20: 334–343.Wieczorek, J. et al. 2012. Darwin core: an evolving community‐developed biodiversity data standard. – PLoS One 7: e29715.Willmott, C. and Feddema, J. 1992. A more rational climatic moisture index. – Prof. Geogr. 44: 84–88.Wilson, A. M. and Jetz, W. 2016. Remotely sensed high‐resolution global cloud dynamics for predicting ecosystem and biodiversity distributions. – PLoS Biol. 14: e1002415–20.Wilson, M. F. J. et al. 2007. Multiscale terrain analysis of multibeam bathymetry data for habitat mapping on the continental slope. – Mar. Geodesy 30: 3–35.Wright, A. N. et al. 2014. Multiple sources of uncertainty affect metrics for ranking conservation risk under climate change. – Divers. Distrib. 21: 111–122.Zomer, R. J. et al. 2006. Carbon, land and water: a global analysis of the hydrologic dimensions of climate change mitigation through afforestation/reforestation. – Colombo, Sri Lanka.Zomer, R. J. et al. 2008. Climate change mitigation: a spatial analysis of global land suitability for clean development mechanism afforestation and reforestation. – Agric. Ecosyst. Environ. 126: 67–80.Supplementary material (Appendix ECOG‐02880 at < www.ecography.org/appendix/ecog‐02880 >). Appendix 1.

Journal

EcographyWiley

Published: Jan 1, 2018

There are no references for this article.