GEOMATICS, NATURAL HAZARDS AND RISK
2021, VOL. 12, NO. 1, 29–62
https://doi.org/10.1080/19475705.2020.1860139

Integrating multilayer perceptron neural nets with hybrid ensemble classifiers for deforestation probability assessment in Eastern India

Sunil Saha (a), Gopal Chandra Paul (a), Biswajeet Pradhan (b,c,d), Khairul Nizam Abdul Maulud (d,e) and Abdullah M. Alamri (f)

(a) Department of Geography, University of Gour Banga, Malda, West Bengal, India; (b) Centre for Advanced Modeling and Geospatial Information Systems (CAMGIS), Faculty of Engineering and Information Technology, University of Technology Sydney, New South Wales, Australia; (c) Department of Energy and Mineral Resources Engineering, Sejong University, Seoul, Republic of Korea; (d) Earth Observation Center, Institute of Climate Change, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia; (e) Department of Civil Engineering, Faculty of Engineering and Built Environment, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia; (f) Department of Geology and Geophysics, College of Science, King Saud University, Riyadh, Saudi Arabia

ARTICLE HISTORY
Received 14 August 2020
Accepted 1 December 2020

KEYWORDS
Deforestation probability; hybrid ensemble techniques; machine learning; GIS; remote sensing; India

ABSTRACT
The rapid expansion of human settlement, agricultural land and roads because of population growth in several regions of the world has contributed to the depletion of forest land. In this study, novel ensemble intelligent approaches using bagging, dagging and rotation forest (RTF) as meta classifiers of multilayer perceptron (MLP) were used to predict spatial deforestation probability (DP) in Gumani Basin, India. The success rate and correctness of prediction of the ensemble models were compared with MLP. A total of 1000 deforested pixels and 14 deforestation determining factors (DDFs) were used. The ensemble models were trained using 70% of the deforested pixels and validated with the remaining 30%.
DDFs were chosen by applying the information gain ratio and Relief-F test methods. Distance to settlement, population growth and distance to roads were the most important factors. The results of DP modelling demonstrated that nearly 12.64%–16.82% of the basin had very high DP. All four models created DP maps with reasonable prediction accuracy and goodness of fit, but the best map was produced by MLP-bagging. The accuracy of the MLP neural net model increased by 2%–3% after ensembling with the hybrid meta classifiers (RTF, bagging and dagging). The proposed method could be used for deforestation prediction in other areas with similar geo-environmental conditions. Furthermore, the findings might be used as a basis for future research and could help planners in forest management.

CONTACT Biswajeet Pradhan Biswajeet.Pradhan@uts.edu.au
Supplemental data for this article is available online at https://doi.org/10.1080/19475705.2020.1860139.
© 2020 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

Deforestation is a quasi-natural phenomenon occurring on our planet’s surface (Wan Mohd Jaafar et al. 2020). Worldwide, forests are affected by several threats, including population increase in urban areas, expansion of farming land and amenities, illegal mining and unregulated property rights (Gaveau et al. 2009; Newman et al. 2014; Robinson et al. 2014). Deforestation threatens the conservation of biodiversity and removes a substantial carbon sink that may help reduce carbon dioxide concentrations (Buchanan et al. 2008; Wang et al. 2009).
Climate change, ambient carbon cycle imbalance and ecosystem degradation are the main environmental threats correlated with deforestation. Deforestation is considered one of the most remarkable aspects of land use/land cover change. Forest is a vital natural resource that provides a large range of ecological goods and services and plays a crucial role in balancing atmospheric conditions and, thus, the climate; therefore, forest cover change has become a global concern (Kumar et al. 2014). The growing strain on the environment has culminated in habitat destruction, deforestation and depletion of biodiversity (Nandy et al. 2007; Sun and Southworth 2013). Furthermore, the increased rate of soil erosion due to loss of forest cover may increase environmental risks, such as landslides, water pollution and degradation of wetland ecosystems, which may have a major detrimental effect on human well-being on a large scale (Glade 2003; Wahab et al. 2019). Thus, identifying the underlying forces behind forest cover modification is crucial for recognising the transformation in our planetary ecosystem and reducing the uncertainty regarding spatial and temporal deforestation probability (DP) (Bax et al. 2016).

The deforestation process occurs in a haphazard fashion. On the basis of a set of suitable and desirable characteristics of physical and anthropogenic factors, forested lands are converted into other land uses. For instance, forest patches near roads may have a high chance of being deforested. Similarly, low-elevation and gently sloping areas are more favourable for cultivation and farmland than rough terrain (Lambin et al. 2001; Turner et al. 2001). Understanding the causes of deforestation is, therefore, important in the formulation of effective mitigation steps and policies (Hosonuma et al. 2012).
Causes of deforestation and the severity of their effects differ considerably from one region to another and change over time. Most causes have been described as leading to rather than accelerating deforestation (Geist et al. 2002). Some deforestation research has focused on anthropogenic forces, although the analysis of deforestation processes requires considering both the natural and anthropogenic aspects of the ecosystem (Bax et al. 2016; Wan Mohd Jaafar et al. 2020).

Traditional approaches used for analysing deforestation suffer from a series of limitations, such as the following:
1. correlation cannot be regarded as a clear indicator of the cause;
2. statistical models selected for prediction may have minimal explanatory importance;
3. relationships can be nonlinear.

With recent advances in remote sensing (RS), geographic information systems (GIS) and various statistical techniques, spatial DP can be forecasted more accurately (Pontius and Schneider 2001; Houet and Hubert-Moy 2006; Arekhi 2011; Hamzah et al. 2020; Saad et al. 2020). In the Carpathian Mountains, the increasing accessibility of large multi-temporal satellite imagery and the development of GIS and RS tools have facilitated the comprehensive study of past human-induced forest depletion. Many areas have also been studied at national (Munteanu et al. 2014, 2015) and international scales (Sobala et al. 2017; Kaim et al. 2018; Szymura et al. 2018; Wan Mohd Jaafar et al. 2020). Several scholars have prepared DP models based on logistic regression algorithms in tropical areas (Kumar et al. 2014; Bavaghar 2015; Kucsicsa and Dumitrica 2019). Traditional unsupervised techniques, including regression analysis (Ludeke et al. 1990), change vector analysis (Nackaerts et al. 2005) and principal component analysis (Deng et al. 2008; Ortega et al. 2020), have been widely used to detect changes in forest cover.
Artificial intelligence (AI) and machine learning (ML) algorithms have been widely adopted for mapping different hazards and potentiality, such as gully erosion susceptibility, landslide susceptibility (Roy et al. 2019), flood susceptibility (Khosravi et al. 2018), land subsidence (Tien Bui et al. 2018), individual tree crown detection and delineation (Wan Mohd Jaafar et al. 2018) and groundwater potentiality mapping (Tien Bui et al. 2019). In all those cases, ML and AI methods have shown good capability in modelling hazards. ML techniques have also been used for the prediction of deforestation. Ortega et al. (2020) used a deep learning technique and a support vector machine to detect deforestation. Saha et al. (2020) used random forest and reduced error pruning trees (REPTree) for modelling the DP. Dlamini (2016), Krüger and Lakes (2015) and Mayfield et al. (2017) used Bayesian networks for assessing DP, which provided reasonable results.

In recent years, several authors have used hybrid ensemble methods for mapping landslides (Fang et al. 2020), gully erosion (Roy et al. 2020) and groundwater potentiality (Rahmati et al. 2018), and these techniques have achieved better results than individual models. An ensemble method is a learning approach in which several models, such as classifiers, are systematically produced and integrated to solve a specific computational intelligence problem. Ensemble methods are mainly used to enhance a model’s performance (classification, estimation, etc.) or to minimise the possibility of an unfortunate selection of a weak model. The ensemble of hybrid meta classifiers and artificial neural networks has not yet been used in the field of deforestation modelling. On the basis of the accuracy of the hybrid ensemble models used in the above-mentioned fields, the current work addressed the question of whether hybrid ensemble methods are equally accurate for DP modelling.
We selected the multilayer perceptron neural net (MLPnn) and three hybrid ensemble models, i.e. MLP-bagging, MLP-dagging and MLP-rotation forest (RTF), to prepare DP maps of the study area. The novelty of this work is that the employed hybrid ensembles of MLPnn with bagging, dagging and RTF have not previously been used for deforestation modelling. This work also used the Friedman and Wilcoxon signed-rank tests, which are relatively new in this field, to judge the differences among the DP maps produced by these models. Information about the forest cover changes of this area remains limited. In this situation, RS is a vital source of data for the effective monitoring of this region. The forest cover changes were demarcated using the normalised difference vegetation index (NDVI). The DP maps would help the researchers and decision makers of this region. In addition, these methods have not yet been used for the evaluation of DP in this area or elsewhere in India. The detailed explanation of all of these methods and parameters would guide future researchers working in this field.

The purpose of this research is to evaluate the DP in the Gumani River Basin, India by applying the hybrid ensemble frameworks of MLPnn and ensemble strategies, i.e. bagging, dagging and RTF. A probability map for deforestation helps policymakers identify the areas susceptible to deforestation and evaluate the current forest management.

2. Description of the study area

The Gumani River is located in the fringe area of the Chhota Nagpur Plateau of India. It is a tributary of the Ganga River with a length of 120.09 km. Geographically, this basin extends from 24°37′39″N–25°7′19″N lat. and 87°21′20″E–87°54′20″E lon. (Figure 1), encompassing an area of 1274.57 km².
The forested area decreased from 24.11% (1990) to 14.33% (2020) of the total area of the basin (Landsat TM 1990 and OLI 2020 images from the USGS Earth Explorer). The lower part of the basin is agriculturally prosperous, whilst the upper part has a high concentration of population and settlement. Population growth is high in this study area; the total population was 560,000 in 1991 and increased to 750,000 in 2011 (Census of India 1991, 2001). Therefore, population increase has a detrimental effect on the forest cover, although attention should also be given to the geographical context and other drivers of forest depletion. Geologically, this area comprises Rajmahal Traps, the lower Vindhya system, the lower Gondwana system and new alluvium. The basin has a varied geomorphological character because the upper portion belongs to the undulating plateau and the lower portion is a plain. The elevation of the study area ranges from 17 m to 581 m above mean sea level. The climate varies from subtropical humid to subhumid, rainfall mainly occurs between June and September, and the mean annual rainfall is 1,300 mm (Chandniha et al. 2017). According to the National Bureau of Soil Survey and Land Use, the prevalent soils are fine loamy, loamy skeleton and clay skeleton. The forest concentration is high in the upper portion of the basin and low in the lower portion. To protect the remaining forest areas in the basin, prediction of areas at risk of deforestation and formulation of suitable strategies by the local government are necessary. Our work would help the decision makers in this respect.

3. Background theory of methods employed

3.1. Ensemble model for DP assessment

DP models using ensemble structures of MLPnn and bagging, dagging and RTF for spatial DP were obtained through five key stages (Figure 2).

1. Selection of deforestation determining factors (DDFs): The DDFs were selected after a survey of the published literature. The selected parameters were justified using two statistical methods, i.e. information gain ratio (IGR) and Relief-F. Deforestation affecting factors were divided into two classes, namely, natural factors (viz. altitude, slope, forest density, distance to forest edge, proximity to river, aspect and topographic position index [TPI]) and anthropogenic factors (viz. population density, agricultural land density, distance from agricultural land, proximity to road, settlement density, proximity to settlement and population growth) in the DP analysis.
2. Collection and preparation of data layers: Data regarding deforested locations and DDFs were collected to predict spatial DP. In January 2020, an intensive field investigation with a handheld global positioning system was conducted to validate the deforested locations collected through the interpretation of Google Earth images and NDVI.
3. Assessment of the contribution of the DDFs: A frequency ratio (FR) model was used, and the percentage share of the sample deforestation points was calculated for judging the significance of the DDFs.
4. Preparation of deforestation models and DP maps: To construct deforestation models, ensemble methods were first implemented to refine the training data set. The configured input data were then used to classify the spatial deforestation probability groups with the MLPnn base classifier. Finally, ML ensemble frameworks were built for the DP models.
5. Validation and comparison of models: The DP maps were validated and compared using the receiver operating characteristic (ROC) curve, efficiency, accuracy, mean absolute error (MAE) and root mean square error (RMSE) on the training and testing datasets. Friedman and Wilcoxon signed-rank statistical tests were performed to check whether differences exist amongst the DP models.

Figure 1. Location of the Gumani River Basin with training and testing deforestation sample points.

Figure 2. Methodological flow chart of ML ensemble framework for DP analysis.

3.2. Data used

3.2.1. Deforestation map

The forest cover change (1990–2020) was considered the dependent variable (Figure 3) for DP modelling. NDVI was measured from Landsat images of 30 m × 30 m resolution for 1990 (Figure 3a), 2000 (Figure 3b), 2010 (Figure 3c) and 2020 (Figure 3d) via GIS tools, and NDVI values greater than 0.3 were considered forest (Weier and Herring 2000). The forest cover areas are 24.11%, 20.96%, 16.56% and 14.33% of the total basin area for 1990 (3a), 2000 (3b), 2010 (3c) and 2020 (3d), respectively, i.e. a loss of nearly 10 percentage points of the basin area over these three decades. The NDVI map of 1990 was considered the base map for this study. A binary map with the groups ‘deforestation’ and ‘non-deforestation’ for 1990–2020 was produced by subtracting the 2020 forest cover from the 1990 forest cover (Figure 3e). For preparing the DP models and obtaining enhanced results, 1000 pixels for each of the two classes, i.e. deforested and non-deforested, were randomly selected from the total deforestation and non-deforestation pixels (Süzen and Doyuran 2004). Amongst them, 70% were used for modelling, and 30% were used for validating the models.

Figure 3. Forest cover change during 1990–2020 (dependent variable) used for training the ML ensemble model: a. 1990, b. 2000, c. 2010, d. 2020, and e. deforestation map (1990–2020).

3.2.2. Preparation of DDFs

For constructing the DP models, seven natural factors (i.e. altitude, slope, forest density, distance from forest edge, proximity to river, aspect and TPI) and seven anthropogenic factors (i.e.
density of population and agricultural land, distance from agricultural land, proximity to road, settlement density, proximity to settlement and population growth rate) were selected (Table 1). These factors were considered independent variables, and a thematic layer for each variable was prepared. Table 1 presents the methods of preparing the factors and the sources of data.

Table 1. Factors and their sources and methods of calculation used in this research.

| Factors | Methods used | Data used | Source |
|---|---|---|---|
| Elevation | 30 m × 30 m digital elevation model | ASTER DEM (30 × 30 m) | U.S. Geological Survey |
| Slope | $\tan\theta = \dfrac{N \times i}{636.6}$; N = no. of contour cuttings; i = contour interval | ASTER DEM (30 × 30 m) | U.S. Geological Survey |
| Aspect | $\text{Aspect} = 57.29 \times \operatorname{atan2}\left([dz/dy],\, -[dz/dx]\right)$, where $[dz/dx] = ((c + 2f + i) - (a + 2d + g))/8$ and $[dz/dy] = ((g + 2h + i) - (a + 2b + c))/8$; a to i indicate the cell values of a 3 × 3 window | ASTER DEM (30 × 30 m) | U.S. Geological Survey |
| Proximity to river | Euclidean distance | Topographical sheets | Survey of India |
| TPI | $TPI = Z_0 - \bar{Z}$, $\bar{Z} = \dfrac{1}{n_R}\sum_{i \in R} Z_i$; $Z_0$ = elevation of the central point; $\bar{Z}$ = average elevation of the neighbourhood R | ASTER DEM (30 × 30 m) | U.S. Geological Survey |
| Forest density | F/A; F = forest area; A = total area | Landsat images (30 × 30 m) | U.S. Geological Survey |
| Distance from forest edge | Euclidean distance buffering | Landsat images (30 × 30 m) | U.S. Geological Survey |
| Proximity to settlement | Euclidean distance buffering | Google Earth image | Google Earth Pro |
| Population growth | $PR = \dfrac{P - P_0}{P_0} \times 100$; P = total population after time t; $P_0$ = starting population; t = time | Census data | Census of India |
| Proximity to road | Euclidean distance buffering | Google Earth image | Google Earth Pro |
| Distance from agricultural land | Euclidean distance buffering | Landsat images (30 × 30 m) | U.S. Geological Survey |
| Settlement density | S/A; S = settlement area; A = total area | Google Earth image | Google Earth Pro |
| Agricultural land density | AL/A; AL = area of agricultural land; A = total area | Landsat images (30 × 30 m) | Google Earth Pro |
| NDVI | $NDVI = \dfrac{NIR - RED}{NIR + RED}$; NIR = reflection in the near-infrared spectrum; RED = reflection in the red range of the spectrum | Landsat images (30 × 30 m) | U.S. Geological Survey |

Figure 4. Independent variables used for modelling the deforestation probability of the study area: a) slope, b) forest density, c) proximity to river, d) distance to agricultural land, e) proximity to road, f) proximity to forest edge, g) population density, h) proximity to settlement, i) settlement density, j) slope aspect, k) altitude, l) agricultural land density, m) population growth and n) TPI.

The regional topography plays an important role in forest cover change. Spatial variation in the deforestation process is influenced by slope, altitude, aspect and TPI (Bax et al. 2016; Szymura et al. 2018). The slope classes determine the spatial variability in the deforestation process (Siles 2009; Kumar et al. 2014; Bavaghar 2015; Vanonckelen and van Rompaey 2015; Bax et al. 2016; Szymura et al. 2018). A slope map (Figure 4a) was extracted from the ASTER DEM with a resolution of 30 m × 30 m (Table 1). Aspect (Figure 4j) controls the amount of sunlight and rainfall a particular region receives (Kumar et al. 2014; Bavaghar 2015; Bax et al. 2016). It affects the composition and development of forest cover. The degree of deforestation is also indirectly connected to slope face (Bayat 2000). The DEM of the basin was used as the altitude map (Figure 4k). In high-altitude areas, natural hazards, such as weathering, aeolian flooding and landslide, are the main drivers of deforestation; in low-altitude areas, deforestation is induced mostly by anthropogenic activities (Ercanoglu and Gokceoglu 2002). Distance to the river is a parameter that determines the stability and instability of slopes, indirectly influencing forest cover change (Saha et al. 2002; Yalcin 2008). Waterbodies may provide access to forested areas and serve as secondary routes for timber collection (Nackaerts et al. 2005). For distance to river, a thematic layer was prepared in a GIS environment by using the Euclidean distance buffer tool (Figure 4c).

The distance from the margins of the forest is an important factor that can regulate deforestation (Matlack 1994). This factor represents an intermediate area from which forest destruction continues at the border of the existing forest (Arekhi 2011; Kumar et al. 2014). DP is determined using the nature and features of the forest edge in the core forest region. This thematic layer was also produced using the Euclidean distance buffer tool (Figure 4f). An inverse relationship exists between forest density and DP (Bouldin 2008). A forest density map was prepared by dividing the forested area by the total area based on the forest map of 2020 (Figure 4b).

The topographic position index (TPI) compares the elevation of each cell in a DEM with the mean elevation of a specified neighbourhood around that cell. TPI classes affect the spatial variability in the deforestation process (Wilson et al. 2005; Siles 2009; Kumar et al. 2014; Bavaghar 2015; Vanonckelen and van Rompaey 2015; Bax et al. 2016; Szymura et al. 2018). TPI was created on the basis of the DEM and applied for extracting the slope position classes (Jennes 2006). Following Weiss (2001), TPI was classified into six categories in this study area (Figure 4n), where SD is the standard deviation of TPI, namely, 1) ridge (TPI > 1 SD); 2) upper slope (0.5 SD < TPI ≤ 1 SD); 3) middle slope (−0.5 SD < TPI < 0.5 SD, slope > 5°); 4) lower slope (−1 SD < TPI ≤ −0.5 SD); 5) flat (−0.5 SD < TPI < 0.5 SD, slope ≤ 5°); 6) valley (TPI ≤ −1 SD).

Different sociocultural and economic practices are mainly responsible for the degradation and loss of forest (Boudreau et al. 2005). The potentiality of deforestation is multiplied as the population continues to grow near a forested area (Vanonckelen and van Rompaey 2015; Szymura et al. 2018).
As a result, population growth (Figure 4m), population density (Figure 4g), distance to settlement (Figure 4h) and settlement density (Figure 4i) are the main reasons for deforestation. A reciprocal relationship exists between forest cover change and settlement density. As settlement density (Figure 4i) increases, the probability of deforestation in the neighbouring areas increases and vice versa. The construction of road systems fragments the forest land and is the first step towards forest depletion. The road network is a vital deforestation-triggering factor because forest close to a road is highly prone to degradation and vice versa (Chomitz and Gray 1999). The chances of deforestation are high in accessible areas (Bavaghar 2015). Here, a distance-to-road map was produced using the Euclidean distance buffer tool (Figure 4e). Rapid population increase is the main cause of deforestation (Michalski et al. 2008). A larger population needs more food and housing and, hence, considerable land for farmland and houses (Cropper and Griffiths 1994). Overpopulation is considered the major cause of forest destruction by international organisations, including FAO. The population density map of the study area was constructed on the basis of data from the 2011 census (Figure 4g). Agricultural land density (Figure 4l) is an important factor for assessing the DP of a particular region because it identifies the concentration of agricultural land in a particular area. The chances of deforestation are high where the density of agricultural land is high. The distance to agricultural land (Figure 4d) is also an important land use predictor for determining DP. The chances of deforestation increase as the distance decreases and vice versa because building or other human land use is highly probable near an agricultural field.
Population growth can be followed by a high rate of forest cover change (Vanonckelen and van Rompaey 2015; Szymura et al. 2018). The population growth (Figure 4m) data were collected from the Census of India (2011). High rates of population growth lead to the expansion of settlement and agricultural areas into forest cover (Minetos and Polyzos 2010).

3.3. Factor selection

The selection of conditioning variables is a challenging task in any study because no specific criteria are available. Tien Bui et al. (2016) and Roy and Saha (2020) identified effective factors by using statistical models for natural hazard assessment. Gayen and Saha (2018) used multicollinearity analysis for selecting DDFs. Different statistical methods, such as correlations, regressions, Relief-F tests, IGR, probabilistic models and ML models, can also be used to select DDFs. In this study, the IGR and Relief-F methods were applied for selecting the important deforestation determining factors. IGR corrects a weakness of information gain, which favours attributes that take on a large number of distinct values and can therefore fit the training set too closely. IGR has been used to assess which of the factors are the most significant. Relief-F is a feature selection algorithm that is applied in a pre-processing step before the model is trained and is one of the most powerful pre-processing algorithms.

3.3.1. Information gain ratio (IGR)

For DP, anthropogenic and natural factors do not have the same diagnostic power, and some may even reduce the predictive capacity of a model. If the irrelevant DDFs are removed from the model, enhanced findings and predictions can be obtained (Martínez-Álvarez et al. 2013). IGR is amongst the most effective factor selection strategies (Tien Bui et al. 2016). Information gain rests on a principle that helps reduce variance and shows the importance of the influencing variables.
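As an illustration of the idea, the sketch below computes the gain ratio in its standard form (information gain divided by split information) for categorical factors. It is only a minimal sketch under stated assumptions: the function names and the toy data (`near_road`, `aspect`, `pixel_id`) are hypothetical, and the exact equations used in this study are those of the supplementary material, which may differ in detail.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain of a categorical feature divided by its split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected class entropy that remains after splitting on the feature
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information grows with the number of distinct feature values
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy sample: 1 = deforested, 0 = non-deforested
labels = [1, 1, 1, 0, 0, 0, 1, 0]
near_road = ["near", "near", "near", "far", "far", "far", "near", "far"]
aspect = ["N", "S", "N", "S", "N", "S", "S", "N"]
pixel_id = list(range(8))  # unique per sample, so it "explains" the labels trivially

for name, feature in [("near_road", near_road), ("aspect", aspect), ("pixel_id", pixel_id)]:
    print(name, round(gain_ratio(feature, labels), 3))
```

In this toy sample, `pixel_id` has the same raw information gain as `near_road` because every value is unique, but its gain ratio is three times smaller, which is the correction that motivates using the ratio rather than the raw gain.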
In data mining, IGR is an important strategy for quantifying factor predictability (Witten and Tibshirani 2011). Quinlan (1993) established the IGR, in which a high ratio means a great predictive capacity. The equations used to calculate IGR are given in the supplementary material (S1). In this study, IGR was used to identify and select the important DDFs.

3.3.2. Relief-F test method

The Relief-F method, implemented by Kira and Rendell (1992), iteratively changes the weights of features in accordance with their capacity to distinguish between neighbouring samples. The principal concept of the Relief-F algorithm is similar to the rules of the k-nearest neighbour algorithm (Altun et al. 2007). The nearest neighbours of a sample that belong to the same class are expected to lie close to it. If an attribute is useful, the nearest neighbours of the same class are expected to be closer to the sample along that attribute than the nearest neighbours of all other classes (Altun et al. 2007). Mathematically, x is assumed to be a randomly drawn sample of the outcomes of a binary test. Its two closest neighbours, one from the same class (nearest hit, NH) and the other from the other class (nearest miss, NM), are evaluated. Then, the weight $w_i$ of the i-th feature is updated via a heuristic computation (Cai and Ng 2012):

\[ w_i \leftarrow w_i - \left| x^{(i)} - NH^{(i)} \right| + \left| x^{(i)} - NM^{(i)} \right|. \qquad (1) \]

Further information on the algorithm is provided in the paper of Liu and Motoda (2008).

3.4. Deforestation occurrence in relation to DDFs and analysis of its influence

The percentage of deforestation samples and the FR of the subclasses of each factor were calculated to understand the influences of the selected DDFs on the deforestation process. The percentage of deforested samples in the subclasses of each explanatory variable was calculated by overlaying each raster representing an independent variable with the randomly selected deforestation pixels.
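The frequency ratio used in this section compares a subclass's share of the deforestation pixels with its share of all pixels. The sketch below is a minimal illustration; the function name, the slope classes and the pixel counts are hypothetical and are not taken from the study data.

```python
def frequency_ratio(deforested_in_class, deforested_total, pixels_in_class, pixels_total):
    """Share of deforestation pixels in a subclass divided by the subclass's share of area."""
    share_of_deforestation = deforested_in_class / deforested_total
    share_of_area = pixels_in_class / pixels_total
    return share_of_deforestation / share_of_area


# Hypothetical subclasses of one factor: (deforestation pixels, all pixels)
slope_classes = {
    "0-5 deg": (600, 50_000),
    "5-15 deg": (300, 30_000),
    ">15 deg": (100, 20_000),
}

total_deforested = sum(d for d, _ in slope_classes.values())
total_pixels = sum(p for _, p in slope_classes.values())

fr_values = {
    name: frequency_ratio(d, total_deforested, p, total_pixels)
    for name, (d, p) in slope_classes.items()
}

for name, fr in fr_values.items():
    print(f"{name}: FR = {fr:.2f}")
```

A value above 1 indicates that the subclass holds more deforestation than its area alone would suggest, a value below 1 indicates less, and a value of 1 indicates no association.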
FR gives the proportion of deforestation pixels in a specific category of each input layer relative to the proportion of the area occupied by that category (Lee and Pradhan 2007). FR values based on the frequency of deforestation samples were calculated using the following equation:

\[ FR = \frac{f / tf}{x / tx}, \qquad (2) \]

where f refers to the pixels of deforestation in the explanatory variable subclass, tf indicates the total deforestation pixels, x denotes the total pixels in the explanatory variable subclass, and tx is the total number of pixels.

3.5. Base classifier of MLPnn

MLPnns are a type of artificial neural network (ANN) and are commonly utilised in classification (Haykin 2009). An MLPnn is a feedforward neural network that is trained with backpropagation. In these techniques, no prior decision is needed about the relative importance of the individual input variables, the most relevant inputs are selected through weight adjustment throughout the training phase, and no prior assumptions are made about the distribution of the training data set (Gardner and Dorling 1998). An MLP consists of three main types of layers, i.e. input, hidden and output layers (Figure 5). In accordance with the specific application, every layer in a network contains an adequate number of neurons. The input layer is passive and merely receives data (e.g. data from the various DDFs). Hidden and output layers actively process the information. The input layer represents the variables influencing deforestation, the output layer gives the classification outcome (deforested or non-deforested), and the hidden layers are the classifying layers that convert inputs into outputs. MLP neural nets have been shown to perform better than conventional classification methods (Benediktsson et al. 1990).

Figure 5. Theoretical structure of MLPnn model.
There are some benefits of using this approach: (i) there are no pre-assumptions as to the distribution of the training dataset, (ii) there is no need to decide on the relative importance of the various input measures, and (iii) the weights are adjusted during the training process so that the most relevant input measures are selected (Gardner and Dorling 1998). MLPnn training has two key phases: (I) inputs are transmitted via the hidden layers to the output layer, and the output values are compared with the target values to estimate the error; (II) to achieve the best performance, the weights are adjusted to reduce this error. Let $x = (x_i)$, $i = 1, 2, \ldots, 14$, be the vector of the 14 factors impacting deforestation, and y = 1 (deforested) or 0 (non-deforested). The number of neurons in the input and output layers is generally determined by the application. The number of hidden layers and their neurons is determined by trial and error (Gong 2009).

For a classification problem, MLPnn data processing includes three stages: learning, weighting and classification. The learning phase starts with the assignment of random initial connection weights, which are continuously revised until satisfactory training performance is achieved. Subsequently, the modified weights derived from the trained network are used to process the test data and assess the overall precision and effectiveness of the application. The network performance is assessed by evaluating the agreement of training and test data in terms of the percentage and overall accuracy of classification (Congalton 1991). The hidden neurons learn to map the information from the input neurons to the output neurons.
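The two phases described above can be sketched in a few lines. The example below is a minimal illustration rather than the configuration used in this study: it assumes a sigmoid transfer function, a squared-error measure, plain online gradient descent, three hidden neurons and a learning rate of 0.5, and the two-factor toy samples are hypothetical. Only the single hidden layer and the 500 epochs follow the settings reported for this work.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Phase I: propagate the inputs through one hidden layer to the output."""
    x_b = x + [1.0]  # append a constant bias input
    hidden = [sigmoid(sum(w * p for w, p in zip(ws, x_b))) for ws in w_hidden]
    out = sigmoid(sum(w * h for w, h in zip(w_out, hidden + [1.0])))
    return hidden, out


def sse(data, w_hidden, w_out):
    """Sum of squared differences between target and actual outputs."""
    return sum((y - forward(x, w_hidden, w_out)[1]) ** 2 for x, y in data)


def train(data, n_hidden=3, epochs=500, lr=0.5, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, y in data:
            hidden, out = forward(x, w_hidden, w_out)
            # Phase II: backpropagate the error and adjust the weights
            delta_out = (y - out) * out * (1.0 - out)
            delta_hidden = [delta_out * w_out[j] * h * (1.0 - h) for j, h in enumerate(hidden)]
            for j, h in enumerate(hidden + [1.0]):
                w_out[j] += lr * delta_out * h
            for j in range(n_hidden):
                for i, p in enumerate(x + [1.0]):
                    w_hidden[j][i] += lr * delta_hidden[j] * p
    return w_hidden, w_out


# Hypothetical samples: [normalised distance to road, normalised distance to settlement]
samples = [([0.1, 0.2], 1), ([0.2, 0.1], 1), ([0.8, 0.9], 0), ([0.9, 0.7], 0)]

error_before = sse(samples, *train(samples, epochs=0))  # untrained network, same seed
error_after = sse(samples, *train(samples))
print(f"SSE before training: {error_before:.3f}, after: {error_after:.3f}")
```

Calling `train` with `epochs=0` returns the random initial weights for the same seed, which gives a simple baseline for judging how far the weight adjustment has reduced the error.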
The net input received by neuron j in the first hidden layer from the neurons i of the input layer can be represented as:

$$x_j = \sum_{i=1}^{t} w_{ij}\, p_i, \qquad (3)$$

where $w_{ij}$ is the weight of the connection between input neuron i and hidden neuron j, $p_i$ is the data at input neuron i, and t is the number of input neurons. The output value produced by hidden neuron j, $p_j$, is obtained by applying the transfer function f to the net input $x_j$ of neuron j:

$$p_j = f(x_j) = \frac{1}{1 + e^{-x_j}}. \qquad (4)$$

The function f is typically a nonlinear sigmoid function that is applied to the weighted sum of the input data before the data are passed to the next layer. The sum of the squared differences between the expected and actual values of the output neurons, E, is defined as follows (Subasi 2007):

$$E = \sum_{j} \left(Y_{dj} - Y_j\right)^2, \qquad (5)$$

where $Y_{dj}$ is the expected output of neuron j and $Y_j$ is its actual output. Each weight $w_{ij}$ is adjusted to reduce the value of E according to the training algorithm used. In this study, the MLPnn was fitted with 500 epochs, 1 hidden layer and a validation threshold of 20, obtained by trial and error to avoid overfitting.

3.6. ML ensemble techniques

3.6.1. RTF

RTF is an ensemble approach built from individual decision trees (Kuncheva and Rodríguez 2007) and was initially proposed for classification by Rodriguez et al. (2006). It builds on the random forest concept and aims at creating reliable and flexible classifiers (Rodriguez et al. 2006). Each tree in the RTF is trained on a data set whose feature space has been rotated using Principal Component Analysis (PCA). In this model, bootstrap samples are used as the training sets of the individual classifiers (Kuncheva and Rodríguez 2007). Throughout this process, samples are drawn from the training dataset to generate the sub-training datasets on which the base classifiers learn (Pham, Bui et al. 2016).
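The rotation idea described above can be illustrated with a small sketch. It rests on several simplifying assumptions that are not taken from the paper: the features are split into disjoint subsets of exactly two features so that PCA can be computed in closed form, each subset uses a plain bootstrap sample, the data are made up, and the helper names (`pca_rotation_2d`, `rotation_matrix`, `rotate`) are hypothetical. A full rotation forest would then train one base classifier on each rotated data set.

```python
import math
import random


def pca_rotation_2d(rows):
    """2x2 PCA loading matrix for two features; columns are the principal axes."""
    n = len(rows)
    mean_a = sum(r[0] for r in rows) / n
    mean_b = sum(r[1] for r in rows) / n
    var_a = sum((r[0] - mean_a) ** 2 for r in rows) / n
    var_b = sum((r[1] - mean_b) ** 2 for r in rows) / n
    cov = sum((r[0] - mean_a) * (r[1] - mean_b) for r in rows) / n
    theta = 0.5 * math.atan2(2.0 * cov, var_a - var_b)  # angle of the first axis
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def rotation_matrix(X, pairs, rng):
    """Place one PCA block per feature pair at that pair's original positions."""
    n_features = len(X[0])
    R = [[0.0] * n_features for _ in range(n_features)]
    for a, b in pairs:
        sample = [rng.choice(X) for _ in range(len(X))]  # bootstrap sample of rows
        block = pca_rotation_2d([(row[a], row[b]) for row in sample])
        R[a][a], R[a][b] = block[0]
        R[b][a], R[b][b] = block[1]
    return R


def rotate(row, R):
    """Project one sample onto the rotated axes (row vector times R)."""
    n = len(row)
    return [sum(row[i] * R[i][j] for i in range(n)) for j in range(n)]


rng = random.Random(1)
X = [[rng.random() for _ in range(4)] for _ in range(50)]  # 50 made-up samples, 4 features
order = list(range(4))
rng.shuffle(order)  # random split of the features into disjoint subsets
pairs = [(order[0], order[1]), (order[2], order[3])]
R = rotation_matrix(X, pairs, rng)
X_rotated = [rotate(row, R) for row in X]
```

Because every block is written at the original feature positions, the resulting matrix is already arranged in the order of the original features, and the zero entries outside the blocks make it sparse.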
In this analysis, the vector of DDFs is $x = (x_1, x_2, \ldots, x_n)$, and $Y = (y_1, y_2)$ denotes the vector of classes, deforested or not deforested. D stands for the training data. $F_1, F_2, \ldots, F_n$ are categorized in accordance with the ensemble. T specifies a certain set of DDFs and is divided into k subsets. A new nonempty training subset $X_{ij}$ is prepared by applying the bootstrap method, where $F_{ij}$ is the j-th subset of features used to run classifier $D_i$. A linear transformation is then applied to $X_{ij}$ to obtain the coefficients of matrix $C_{ij}$, whose coefficient vectors $r_{ij}^{(1)}, \ldots, r_{ij}^{(k)}$ are each of size M × 1. The RTF ensemble is established on the basis of the rotation matrix formed using the basic methods of characterisation and conversion (Xia et al. 2014). The rotation matrix is obtained by rearranging the matrix $R_i$:

$$R_i = \begin{bmatrix} r_{i1}^{(1)}, r_{i1}^{(2)}, \ldots, r_{i1}^{(S_1)} & 0 & \cdots & 0 \\ 0 & r_{i2}^{(1)}, r_{i2}^{(2)}, \ldots, r_{i2}^{(S_2)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & r_{ik}^{(1)}, r_{ik}^{(2)}, \ldots, r_{ik}^{(S_k)} \end{bmatrix}. \qquad (6)$$

The columns of $R_i$ are rearranged to match the order of the original features. The rearranged rotation matrix is denoted $R_i^r$, and $xR_i^r$ is the transformed training set for classifier $D_i$; all classifiers are built in the same way. The outputs obtained for each class with the sparse rotation matrix $R_i^r$ are combined via the average combination strategy:

$$\mu_j(x) = \frac{1}{L} \sum_{i=1}^{L} d_{ij}\left(xR_i^r\right), \quad j = 1, 2, \ldots, c, \qquad (7)$$

where $\mu_j(x)$ is the confidence assigned to class $y_j$, $d_{ij}(xR_i^r)$ is the probability assigned by classifier $D_i$ to the hypothesis that x comes from class $y_j$, L is the number of classifiers in the ensemble, and c is the number of classes (Rodriguez et al. 2006).

3.6.2. Dagging

Dagging is a well-known resampling ensemble approach that builds and combines a number of classifiers using the same learning algorithm for the base classifiers.
Ting and Witten proposed dagging in 1997. The procedure differs in several respects from boosting and bagging. For example, boosting adapts the distribution of the training data set according to the performance of previously generated classifiers and uses the performance of each classifier as its voting weight, whereas bagging modifies the training set stochastically. Dagging uses a number of disjoint samples, rather than bootstrap samples, to obtain the base classifiers (Ting and Witten 1997; Kotsianti and Kanellopoulos 2007). Furthermore, strong empirical evidence indicates that dagging is far more resilient than boosting in noisy settings. As a resampling ensemble strategy, it combines multiple classifiers by majority voting to improve on the predictive performance of the base classifier (Kotsianti and Kanellopoulos 2007). For this purpose, we built a dagging ensemble with the MLPnn as base classifier, combined by voting.

3.6.3. Bagging

Bagging, designed by Breiman (1996), draws several samples from the training dataset and uses bootstrap aggregation to achieve high predictive accuracy on top of a base classifier (Wu et al. 2020). It was used here to provide a precise mapping of DP. Bagging gives good results for very large ensembles; a greater number of estimators increases the accuracy of this approach in comparison with the RTF model. This ensemble was chosen because it exploits slight changes in the training data to enhance predictive capacity (Wu et al. 2020). Bagging involves three main steps: random selection of bootstrap samples to create a range of training subsets, generation of a classifier for each subset, and combination of these classifiers in the final model (Tien Bui et al. 2016). In bootstrap sampling, about one third of the instances are left out of each sample.
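The difference in how dagging and bagging draw their training subsets can be made concrete with a short sketch. It is an illustration under assumptions rather than the configuration used in the study: the numbers of folds and bags are arbitrary, the folds are not stratified by class, and the helper names are hypothetical. The last line also checks the share of instances that a bootstrap sample leaves out, which tends to about 36.8% for large data sets.

```python
import random


def dagging_folds(indices, n_folds, rng):
    """Disjoint subsets: every sample is given to exactly one base classifier."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[k::n_folds] for k in range(n_folds)]


def bagging_samples(indices, n_bags, rng):
    """Bootstrap samples: as large as the data, drawn with replacement."""
    pool = list(indices)
    return [[rng.choice(pool) for _ in pool] for _ in range(n_bags)]


def majority_vote(votes):
    """Combine 0/1 votes of the base classifiers; a tie is resolved as class 1."""
    return 1 if 2 * sum(votes) >= len(votes) else 0


rng = random.Random(42)
training = range(700)  # e.g. 70% of 1000 sample pixels
folds = dagging_folds(training, 7, rng)
bags = bagging_samples(training, 16, rng)
# Fraction of training instances that never appear in each bootstrap sample.
oob_fraction = [1.0 - len(set(bag)) / 700 for bag in bags]
```

Each base classifier (here it would be an MLPnn) is trained on one fold or one bag, and `majority_vote` combines the resulting class predictions for a pixel.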
The bagging classifier uses sampling with replacement to produce each bootstrap sample from the original training dataset. The hybrid bagging ensemble improves the performance of each set of classifiers by linking them to the original feature scheme in the bagging classification phase. The instances left out of a bootstrap sample were termed out-of-bag cases by Breiman (1996). Bagging fits each base classifier on a random subset of the original dataset and then aggregates the individual predictions, by voting or by averaging, to form a final prediction.

3.7. Construction of models and DP maps

DP models using hybrid ML ensemble frameworks were developed from the training data sets to predict deforestation in the study area. Both continuous and categorical factors were used to run the ML models. The continuous DDFs were classified by the natural break classification method so that the influence of their subclasses could be assessed with the FR model. Deforested and forested pixels were considered as the training datasets. Pixels (70%) from both classes were randomly selected as training data for running the models. Deforestation and non-deforestation were coded as 0 and 1, respectively. Once all four models had been run successfully in the training phase, the connection weights of the models were applied to compute the DP indices for all pixels. The model parameters were tuned by trial and error during training to construct the DP models. Generally, 1 to 2 hidden layers are enough for pixel-based mapping. ArcGIS and RStudio were used to model DP with the ensemble models in this study. The R packages caret, rpart, ipred, rotationForest and neuralnet were used to predict the deforestation probability.
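As a worked illustration of the FR model applied to the classified DDFs, the frequency ratio of Equation 2 can be computed per subclass as follows. The pixel counts and subclass names are hypothetical and only serve to show the arithmetic.

```python
def frequency_ratio(f, tf, x, tx):
    """Equation 2: share of deforestation pixels over share of all pixels."""
    return (f / tf) / (x / tx)


# Hypothetical counts per subclass: (deforestation pixels, all pixels).
subclasses = {
    "near": (460, 30000),
    "middle": (440, 50000),
    "far": (100, 20000),
}
tf = sum(f for f, _ in subclasses.values())  # total deforestation pixels
tx = sum(x for _, x in subclasses.values())  # total pixels
fr = {name: frequency_ratio(f, tf, x, tx) for name, (f, x) in subclasses.items()}
```

An FR above 1 means that a subclass holds a larger share of the deforestation pixels than of the total area, and an FR below 1 means the opposite.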
In this analysis, we used 1 hidden layer, a learning rate of 0.3, a momentum of 0.2, a seed of 0, a training time of 500 and a validation threshold of 20 for the MLPnn to decide the quantity of data for reduced-error pruning, update the weights, add value to the weights, divide the data, build the ensemble and finish the calibration testing (Pham et al. 2017; Onan 2016). The validation threshold is the value used to terminate validation testing. A threshold function is a Boolean function that determines whether the value of its inputs crosses a certain threshold. The percentage bag size indicates the size of each training bag relative to the training set (Sedano et al. 2013). Likewise, 16 iterations, a seed of 1, a bag size of 100% and MLPnn as base classifier were set for bagging. Eighteen iterations, a seed of 2 and MLPnn as base classifier were used for dagging, and 8 iterations, a seed of 1 and principal component analysis as the base filter were used for RTF.

3.8. Validation techniques

3.8.1. Threshold-dependent methods

The ROC curve remains one of the most effective and widely accepted approaches for testing models (Kumar and Indrayan 2011). In this study, three threshold-dependent methods, i.e. ROC, precision and accuracy, were used to evaluate the performance of the models. The area under the curve (AUC) indicates the effectiveness and consistency of the models (Pepe 2000). The ROC curve has been used in various disciplines (e.g. engineering and medicine). Accuracy and precision were considered for checking the robustness of the models. The equations of AUC, sensitivity, specificity, precision and accuracy are given in the supplementary material (S2). High values of AUC, precision and accuracy indicate good model capability. AUC values vary from 0 to 1; an AUC of 1 indicates perfect prediction, whereas an AUC below 0.5 implies poor results (Can et al. 2005).

3.8.2. Statistical techniques

Statistical evaluation measures, namely MAE and RMSE, were selected to validate the models. MAE is the mean of the absolute differences between the predicted and actual DP values of the datasets. RMSE is the square root of the mean of the squared differences between them (Supplementary material S3). Can et al. (2005) set a cut-off value of 0.5: a value above 0.5 suggests poor results, whereas a value below 0.5 suggests good performance.

3.8.3. Friedman and Wilcoxon signed-rank tests

The purpose of this sub-subsection is to compare the results of the ensemble ML classifiers through statistical tests. The ML ensemble classifiers were tested using the same random samples. The main objective of these tests was to determine which of the methods differ statistically in performance. The Friedman and Wilcoxon rank tests are suitable in this respect because they assume neither normal distributions nor homogeneity of variance (Tien Bui et al. 2016). The tests of Friedman (1937) and Wilcoxon (1945) were applied in this work to analyse significant differences amongst the model outputs. The decision was based on the probability of the null hypothesis (p-value): if the p-value is below the significance level, a significant difference exists amongst the models, and vice versa (Tien Bui et al. 2016). The Wilcoxon signed-rank test determines the statistical significance of the systematic pairwise differences amongst the DP models. For this test, the p-value and z-value were considered. If the p-value is smaller than 0.05 and the z-value falls outside the critical range (−1.96 to +1.96), the null hypothesis is rejected and the results of the DP models are significantly different (Tien Bui, Pham et al. 2016).

4. Results

4.1. Relief-F test and IGR

The IGR and Relief-F approaches were used to examine the relative importance of each of the DDFs for modelling DP.
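For readers unfamiliar with IGR, the following sketch computes an information gain ratio for one classified factor against the deforested/non-deforested label. It assumes the standard definition (information gain divided by the split information of the factor); the exact formulation and software used in the study are not reproduced here, and the toy data and function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(factor_classes, labels):
    """Information gain of the labels given the factor, divided by split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(factor_classes, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(factor_classes)
    if split_info == 0.0:  # the factor has a single class and carries no information
        return 0.0
    return (entropy(labels) - conditional) / split_info


labels = [1, 1, 0, 0]  # 1 = deforested, 0 = non-deforested (toy data)
informative = gain_ratio(["near", "near", "far", "far"], labels)
uninformative = gain_ratio(["near", "far", "near", "far"], labels)
```

A factor whose classes separate the two labels perfectly reaches the maximum ratio, whereas a factor that is unrelated to the labels scores zero.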
IGR and Relief-F were calculated for the training data, as shown in Figure 6 and Table 2. The resulting IGR and Relief-F values indicated that the selected variables have good predictive capability. Distance from settlement showed the maximum predictive capability, with IGR and Relief-F values of 0.3100 and 0.0922, respectively. Aspect contributed the least predictive value, with IGR and Relief-F values of 0.0023 and 0.0052, respectively.

Figure 6. Contribution of DDFs in making the area potential for deforestation calculated using IGR and Relief-F.

Table 2. IGR and Relief-F values used for selecting and analysing the deforestation determining factors.

| DDFs | IGR | Relief-F |
|---|---|---|
| Proximity to settlement | 0.3100 | 0.0922 |
| Population growth | 0.2811 | 0.0901 |
| Proximity to road | 0.2443 | 0.0788 |
| Forest density | 0.1909 | 0.0544 |
| Distance from forest edge | 0.1588 | 0.0523 |
| Population density | 0.1561 | 0.0794 |
| Distance from agricultural land | 0.1464 | 0.0455 |
| TPI | 0.1422 | 0.0466 |
| Agricultural land density | 0.0897 | 0.0333 |
| Settlement density | 0.0793 | 0.0282 |
| Elevation | 0.0529 | 0.0410 |
| Proximity to river | 0.0511 | 0.0211 |
| Slope | 0.0345 | 0.0191 |
| Aspect | 0.0023 | 0.0052 |

4.2. Frequency of deforestation in relation to DDFs

The selected input factors led to spatial heterogeneity in the deforestation process across the study area. The percentage of deforestation samples and the FR value in each subclass of the DDFs were calculated to understand the influences of the DDFs. The histograms (Figure 7) depict the relationship of deforestation with the different DDFs. Deforestation varied between slope classes (Figure 7a). The largest share of deforested samples was identified in the low-slope class (56.7%), followed by the moderate-slope class. Similarly, the FR value was highest in the low-slope class, i.e. 1.08. The relationship between deforestation occurrence and aspect was also analysed (Figure 7b). The percentage of deforested samples and FR value (Table 3) were maximum for the flat area. For elevation (Figure 7b), 67% of the deforestation pixels lay between 17 and 145 m elevation, and the percentage decreased in the high-altitude classes. The FR value was maximum (1.13) for the 79–145 m elevation class. A similar pattern could be observed for TPI (Figure 7d). The largest share of deforested samples was observed on flat land (53%). Most of the forest loss was connected with distance to forest edge. More than 46% of all deforested samples were concentrated in the first 62 m buffer ring, and 92% of the samples lay within 0.5 km (Figure 7j). The FR value was also maximum (1.49) for the first buffer ring (0–62 m). A remarkable relationship was found between deforestation occurrence and proximity to the river. The maximum FR value (1.29) was achieved in the 0–156 m buffer ring. The incidence of forest loss decreased with increasing distance from settlement and roads (Figure 7f and k). For proximity to settlement and road, 91% and 87% of the total deforested sample pixels, respectively, were concentrated within 0.5 km. The FR value of the 0.10–0.50 km road buffer ring was the maximum at 2.12, and the 71–142 m settlement buffer ring had the maximum FR value of 1.11 (Table 3). Deforestation occurrence was negatively associated with forest density (Figure 7g). The percentage share of deforestation samples and the FR value were highest for the low-forest density class. A negative association was also found in the case of distance to agricultural land (Figure 7m). A high rate of deforestation occurrence (73%) was determined at less than 200 m from agricultural land, and the FR value was maximum for the 0–58 m buffer ring. The concentration of deforestation samples and the FR values were high in areas with high settlement density (Figure 7i) and agricultural land density (Figure 7l). Figure 7e and n reveal that heavy deforestation occurred in areas marked by high population density and fast population growth.

4.3.
Analysing the deforestation probability

The DP indices were calculated for all pixels of the study area, and each pixel was assigned a specific probability index. The probability indices were then reclassified using the geometrical interval method. This method is well suited to classifying continuous data such as DP indices whilst minimising variance (Frye 2007). On the basis of this method, the DP indices were classified into five probability classes, namely very low, low, moderate, high and very high (Figure 8). The outcome of the MLP model indicated that 25.16%, 22.19%, 21.02%, 14.81% and 16.82% of the overall forest area of the basin fell under the very low, low, moderate, high and very high DP classes, respectively (Table 4). The outcomes of the MLP-RTF model showed that 34.98%, 15.67%, 18.98%, 16.87% and 13.50% of the basin's total forest area fell under the very low, low, moderate, high and very high DP classes, respectively. In the MLP-dagging model, the very low, low, moderate, high and very high DP classes covered 37.44%, 22.52%, 16.17%, 11.23% and 12.64% of the basin's total forest area, respectively. According to the MLP-bagging method, the very low, low, moderate, high and very high DP classes occupied 33.48%, 19.15%, 17.88%, 16.00% and 13.49% of the area, respectively.

Figure 7. Distribution of deforestation sample in different subclasses of DDFs for analysing the importance of the selected parameters.

Table 3. FR values of the DDFs indicate the contribution of sub-classes in making the area potential to deforestation.

| Factor | Subclass | FR |
|---|---|---|
| Proximity to road | 0–0.10 | 1.24 |
| | 0.10–0.50 | 2.12 |
| | 0.50–1.17 | 0.72 |
| | 1.17–2.72 | 1.02 |
| | 2.72–5.34 | 0.91 |
| TPI | Valley | 0.25 |
| | Lower slope | 0.59 |
| | Middle slope | 0.81 |
| | Flat | 1.24 |
| | Upper slope | 1.07 |
| | Ridge | 0.67 |
| Aspect | Flat | 1.25 |
| | N | 0.64 |
| | NE | 1.03 |
| | E | 0.76 |
| | SE | 0.84 |
| | S | 0.94 |
| | SW | 0.94 |
| | W | 1.03 |
| | NW | 1.27 |
| Slope | 0–3.56 | 1.08 |
| | 3.56–8.13 | 1.06 |
| | 8.13–14.27 | 0.70 |
| | 14.27–21.81 | 0.74 |
| | 21.81–50.57 | 0.91 |
| Population density | 102–343 | 0.85 |
| | 343–602 | 0.98 |
| | 602–865 | 1.33 |
| | 865–1146 | 1.33 |
| | 1146–1534 | 1.28 |
| Forest density | 0–0.07 | 1.09 |
| | 0.07–0.22 | 1.21 |
| | 0.22–0.39 | 0.76 |
| | 0.39–0.62 | 0.65 |
| | 0.62–1 | 0.45 |
| Proximity to river | 0–156 | 1.29 |
| | 156–395 | 0.98 |
| | 395–800 | 1.20 |
| | 800–1445 | 0.96 |
| | 1445–2653 | 0.59 |
| Distance from forest edge | 0–62 | 1.49 |
| | 62–151 | 1.21 |
| | 151–272 | 1.14 |
| | 272–518 | 0.71 |
| | 518–994 | 0.89 |
| Proximity to settlement | 0–71 | 1.03 |
| | 71–142 | 1.11 |
| | 142–229 | 0.90 |
| | 229–348 | 0.65 |
| | 348–647 | 0.76 |
| Elevation | 17–79 | 1.03 |
| | 79–145 | 1.13 |
| | 145–227 | 0.99 |
| | 227–324 | 0.90 |
| | 324–581 | 0.47 |
| Agricultural land density | 0–0.15 | 0.91 |
| | 0.15–0.34 | 1.06 |
| | 0.34–0.55 | 1.15 |
| | 0.55–0.76 | 0.97 |
| | 0.76–1 | 0.94 |
| Settlement density | 0–0.02 | 1.00 |
| | 0.02–0.08 | 0.98 |
| | 0.08–0.23 | 0.94 |
| | 0.23–0.46 | 1.73 |
| | 0.46–1 | 1.84 |
| Distance from agricultural land | Agriculture land | 0.91 |
| | 0–58 | 1.37 |
| | 58–189 | 1.11 |
| | 189–538 | 1.00 |
| | 538–3709 | 0.85 |
| Population growth | 0–0.58 | 0.90 |
| | 0.58–1.54 | 0.92 |
| | 1.54–2.18 | 1.05 |
| | 2.18–3.15 | 0.88 |
| | 3.15–7.17 | 1.57 |

4.4.
Validation and comparison of DP models

The robustness of the DP models was judged using three threshold-dependent methods (AUC of ROC, precision and accuracy), two threshold-independent methods (MAE and RMSE) and two statistical tests (Friedman and Wilcoxon signed-rank tests). The AUCs showed that the precision of the DP maps reached more than 86% (0.86) for the training and validation data sets (Table 5). The MLP-bagging method achieved the highest accuracy for training and testing, followed by MLP-dagging, MLP-RTF and MLPnn. The AUC values of the success rate curve (training data) and prediction rate curve (test data) were the highest for MLP-bagging (0.902 and 0.943) and the lowest for MLPnn (0.869 and 0.885), respectively (Figure 9). The highest values of precision and accuracy were obtained by MLP-bagging and the lowest by MLPnn (Table 5). The statistical measures, i.e. MAE and RMSE, were calculated for the training and validation data sets. The lowest values (0.24 and 0.38) were obtained for the MLP-bagging ensemble model. On the other hand, the highest values (0.29 and 0.43) were obtained by the MLPnn model.

Therefore, the validation results showed that the accuracy of the MLP model improved after it was combined with the three selected meta classifiers. On average, the AUC of the prediction and success rate curves increased by 3%. The highest increases in the AUC values of both curves were found for the MLP-bagging ensemble model, i.e. 5.4% (success rate curve) and 5.8% (prediction rate curve). As per the results of ROC, precision, accuracy, MAE and RMSE, the robustness of the MLP-bagging model was higher than that of the MLPnn and the other ensemble models.

The Friedman and Wilcoxon signed-rank tests were used to compare the DP models. The results of the Friedman test are presented in Table 6. The mean ranking values for the MLPnn, MLP-bagging, MLP-dagging and MLP-RTF models were 2.77, 2.22, 2.42 and 2.48, respectively. The Wilcoxon signed-rank test was applied to determine the pairwise differences amongst the ML models at a significance level of 5% (Table 7). When the p-value is below 0.05 and the z-value lies outside the critical range (−1.96 to +1.96), the performance of the models differs significantly. The analysis (Table 7) suggested a significant difference amongst all DP models.

Figure 8. DP maps classified into very high, high, moderate, low and very low classes produced by (a) MLPnns, (b) MLP-bagging, (c) MLP-dagging, (d) MLP-RTF.

Table 4. Areal distribution of different deforestation probability classes produced by the selected four machine learning methods.

| DP zones | MLPnn area (%) | MLPnn area (km²) | MLP-RTF area (%) | MLP-RTF area (km²) | MLP-dagging area (%) | MLP-dagging area (km²) | MLP-bagging area (%) | MLP-bagging area (km²) |
|---|---|---|---|---|---|---|---|---|
| Very low | 25.16 | 45.94 | 34.98 | 63.88 | 37.44 | 68.37 | 33.48 | 61.14 |
| Low | 22.19 | 40.52 | 15.67 | 28.62 | 22.52 | 41.12 | 19.15 | 34.97 |
| Moderate | 21.02 | 38.38 | 18.98 | 34.66 | 16.17 | 29.53 | 17.88 | 32.65 |
| High | 14.81 | 27.04 | 16.87 | 30.81 | 11.23 | 20.51 | 16.00 | 29.22 |
| Very high | 16.82 | 30.72 | 13.50 | 24.65 | 12.64 | 23.08 | 13.49 | 24.63 |

5. Discussion

The changes in the forest cover of the Gumani River Basin are well recognised, with numerous factors primarily focused on institutional, financial and economic aspects (Vanonckelen and van Rompaey 2015), the low performance of protected areas (Balteanu et al. 2016) and environmental disruptions (Savulescu and Mihai 2011). Evaluations of DP are limited, with only a few works assessing the relative impacts of biophysical, socio-demographic and land use factors on the changes in forest cover at temporal scales (Munteanu et al. 2015; Vanonckelen and van Rompaey 2015).
Thus, we measured the future possibility of deforestation across the Gumani River Basin in this study by using the hybrid ensemble frameworks MLP-bagging, MLP-dagging and MLP-RTF. To prepare the DP models, the hybrid ensemble methods were first used to optimize the input data using the training dataset. Thereafter, the optimized input data were used to classify spatial DP with the MLPnn base classifier (Roy et al. 2020). Ultimately, the machine learning ensemble frameworks were developed for the DP models. The results obtained from the training sets were used to create the DP maps. Ensemble approaches are classification methods for data processing, whilst MLPnns are regarded as ANNs with excellent results in the spatial modelling of deforested areas.

The findings of this study indicated that all deforestation probability models based on hybrid ML ensembles increased the performance of the MLPnn base classifier (AUC = 0.869). This result is reasonable because DP models using hybrid ML ensemble systems are well recognised to be very successful in enhancing the performance of base classifiers. The DP models in this analysis produced a satisfactory result and allowed basic performance indicators (such as accuracy, precision, AUC, RMSE, MAE and the Friedman and Wilcoxon signed-rank tests) to be used to evaluate the models. The outcomes produced by the ensemble models showed a better accuracy than the individual models previously used for mapping the probability of deforestation (Kumar et al. 2014; Bavaghar 2015; Krüger and Lakes 2015; Dlamini 2016; Mayfield et al. 2017; Sahana et al. 2018; Kucsicsa and Dumitrica 2019; Saha et al. 2020). Owing to their lower error and very low overfitting, the ensemble methods provided better results than previous works by different scholars (Roy et al. 2020). Although the quantity or overall area of deforestation is helpful for planning or zoning, the models could not be used to measure it.

Table 5. Values of different accuracy assessment measures for four used models considering the training and validation data sets.

| Model | Data set | Sensitivity | Specificity | Precision | Accuracy | MAE | RMSE | ROC AUC | ROC std. error | ROC asymptotic sig. |
|---|---|---|---|---|---|---|---|---|---|---|
| MLP-Bagging | Training | 0.98 | 0.30 | 0.86 | 0.85 | 0.15 | 0.29 | 0.921 | 0.016 | 0.000 |
| MLP-Bagging | Validation | 0.97 | 0.23 | 0.87 | 0.80 | 0.19 | 0.34 | 0.943 | 0.013 | 0.000 |
| MLP-Dagging | Training | 0.94 | 0.24 | 0.82 | 0.80 | 0.22 | 0.29 | 0.902 | 0.022 | 0.000 |
| MLP-Dagging | Validation | 0.99 | 0.20 | 0.84 | 0.76 | 0.24 | 0.38 | 0.928 | 0.025 | 0.000 |
| MLP-RTF | Training | 0.96 | 0.23 | 0.77 | 0.77 | 0.20 | 0.32 | 0.887 | 0.031 | 0.000 |
| MLP-RTF | Validation | 0.99 | 0.21 | 0.79 | 0.74 | 0.27 | 0.37 | 0.902 | 0.035 | 0.000 |
| MLPnn | Training | 0.95 | 0.21 | 0.78 | 0.77 | 0.22 | 0.37 | 0.869 | 0.033 | 0.000 |
| MLPnn | Validation | 0.97 | 0.17 | 0.71 | 0.71 | 0.29 | 0.43 | 0.885 | 0.037 | 0.000 |

Figure 9. Validation and comparison of DP models by ROC curves: (a) success rate curve (training data set) and (b) prediction rate curve (validation data set).

Table 6. Results of the Friedman test used for measuring the difference in outputs of different DP models.

| Models | Mean rank | Chi-square | Asymp. sig. |
|---|---|---|---|
| MLP-bagging | 2.22 | 42.137 | 0.000 |
| MLP-dagging | 2.42 | | |
| MLPnn | 2.77 | | |
| MLP-RTF | 2.48 | | |

Table 7. Results of Wilcoxon signed-rank test used for measuring the difference in outputs of different DP models.

| Pairs | Z value | P value | Significance |
|---|---|---|---|
| MLP-bagging vs. MLP-RTF | 4.977 | 0.000 | Yes |
| MLP-bagging vs. MLP-dagging | 3.644 | 0.003 | Yes |
| MLPnn vs. MLP-RTF | 2.764 | 0.014 | Yes |
| MLP-RTF vs. MLP-dagging | 3.792 | 0.001 | Yes |
| MLPnn vs. MLP-bagging | 5.991 | 0.000 | Yes |
| MLPnn vs. MLP-dagging | 4.480 | 0.000 | Yes |
Another drawback of the models is that the assumed predictors of deforestation do not change with time. This drawback is common to many ML models, but it is especially relevant to ours because the deforestation predictors were chosen on the basis of predisposing risk factors for deforestation (Geist and Lambin 2001; Mas et al. 2004). Despite these drawbacks, the findings showed that publicly accessible data sets can be used to estimate DP within the research area.

The DP models based on ensemble frameworks were compared. The DP maps were evaluated using ROC, precision, accuracy, MAE, RMSE and two statistical tests, i.e. the Friedman and Wilcoxon signed-rank tests (Tables 5–7). The results showed that MLP-bagging considerably outperformed the other models. MLP-bagging (AUC = 0.943) had the strongest predictive capacity, followed by MLP-dagging (AUC = 0.928), MLP-RTF (AUC = 0.902) and MLPnn (AUC = 0.885). MLP-bagging is more efficient in reducing variance and bias than the other ensemble approaches (Pham et al. 2017; Sedano et al. 2013). Feature selection is widely used to test the predictive capacity of variables and to improve model performance by eliminating irrelevant or unimportant factors in advance (Pham, Pradhan et al. 2016). The Relief-F and IGR methods were used in this analysis to select the DDFs and judge their predictive potential for the DP models. According to these methods, distance to settlement, distance to road and population growth had the strongest influence on the DP models because most deforested locations were identified on or along roads and settlements.
The remaining factors, such as forest density, distance to forest edge, proximity to river, population density, agricultural land density, distance to agricultural land, settlement density, altitude, slope and aspect, also contributed well to the DP models, as confirmed in other similar studies (Sahana et al. 2018). The comparison of the DP models on the basis of the ROC curve showed a relative difference of nearly 3%, which was nevertheless substantial for the DP maps (Table 5). Therefore, even minor changes in the performance of DP models can lead to larger changes in the reliability of DP maps. Furthermore, the performance of such deforestation probability models depends greatly on optimising the predictive parameters.

The output of this research might help researchers to analyse deforestation in other areas. Hybrid ensemble approaches could also be used to assess data and serve as reliable alternatives to conventional computational strategies for modelling DP. The use of soft computing approaches would inspire the scientific community to use sophisticated techniques for precisely modelling probable deforestation areas. In populous countries, such as India, this work would assist policymakers in making strategic plans for managing the existing forest cover.

6. Conclusions

In this research, the hybrid ensemble frameworks MLP-bagging, MLP-dagging and MLP-RTF were effectively implemented to analyse the DP of the Gumani River Basin. ROC, accuracy, precision, MAE, RMSE and the Friedman and Wilcoxon signed-rank tests were used to validate and compare the four DP models. The findings indicated that the DP models based on ML ensemble systems worked well in this study, and significant differences existed amongst the models. Among the MLPnn, MLP-bagging and MLP-dagging models, the MLP-bagging model performed best in terms of accuracy (precision, accuracy and AUC) and reliability (RMSE and MAE).
It may be concluded that the MLP-bagging model can be very effective for preparing an accurate deforestation probability map. After the meta-classifiers were combined with the base classifier, the accuracy of the MLPnn model increased significantly. Delineating deforestation probability areas by field-based methods is expensive and time-consuming, especially for large watersheds. Therefore, as an alternative, ensemble machine learning models applied together with RS-GIS-based data and interfaces could be very effective in creating deforestation probability maps. Finally, the deforestation probability maps produced for the Gumani River Basin display the areas with high and very high probability of deforestation, which could be an effective tool for policymakers and environmental planners.

This research indicated that ML models are powerful techniques for the DP evaluation of an area. The precision achieved by the ensemble models, as confirmed by the validation methods, was acceptable. The results would also provide spatial evidence for forest managers and environmental planners to implement appropriate policies and strategies. In fact, the deforestation process is closely correlated with certain natural and anthropogenic factors. The findings might be valuable for deforestation predictions in other regions with similar geo-environmental conditions. Furthermore, the findings would provide a foundation for future research. Existing DDFs might be combined with other DDFs, modified as per changes in the physical or socio-economic context of the Gumani River Basin, to enable an improved and realistic simulation of DP.

Disclosure statement

The authors declare that there is no conflict of interest.
Funding

This research was supported by the Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS), Faculty of Engineering and Information Technology, University of Technology Sydney. The APC was funded by Universiti Kebangsaan Malaysia, DANA IMPAK PERDANA, grant no. DIP-2018-030. It was also supported by the Researchers Supporting Project (number RSP-2020/14), King Saud University, Riyadh, Saudi Arabia.

Author contributions

S.S. contributed to the methodology development, formal analysis, investigation, original draft preparation and manuscript review and editing; S.S. and G.P. performed the experiments, wrote the manuscript and collected the field data; S.S. wrote the manuscript and analysed the data; B.P. edited, restructured and professionally optimised the manuscript; B.P. and A.A. arranged the funding acquisition. All authors have read and agreed to the published version of the manuscript.

ORCID

Biswajeet Pradhan http://orcid.org/0000-0001-9863-2054
"Geomatics, Natural Hazards and Risk" – Taylor & Francis
Published: Jan 1, 2021
Keywords: Deforestation probability; hybrid ensemble techniques; machine learning; GIS; remote sensing; India