Comparative analysis of artificial intelligence techniques for the prediction of infiltration process
Singh, Balraj; Sihag, Parveen; Parsaie, Abbas; Angelaki, Anastasia
2021-04-03 00:00:00
GEOLOGY, ECOLOGY, AND LANDSCAPES 2021, VOL. 5, NO. 2, 109–118
INWASCON
https://doi.org/10.1080/24749508.2020.1833641

RESEARCH ARTICLE

Balraj Singh (a), Parveen Sihag (b), Abbas Parsaie (c) and Anastasia Angelaki (d)

(a) Civil Engineering Department, Panipat Institute of Engineering and Technology, Panipat, India; (b) Civil Engineering Department, Shoolini University, Solan, India; (c) Hydro-Structure Engineering, Shahid Chamran University of Ahvaz, Ahvaz, Iran; (d) Department of Agriculture Crop Production and Rural Environment, University of Thessaly, Volos, Greece

ARTICLE HISTORY
Received 21 January 2020
Accepted 4 October 2020

KEYWORDS
Infiltration process; artificial intelligence techniques; Kostiakov model; Nash-Sutcliffe efficiency; multi-linear regression

ABSTRACT
Knowledge of the infiltration process is beneficial in designing and planning of irrigation networks, soil erosion, hydrologic design, and watershed management. In this study, the infiltration process was analyzed using predictive models based on artificial neural network (ANN), multi-linear regression (MLR), Random Forest regression (RF), and M5P tree, and their performances were compared with an empirical model, the Kostiakov model. Field experimental data were used for training and testing these models, and their outcomes were assessed with suitable performance assessment parameters. The models were assessed using a field dataset containing 340 observations, of which 70% were used for training and the remainder for testing. The RF-based models perform better than the other models, with Nash-Sutcliffe model efficiency (NSE) equal to 0.9963 and 0.9904 for the training and testing stages, respectively. The ANN, MLR, and M5P models also give good prediction performance, but the Kostiakov model's performance is inferior.
Sensitivity investigation suggests that cumulative time and moisture content in the soil are the most influential parameters for assessing the cumulative infiltration of soil.

CONTACT Balraj Singh balrajzinder@gmail.com Civil Engineering Department, Panipat Institute of Engineering and Technology, Panipat, Haryana, India
© 2020 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group on behalf of the International Water, Air & Soil Conservation Society (INWASCON). This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

Infiltration is the movement of water into the subsurface from surface sources such as snowfall, irrigation, and precipitation. The soil-water relationship plays a crucial role in modeling for water management, control of droughts and floods, rainfall-runoff, evaluation of risk, design and scheduling of irrigation systems, development of water resources, and drainage design. Various physical properties of the soil affect the infiltration characteristics. Soil texture, soil moisture, and density have considerable influence on the infiltration process (Angelaki et al., 2013). Soil texture in particular is one of the most crucial factors influencing the infiltration process. Water accessibility in the soil depends on the soil's water-holding ability, which is affected by the texture and structure of the soil (Al-Azawi, 1985). The infiltration rate is high in unsaturated soil; it reduces gradually and finally reaches a constant infiltration rate. Knowledge of infiltration is necessary for any valuable and durable water resources management project (Sihag et al., 2018a). The design and scheduling of an irrigation system rely on the soil's infiltration because it affects various design considerations of agriculture and canal systems. Infiltration characteristics vary with scale owing to variation in the texture and type of the soil and other soil conditions. The experimental measurement of the infiltration process is laborious, tedious, and time-consuming (Vand et al., 2018). Assessment of the infiltration process is very complex due to spatial and temporal variation (Pandey & Pandey, 2018).

Numerous studies (Mishra et al., 2003; Singh et al., 2018) proposed implementing conventional infiltration models as a substitute for experimental observation. The use of any specific model requires complete knowledge of the boundary conditions and assumptions of that model. Soil water scientists have introduced several infiltration models, such as Kostiakov, Horton, Philip, Holton, Green-Ampt, Novel, and Modified Kostiakov, for estimating infiltration (Richards, 1931; Philips, 1957; Singh & Yu, 1990; Mishra et al., 2003; Sihag et al., 2017a). Mishra et al. (2003) divided these models into three groups: physical, semi-empirical, and empirical models. Most of these models are based on the basic assumptions of homogeneous water absorption, ponding head, and constant infiltration rate. These assumptions are hardly ever met under real field conditions, which may lead to inaccurate prediction of the infiltration process.

Some researchers have used an alternative approach to estimating the infiltration process: soft computing-based infiltration models built on soil properties. Several successful applications of soft computing-based infiltration prediction reported in the literature (Tiwari et al., 2017; Singh et al., 2017; Sihag et al., 2017b) concluded that soil physical properties and elapsed time can be used effectively to estimate the infiltration process with higher precision.

In recent years, soft computing approaches such as Random Forest, M5P, SVM, GEP, Gaussian process, ANFIS, and many others have been successfully implemented in water resources problems (Azamathulla et al., 2016; Parsaie & Haghiabi, 2015, 2014; Parsaie et al., 2017; Sihag, 2018; Sihag et al., 2017c; Tiwari et al., 2018). This paper uses a model based on RF, as proposed by Breiman (2001). It is a powerful tool for nonlinear regression and classification. Examples of its use include infiltration process modeling (Singh et al., 2017, 2018). ANN works on the principle of the nerve cells of the brain. ANN has been widely applied in engineering and has been observed to perform better than conventional models, e.g., Sihag (2018), Tiwari and Sihag (2018), Haghiabi et al. (2017), and Ghorbani et al. (2016). The M5P tree, initially proposed by Quinlan (1992), is a decision tree learner for regression problems. An M5P tree-based model places linear regression functions at the terminal nodes and fits a multivariate linear regression model to each subspace by classifying or separating the entire data space into multiple subspaces. M5P has been successfully used in water resources-related studies, e.g., Sattari et al. (2018), Sattari et al. (2013), Pal et al. (2012), and Pal and Deswal (2009).

RF, M5P, and ANN-based models extract knowledge from the data itself. The best-performing model is identified using appropriate performance assessment parameters such as Nash-Sutcliffe model efficiency, root mean square error, mean square error, and correlation coefficient. In this study, models are developed to predict the cumulative infiltration of soil, and the performances of the soft computing-based models are compared with those of empirical models (Kostiakov model and multi-linear regression (MLR)).

2. Materials and methodology

2.1. Study area

Kurukshetra district lies in the north-east part of the state of Haryana, India, and is bounded by North latitudes 29°53′00″ and 30°15′02″ and East longitudes 76°26′27″ and 77°07′57″. Thanesar Tehsil of Kurukshetra district was selected as the study area. The total area of the Kurukshetra district is 1530 km². The site map of the study area is given in Figure 1. The study area (Thanesar) is a division of the Ghaggar basin. A total of 20 different sites were selected for experimentation in the study area. The coordinates of all sampling sites are listed in Table 1. The texture of the soil is listed in Table 2.

Figure 1. The map of the selected study area.

Table 1. The details of the coordinates of the sampling sites.
Site No.  Sites                    Latitude       Longitude
1         Dayalpur                 29.939648 N    76.814545 E
2         Samshipur                29.925980 N    76.803795 E
3         Kirmach (SKS)            29.911368 N    76.794275 E
4         Alampur                  29.938222 N    76.824080 E
5         Sanheri Khalsa           29.918557 N    76.826591 E
6         Mirzapur                 29.950163 N    76.781358 E
7         Khanpur Roran            29.939504 N    76.757209 E
8         Barna                    29.924569 N    76.733358 E
9         Pindarsi                 29.919078 N    76.702227 E
10        Kamoda                   29.936836 N    76.736818 E
11        Lohar Majra              29.958742 N    76.727137 E
12        Jyotisar                 29.960166 N    76.760195 E
13        Narkatari                29.962200 N    76.797872 E
14        Kurukshetra University   29.95.5052 N   76.815767 E
15        Thim Park                29.967055 N    76.832005 E
16        Darra Khera              29.981300 N    76.822550 E
17        Bhiwani Khera            29.994305 N    76.826474 E
18        Bahadur Pura             30.008150 N    76.834262 E
19        Hansala                  30.011900 N    76.811639 E
20        Durala                   30.025939 N    76.809048 E

Table 2. The texture of the soil.
Site No.  Location         Texture           Sand (%)  Clay (%)  Silt (%)
1         Dayalpur         Loamy sand        78.73     7.4445    13.8255
2         Samshipur        Clay              39.84     55.3472   4.8128
3         Kirmach (SKS)    Clay              37.14     43.3734   19.4866
4         Alampur          Sandy clay loam   47.5      25.2      27.3
5         Sanheri Khalsa   Sandy clay loam   52.11     24.9028   22.9872
6         Mirzapur         Clay              26.63     41.8209   31.5491
7         Khanpur Roran    Clay loam         32.94     29.5064   37.5536
8         Barna            Clay loam         31.52     35.7133   32.7667
9         Pindarsi         Sandy clay loam   47.6      27.248    25.152
10        Kamoda           Loam              42.85     24.003    33.147
11        Lohar Majra      Clay loam         24.6      39.962    35.438
12        Jyotisar         Sandy clay loam   52.71     34.5217   12.7683
13        Narkatari        Clay loam         22.93     32.3694   44.7006
14        KUK              Clay              52.74     19.85     27.41
15        Thim Park        Clay              36.7      26.586    36.714
16        Dara kheda       Sandy clay loam   35.31     59.5148   5.1752
17        bhiwani kheds    Sandy clay loam   59.58     30.7192   9.7008
18        bhaderpura       Clay              50.78     23.6256   25.5994
19        Singhpura        Loam              19.74     62.6028   17.6572
20        Durala           Sandy loam        39.13     46.2612   14.6088

2.2. Dataset

The whole dataset, containing 340 observations from field infiltration experiments, was separated into two groups: training and testing. The training data comprise 70% of the total data, chosen randomly from the whole dataset, while the testing data consist of the remaining 30%. The features of the training and testing datasets are given in Table 3. Time, sand, clay, silt, bulk density, and moisture content are the input parameters, and the soil's cumulative infiltration is the target.

Table 3. Features of the data set.
                                         Training data                        Testing data
Parameter                       Unit     Lower   Higher  Mean    Std. dev.    Lower   Higher  Mean    Std. dev.
Time (t)                        min      1.00    17.00   9.08    4.98         1.00    17.00   8.80    4.75
Sand (S)                        %        19.74   78.73   41.32   13.81        19.74   78.73   40.88   13.84
Clay (C)                        %        7.44    62.60   33.58   13.35        7.44    62.60   34.87   14.38
Silt (Si)                       %        4.81    44.70   25.09   10.99        4.81    44.70   24.25   10.96
Bulk density (ρ)                gm/cc    1.39    1.90    1.67    0.13         1.39    1.90    1.66    0.13
Moisture content (MC)           %        1.49    14.19   7.72    3.14         1.49    14.19   7.72    3.07
Cumulative infiltration (F(t))  mm       0.63    25.90   6.95    4.86         0.94    23.89   6.82    4.55

2.3. Observation of cumulative infiltration

Experiments were performed to measure the cumulative infiltration of soil at the study area's locations using a mini-disk infiltrometer (Decagon Devices Inc., Devices, 2014). A mini-disk infiltrometer has two chambers: a water reservoir and a bubble chamber, connected via a Mariotte tube. This tube is used to provide a steady water pressure head of 0.05 to 0.7 kPa. The bottom part of the instrument contains a porous sintered steel disk with a diameter of 4.5 cm and a thickness of 3 mm. Both chambers are filled with water, and the instrument is placed on the flat surface of the soil (Figure 2), resulting in water moving into the soil. During the measurement, the quantity of water in the reservoir chamber was recorded at specific intervals. The flowchart for the current investigation is shown in Figure 3. The first step was designing the experiments, followed by the collection and analysis of data, comparison of the artificial intelligence techniques and empirical models, selection of the best-fitted model for prediction of the infiltration process, and the conclusion.

2.4. Modeling approaches

2.4.1. Artificial neural networks (ANN)

ANN is a data mining approach widely implemented in several engineering fields. The idea of the ANN model is inspired by the nerve cells of the human brain. ANN is a parallel knowledge-processing system containing a set of neurons arranged in layers. In this study, the ANN model includes three layers: input, hidden, and output. The input layer receives the data, the hidden layer processes them, and the output layer gives the model's target result. Each input into a neuron in a hidden or output layer is multiplied by a corresponding interconnection weight (X_ij) and summed with a constant threshold value called bias (y). The addition and multiplication functions in

2.4.2. M5P model (M5P)

The M5P tree, proposed by Quinlan (1992), is a decision tree learner for regression problems. This tree algorithm assigns linear regression functions at the terminal nodes. It fits a multivariate linear regression model to each subspace by classifying or dividing the whole data space into several subspaces.
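The idea of fitting a separate linear regression model to each subspace can be illustrated with a minimal Python sketch. It assumes a single predictor, one fixed split threshold, and synthetic values that are not taken from the study; the function names `fit_line`, `fit_two_leaf_tree`, and `predict` are hypothetical. It is not Quinlan's M5 algorithm, which also selects the splits itself and prunes and smooths the tree.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x for one predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:  # all x equal: fall back to a constant model
        return y_bar, 0.0
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b * x_bar, b


def fit_two_leaf_tree(xs, ys, threshold):
    """Split the data at a fixed threshold (assumption) and fit one line per side."""
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    if not left or not right:
        raise ValueError("threshold must leave data on both sides")
    return {
        "threshold": threshold,
        "left": fit_line(*zip(*left)),    # (intercept, slope) of the left leaf
        "right": fit_line(*zip(*right)),  # (intercept, slope) of the right leaf
    }


def predict(tree, x):
    """Route x to its leaf and evaluate that leaf's linear model."""
    a, b = tree["left"] if x <= tree["threshold"] else tree["right"]
    return a + b * x


# Synthetic illustration only: infiltration rises steeply early, then flattens.
times = [1, 2, 3, 4, 5, 6, 7, 8]                      # elapsed time, min
cum_inf = [2.0, 4.0, 6.0, 8.0, 8.5, 9.0, 9.5, 10.0]   # cumulative infiltration, mm
tree = fit_two_leaf_tree(times, cum_inf, threshold=4)
```

With these synthetic values the left leaf fits F = 2t and the right leaf fits F = 6 + 0.5t, so a prediction follows a different line on each side of the threshold, which is how a tree of linear models can represent nonlinear behavior.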
The M5 tree model develops conditional linear models for the nonlinear behavior of the data set. The splitting criterion for the M5 tree model is based on an assessment of the error at every node. The error is calculated as the standard deviation of the class values that reach a node. The standard deviation reduction (SDR) is defined as follows: jZj SDR ¼ sdðZÞ