Comparative analysis of artificial intelligence techniques for the prediction of infiltration process

Geology, Ecology, and Landscapes, 2021, Vol. 5, No. 2, 109–118 (INWASCON)
https://doi.org/10.1080/24749508.2020.1833641

RESEARCH ARTICLE

Balraj Singh (a), Parveen Sihag (b), Abbas Parsaie (c) and Anastasia Angelaki (d)

(a) Civil Engineering Department, Panipat Institute of Engineering and Technology, Panipat, India
(b) Civil Engineering Department, Shoolini University, Solan, India
(c) Hydro-Structure Engineering, Shahid Chamran University of Ahvaz, Ahvaz, Iran
(d) Department of Agriculture Crop Production and Rural Environment, University of Thessaly, Volos, Greece

CONTACT: Balraj Singh, balrajzinder@gmail.com, Civil Engineering Department, Panipat Institute of Engineering and Technology, Panipat, Haryana, India

© 2020 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group on behalf of the International Water, Air & Soil Conservation Society (INWASCON). This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

ARTICLE HISTORY
Received 21 January 2020; Accepted 4 October 2020

KEYWORDS
Infiltration process; artificial intelligence techniques; Kostiakov model; Nash-Sutcliffe efficiency; multi-linear regression

ABSTRACT

Knowledge of the infiltration process is beneficial for designing and planning irrigation networks and for soil erosion, hydrologic design, and watershed management. In this study, the infiltration process was analyzed using predictive models based on an artificial neural network (ANN), multi-linear regression (MLR), Random Forest regression (RF), and the M5P tree, and their performances were compared with an empirical model, the Kostiakov model. Field experimental data were used for training and testing these models, and their outcomes were assessed with suitable performance assessment parameters. The models were assessed using a field dataset containing 340 observations, of which 70% were used for training and the remainder for testing. The RF-based model performs better than the other models, with a Nash-Sutcliffe model efficiency (NSE) of 0.9963 and 0.9904 for the training and testing stages, respectively. The ANN, MLR, and M5P models also give good prediction performance, but the performance of the Kostiakov model is inferior. The sensitivity investigation suggests that cumulative time and soil moisture content are the most influential parameters for assessing the cumulative infiltration of soil.

1. Introduction

Infiltration is the movement of water into the subsurface from surface sources such as snowfall, irrigation, and precipitation. The soil-water relationship plays a crucial role in modeling for water management, control of droughts and floods, rainfall-runoff, risk evaluation, design and scheduling of irrigation systems, development of water resources, and drainage design. Various physical properties of the soil affect its infiltration characteristics. Soil texture, soil moisture, and density have considerable influence on the infiltration process (Angelaki et al., 2013), and soil texture is among the most crucial of these factors. Water accessibility in the soil depends on the soil's water-holding ability, which is affected by the texture and structure of the soil (Al-Azawi, 1985). The infiltration rate is high in unsaturated soil; it reduces gradually and finally reaches a constant infiltration rate. Knowledge of infiltration is necessary for any valuable and durable water resources management project (Sihag et al., 2018a). The design and scheduling of an irrigation system rely on the soil's infiltration because it affects various design considerations of agriculture and canal systems. Infiltration characteristics vary with scale owing to variation in the texture and type of the soil and in other soil conditions. The experimental measurement of the infiltration process is laborious, tedious, and time-consuming (Vand et al., 2018). Assessment of the infiltration process is very complex because of spatial and temporal variation (Pandey & Pandey, 2019).

Numerous studies (Mishra et al., 2003; Singh et al., 2018) proposed implementing conventional infiltration models as a substitute for experimental observation. The use of any specific model requires complete knowledge of the boundary conditions and assumptions of that model. Soil water scientists have introduced several infiltration models, such as Kostiakov, Horton, Philip, Holtan, Green-Ampt, Novel, and Modified Kostiakov, for estimating infiltration (Richards, 1931; Philip, 1957; Singh & Yu, 1990; Mishra et al., 2003; Sihag et al., 2017a). Mishra et al. (2003) divided these models into three groups: physical, semi-empirical, and empirical models. Most of these models are based on the basic assumptions of homogeneous water absorption, ponding head, and constant infiltration rate. These assumptions are hardly ever met under real field conditions, which may lead to inaccurate prediction of the infiltration process.

Some researchers have used an alternative approach to estimating the infiltration process: soft computing based infiltration models built on soil properties. Several successful applications of soft computing based infiltration prediction reported in the literature (Tiwari et al., 2017; Singh et al., 2017; Sihag et al., 2017b) concluded that soil physical properties and elapsed time are effective inputs for estimating the infiltration process with higher precision.

In recent years, soft computing approaches such as Random Forest, M5P, SVM, GEP, Gaussian process, ANFIS, and many others have been successfully implemented in water resources problems (Azamathulla et al., 2016; Parsaie & Haghiabi, 2014, 2015; Parsaie et al., 2017; Sihag, 2018; Sihag et al., 2017c; Tiwari et al., 2018). This paper uses a model based on RF, as proposed by Breiman (2001). It is a powerful tool for nonlinear regression and classification. Examples of its use include infiltration process modeling (Singh et al., 2017, 2018). ANN works on the principle of the nerve cells of the brain. It has been widely applied in engineering and has been observed to perform better than conventional models, e.g., Sihag (2018), Tiwari and Sihag (2018), Haghiabi et al. (2017), and Ghorbani et al. (2016). The M5P tree, initially proposed by Quinlan (1992), is a decision tree learner for regression problems. An M5P tree-based model places linear regression functions at the terminal nodes and fits a multivariate linear regression model to each subspace by dividing the entire data space into multiple subspaces. M5P has been used successfully in water resources related studies, e.g., Sattari et al. (2018), Sattari et al. (2013), Pal et al. (2012), and Pal and Deswal (2009).

RF, M5P, and ANN-based models extract knowledge from the data itself. The best performing model is identified using appropriate performance assessment parameters such as the Nash-Sutcliffe model efficiency, root mean square error, mean square error, and correlation coefficient. In this study, models are developed to predict the cumulative infiltration of soil, and the performances of the soft computing-based models are compared with those of empirical models (the Kostiakov model and multi-linear regression (MLR)).

2. Materials and methodology

2.1. Study area

Kurukshetra district lies in the north-east part of Haryana state, India, and is bounded by North latitudes 29°53′00″ and 30°15′02″ and East longitudes 76°26′27″ and 77°07′57″. Thanesar Tehsil of Kurukshetra district was selected as the study area. The total area of the Kurukshetra district is 1530 km². The site map of the study area is given in Figure 1. The study area (Thanesar) is a division of the Ghaggar basin. A total of 20 different sites were selected for experimentation in the study area. The coordinates of all sampling sites are listed in Table 1. The texture of the soil is listed in Table 2.

Figure 1. The map of the selected study area.

Table 1. The details of the coordinates of the sampling sites.

| Site No. | Site | Latitude (N) | Longitude (E) |
|---|---|---|---|
| 1 | Dayalpur | 29.939648 | 76.814545 |
| 2 | Samshipur | 29.925980 | 76.803795 |
| 3 | Kirmach (SKS) | 29.911368 | 76.794275 |
| 4 | Alampur | 29.938222 | 76.824080 |
| 5 | Sanheri Khalsa | 29.918557 | 76.826591 |
| 6 | Mirzapur | 29.950163 | 76.781358 |
| 7 | Khanpur Roran | 29.939504 | 76.757209 |
| 8 | Barna | 29.924569 | 76.733358 |
| 9 | Pindarsi | 29.919078 | 76.702227 |
| 10 | Kamoda | 29.936836 | 76.736818 |
| 11 | Lohar Majra | 29.958742 | 76.727137 |
| 12 | Jyotisar | 29.960166 | 76.760195 |
| 13 | Narkatari | 29.962200 | 76.797872 |
| 14 | Kurukshetra University | 29.955052 | 76.815767 |
| 15 | Thim Park | 29.967055 | 76.832005 |
| 16 | Darra Khera | 29.981300 | 76.822550 |
| 17 | Bhiwani Khera | 29.994305 | 76.826474 |
| 18 | Bahadur Pura | 30.008150 | 76.834262 |
| 19 | Hansala | 30.011900 | 76.811639 |
| 20 | Durala | 30.025939 | 76.809048 |

Table 2. The texture of the soil.

| Site No. | Location | Texture | Sand (%) | Clay (%) | Silt (%) |
|---|---|---|---|---|---|
| 1 | Dayalpur | Loamy sand | 78.73 | 7.4445 | 13.8255 |
| 2 | Samshipur | Clay | 39.84 | 55.3472 | 4.8128 |
| 3 | Kirmach (SKS) | Clay | 37.14 | 43.3734 | 19.4866 |
| 4 | Alampur | Sandy clay loam | 47.5 | 25.2 | 27.3 |
| 5 | Sanheri Khalsa | Sandy clay loam | 52.11 | 24.9028 | 22.9872 |
| 6 | Mirzapur | Clay | 26.63 | 41.8209 | 31.5491 |
| 7 | Khanpur Roran | Clay loam | 32.94 | 29.5064 | 37.5536 |
| 8 | Barna | Clay loam | 31.52 | 35.7133 | 32.7667 |
| 9 | Pindarsi | Sandy clay loam | 47.6 | 27.248 | 25.152 |
| 10 | Kamoda | Loam | 42.85 | 24.003 | 33.147 |
| 11 | Lohar Majra | Clay loam | 24.6 | 39.962 | 35.438 |
| 12 | Jyotisar | Sandy clay loam | 52.71 | 34.5217 | 12.7683 |
| 13 | Narkatari | Clay loam | 22.93 | 32.3694 | 44.7006 |
| 14 | KUK | Clay | 52.74 | 19.85 | 27.41 |
| 15 | Thim Park | Clay | 36.7 | 26.586 | 36.714 |
| 16 | Darra Khera | Sandy clay loam | 35.31 | 59.5148 | 5.1752 |
| 17 | Bhiwani Khera | Sandy clay loam | 59.58 | 30.7192 | 9.7008 |
| 18 | Bahadur Pura | Clay | 50.78 | 23.6256 | 25.5994 |
| 19 | Singhpura | Loam | 19.74 | 62.6028 | 17.6572 |
| 20 | Durala | Sandy loam | 39.13 | 46.2612 | 14.6088 |

2.2. Dataset

The whole dataset, containing 340 observations from field infiltration experiments, was separated into two groups: training and testing. The training data comprise 70% of the total data, chosen randomly from the whole data set, while the testing data consist of the remaining 30%. The features of the training and testing data sets are given in Table 3. Time, sand, clay, silt, bulk density, and moisture content are the input parameters, and the cumulative infiltration of the soil is the target.

Table 3. Features of the data set.

| Parameter | Unit | Training: lower | Training: higher | Training: mean | Training: std. deviation | Testing: lower | Testing: higher | Testing: mean | Testing: std. deviation |
|---|---|---|---|---|---|---|---|---|---|
| Time (t) | min | 1.00 | 17.00 | 9.08 | 4.98 | 1.00 | 17.00 | 8.80 | 4.75 |
| Sand (S) | % | 19.74 | 78.73 | 41.32 | 13.81 | 19.74 | 78.73 | 40.88 | 13.84 |
| Clay (C) | % | 7.44 | 62.60 | 33.58 | 13.35 | 7.44 | 62.60 | 34.87 | 14.38 |
| Silt (Si) | % | 4.81 | 44.70 | 25.09 | 10.99 | 4.81 | 44.70 | 24.25 | 10.96 |
| Bulk density (ρ) | gm/cc | 1.39 | 1.90 | 1.67 | 0.13 | 1.39 | 1.90 | 1.66 | 0.13 |
| Moisture content (MC) | % | 1.49 | 14.19 | 7.72 | 3.14 | 1.49 | 14.19 | 7.72 | 3.07 |
| Cumulative infiltration (F(t)) | mm | 0.63 | 25.90 | 6.95 | 4.86 | 0.94 | 23.89 | 6.82 | 4.55 |

2.3. Observation of cumulative infiltration

Experiments were performed to measure the cumulative infiltration of soil at the study area's locations using a mini-disk infiltrometer (Decagon Devices Inc.; Devices, 2014). A mini-disk infiltrometer has two chambers, a water reservoir and a bubble chamber, connected via a Mariotte tube. This tube provides a steady water pressure head of 0.05 to 0.7 kPa. The bottom part of the instrument contains a porous sintered steel disk with a diameter of 4.5 cm and a thickness of 3 mm. Both chambers are filled with water and the instrument is placed on a flat soil surface (Figure 2), so that water moves into the soil. During the measurement, the quantity of water in the reservoir chamber was recorded at specific intervals. Figure 3 presents the flowchart of the investigation. The first step was designing the experiments, followed by the collection and analysis of data, comparison of the artificial intelligence techniques and empirical models, selection of the best-fitted model for prediction of the infiltration process, and conclusion.

Figure 2. Mini disk infiltrometer.

Figure 3. Overview of the investigation.

2.4. Modeling approaches

2.4.1. Artificial neural networks (ANN)

ANN is a data mining approach that is widely implemented in several engineering fields. The idea of the ANN model is inspired by the nerve cells of the human brain. An ANN is a parallel knowledge processing system containing sets of neurons arranged in layers. In this study, the ANN model includes three layers: input, hidden, and output. The input layer receives the data, the hidden layer processes them, and the output layer gives the model's target result. Each input into a neuron in the hidden and output layers is multiplied by a corresponding interconnection weight ($X_{ij}$), and the weighted inputs are summed together with a threshold value called the bias ($y_j$). The addition and multiplication functions in every neuron are shown in Equation 1. $P_j$ is the value to which the activation function is applied to generate the output of unit $j$. Complete information about ANN is provided by Haykin (1994).

$$P_j = \sum_i X_{ij} + y_j \qquad (1)$$

2.4.2. M5P model (M5P)

The M5P tree, proposed by Quinlan (1992), is a decision tree learner for regression problems. This tree algorithm assigns linear regression functions at the terminal nodes. It fits a multivariate linear regression model to each subspace by dividing the whole data space into several subspaces. The M5 tree model thus develops conditional linear models for the nonlinear behavior of the data set. The splitting criterion of the M5 tree model is based on an assessment of the error at every node. The error is calculated as the standard deviation of the class values that arrive at a node. The standard deviation reduction (SDR) is defined as follows:

$$\mathrm{SDR} = \mathrm{sd}(Z) - \sum_i \frac{|Z_i|}{|Z|}\,\mathrm{sd}(Z_i) \qquad (2)$$

where $Z$ is the set of instances that arrive at the node, $Z_i$ is the subset of instances with the $i$th outcome of the possible set, and $\mathrm{sd}$ is the standard deviation. The splitting process ends if the target values of all instances that arrive at the node differ only very slightly.

2.4.3. Random forest regression (RF)

Random forest, introduced by Breiman (2001), is a classification and regression method that comprises a collection of regression trees trained on different bootstrap samples (bagging) of the training data. Each tree acts as a regression function on its own, and the final target is taken as the average of the individual tree outputs (Adusumilli et al., 2013). In bagging, the training set of each tree contains about 67% of the data from the actual training set; thus, about one-third of the data are left out of each tree developed (Singh et al., 2017). These left-out training data, termed out-of-bag (out of the bootstrap sampling), were used to estimate the prediction error and variable importance. The number of trees to be grown (k) in the forest and the number of features or variables selected (m) at each node to generate a tree are the two standard primary parameters of random forest regression (Breiman, 2001).

2.5. Empirical models

The Kostiakov and multi-linear regression models are empirical models. Least-squares techniques were used to derive the regression coefficients of the Kostiakov and MLR models from the training data set.

2.5.1. Multi-linear regression (MLR)

The following general form of the multi-linear regression model is considered to develop a nonlinear regression model:

$$F(t) = a\, t^{b_1} S^{b_2} C^{b_3} Si^{b_4} \rho^{b_5} MC^{b_6} \qquad (3)$$

where $F(t)$ is the dependent variable representing the cumulative infiltration of soil; $t$, $S$, $C$, $Si$, $\rho$, and $MC$ are the explanatory variables; $a$ is a constant; and the regression coefficients $b_1$, $b_2$, $b_3$, $b_4$, $b_5$, and $b_6$ are estimated by minimizing the sum of squared prediction errors (least squares). Based on Equation 3, the following relationship was developed from the training data set:

$$F(t) = 1.5646\, t^{0.6867} S^{0.4358} C^{0.3779} Si^{0.0865} \rho^{1.458} MC^{0.4447} \qquad (4)$$

2.5.2. Kostiakov model

$$F(t) = 1.57\, t^{0.6909} \qquad (5)$$

2.6. Comparing parameters

The correlation coefficient (R), Nash-Sutcliffe model efficiency (NSE), mean square error (MSE), and root mean square error (RMSE) were used to assess the precision of the RF, M5P, ANN, MLR, and Kostiakov models. They are computed as:

$$R = \frac{n\sum HF - \left(\sum H\right)\left(\sum F\right)}{\sqrt{n\sum H^2 - \left(\sum H\right)^2}\,\sqrt{n\sum F^2 - \left(\sum F\right)^2}} \qquad (6)$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n} (H - F)^2}{\sum_{i=1}^{n} (H - \bar{H})^2} \qquad (7)$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (H - F)^2 \qquad (8)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (H - F)^2} \qquad (9)$$

where $H$ denotes the actual values, $F$ the estimated (forecasted) values, $\bar{H}$ the mean of the actual values, and $n$ the number of observations.

2.7. Implementation of machine learning methods

Four standard statistical measures, R, MSE, RMSE, and NSE, were selected as performance evaluation parameters to judge the accuracy of the machine learning models and the Kostiakov model. Higher values of R and NSE and lower values of MSE and RMSE indicate better prediction accuracy. Several manual trials were carried out to find the optimum values of the primary parameters. For random forest regression, the two standard primary parameters are the number of trees to be grown (k) in the forest and the number of features or variables selected (m) at every node to generate a tree. In M5P, the models were calibrated by changing the number of instances allowed at each node (m). In ANN, the number of hidden layers, the number of neurons, the number of iterations, the learning rate, and the momentum are the primary parameters. The selected primary parameters of the modeling methods are presented in Table 4.

Table 4. The details of the primary parameters.

| Machine learning approach | Primary parameters |
|---|---|
| RF | k = 10, m = 1, Iterations = 100 |
| ANN (6-5-1) | Neurons = 5, learning rate = 0.2, momentum = 0.1, I = 1500 |
| M5P | m = 4 |

3. Results and discussion

Performance of empirical models: Table 5 presents the accuracy of the estimated F(t) for the Kostiakov and MLR models in terms of R, MSE, RMSE, and NSE, which are the most common assessment criteria for prediction or forecast models. On MSE and RMSE, the MLR model shows lower error values than the Kostiakov model. MLR also has higher NSE and R than the Kostiakov model: with the testing data set, NSE = 0.7500 and R = 0.8698 were found for MLR, against NSE = 0.3301 and R = 0.5757 for the Kostiakov model. All four criteria therefore show that the MLR model is more accurate than the Kostiakov model. The scatter plot of the actual and predicted F(t) values is given in Figure 4 for the MLR and Kostiakov models for the training and testing stages. Figure 4 allows a closer comparison of F(t) between the two models and indicates that the MLR model performs better than the Kostiakov model over the whole data set.

Table 5. The performance of ANN, M5P, RF, MLR, and Kostiakov models.

| Approach | Training: R | Training: MSE | Training: RMSE | Training: NSE | Testing: R | Testing: MSE | Testing: RMSE | Testing: NSE |
|---|---|---|---|---|---|---|---|---|
| ANN | 0.9926 | 0.5221 | 0.7226 | 0.9778 | 0.9911 | 0.5483 | 0.7405 | 0.9732 |
| M5P | 0.9445 | 2.7125 | 1.6470 | 0.8848 | 0.9282 | 2.9190 | 1.7085 | 0.8573 |
| RF | 0.9985 | 0.0874 | 0.2957 | 0.9963 | 0.9953 | 0.1969 | 0.4437 | 0.9904 |
| MLR | 0.9003 | 4.4840 | 2.1176 | 0.8095 | 0.8698 | 5.1131 | 2.2612 | 0.7500 |
| Kostiakov model | 0.5844 | 15.5020 | 3.9373 | 0.3415 | 0.5757 | 13.7021 | 3.7016 | 0.3301 |

Figure 4. Scatter plot of empirical models for both training and testing stages.

3.1. Performance of soft computing based models

The preparation of M5P, ANN, and RF-based models is a trial and error process. A number of manual trials were carried out to find the optimum values of the primary parameters. The optimum values of the user-defined parameters of M5P, ANN, and RF are listed in Table 4. The accuracy of the predicted F(t) for M5P, ANN, and RF in terms of R, MSE, RMSE, and NSE is reported in Table 5. The best model is the one with higher R and NSE and lower MSE and RMSE. Taken together, R, MSE, RMSE, and NSE indicate that RF is more accurate than the ANN and M5P models in predicting the cumulative infiltration of soil.

The scatter plots of the actual and predicted F(t) values are given in Figures 5, 6, and 7 for M5P, ANN, and RF, respectively, for both the training and testing stages. The distribution of the data around the agreement line shows that the actual values are well correlated with the predicted values of cumulative infiltration F(t). RF is more strongly correlated with the actual data than ANN and M5P: R = 0.9953, 0.9282, and 0.9911 were obtained for the RF, M5P, and ANN models, respectively, with the testing data set. Overall, Figures 5, 6, and 7 allow a closer comparison of F(t) among the RF, M5P, and ANN-based models and show good performance of all three across the data points. The figures indicate that the RF, M5P, and ANN models are suitable for predicting the values of F(t).

Figure 5. The performance of the M5P model (training and testing stages).

Figure 6. The performance of the ANN model (training and testing stages).

Figure 7. The performance of the RF model (training and testing stages).

3.2. Comparison of empirical and soft computing based models

The F(t) predicted using RF, M5P, and ANN was compared with the F(t) predicted using the empirical models. The same data set was used to assess the empirical models as was used for the RF, M5P, and ANN-based models. Figure 8 shows the scatter plot and the performance plot for the soft computing based models and the empirical models, for both the training and testing stages. A performance plot is also shown in Figure 8 using the testing data set for the Kostiakov, MLR, M5P, ANN, and RF models. The statistical parameters obtained from the soft computing based models and the empirical models are listed in Table 5. Overall, RF is the best prediction method, with the highest testing NSE of 0.9904. The values of F(t) predicted using RF (Figure 8) lie closer to the perfect fit line and follow the path of the actual values more closely than those estimated using the empirical models.

Figure 8. The performance of soft computing models and empirical models.

3.3. Sensitivity investigation

A sensitivity investigation was carried out to find the input factor that most affects the prediction of F(t). Table 5 suggests that RF is more suitable than the other soft computing and empirical models, so the RF method was used. Seven sets of training data were developed: one including all input parameters and six developed by eliminating one input factor at a time. The results are listed in terms of R and RMSE with the testing data set. The outcomes in Table 6 show that elapsed time and moisture content were the most significant factors for predicting the cumulative infiltration of soil.

Table 6. Sensitivity investigation using RF.

| Input arrangement | Input factor eliminated | R | RMSE (mm) |
|---|---|---|---|
| t, S, C, Si, ρ, Mc | none | 0.9953 | 0.4437 |
| S, C, Si, ρ, Mc | t | 0.7619 | 2.9312 |
| t, C, Si, ρ, Mc | S | 0.9956 | 0.4323 |
| t, S, Si, ρ, Mc | C | 0.9952 | 0.4506 |
| t, S, C, ρ, Mc | Si | 0.9957 | 0.4259 |
| t, S, C, Si, Mc | ρ | 0.995 | 0.4582 |
| t, S, C, Si, ρ | Mc | 0.8063 | 2.758 |

4. Conclusion

Infiltration plays a crucial role in rainfall-runoff modeling and in the design and scheduling of irrigation systems. The performance of soft computing-based models in predicting the cumulative infiltration of soil over varying sand, silt, clay, density, and moisture content was investigated. Three popular techniques, RF, M5P, and ANN, were used as soft computing models. The cumulative infiltration values predicted by RF were superior to those from the ANN, M5P, MLR, and Kostiakov models. The ANN model also showed an improvement over the M5P and empirical models. The sensitivity results show that elapsed time and the moisture content of the soil samples are the most significant factors when the RF-based modeling method is used to predict the cumulative infiltration of soil for the given dataset. This investigation extends the use and capabilities of artificial intelligence techniques and trains a general model for predicting the infiltration process that may also be implemented in other study areas.

Disclosure statement

No potential conflict of interest was reported by the authors.

ORCID

Balraj Singh http://orcid.org/0000-0002-0381-4363
Parveen Sihag http://orcid.org/0000-0002-7761-0603

References

Adusumilli, S., Bhatt, D., Wang, H., Bhattacharya, P., & Devabhaktuni, V. (2013). A low-cost INS/GPS integration methodology based on random forest regression. Expert Systems with Applications, 40(11), 4653–4659. https://doi.org/10.1016/j.eswa.2013.02.002

Al-Azawi, S. A. (1985). Experimental evaluation of infiltration models. Journal of Hydrology (New Zealand), 24(2), 77–88. https://www.jstor.org/stable/43944562?seq=1

Angelaki, A., Sakellariou-Makrantonaki, M., & Tzimopoulos, C. (2013). Theoretical and experimental research of cumulative infiltration. Transport in Porous Media, 100(2), 247–257. https://doi.org/10.1007/s11242-013-0214-2

Azamathulla, H. M., Haghiabi, A. H., & Parsaie, A. (2016). Prediction of side weir discharge coefficient by support vector machine technique. Water Science and Technology: Water Supply, 16(4), 1002–1016. https://doi.org/10.2166/ws.2016.014

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324

Devices, D. (2014). Mini disk infiltrometer user's manual. Decagon Devices. http://www.decagon.com/products/hydrology/hydraulic-conductivity/mini-disk-portable-tension-infiltrometer

Ghorbani, M. A., Zadeh, H. A., Isazadeh, M., & Terzi, O. (2016). A comparative study of artificial neural network (MLP, RBF) and support vector machine models for river flow prediction. Environmental Earth Sciences, 75(6), 476. https://doi.org/10.1007/s12665-015-5096-x

Haghiabi, A. H., Azamathulla, H. M., & Parsaie, A. (2017). Prediction of head loss on cascade weir using ANN and SVM. ISH Journal of Hydraulic Engineering, 23(1), 102–110. https://doi.org/10.1080/09715010.2016.1241724

Haykin, S. (1994). Neural networks, a comprehensive foundation. Macmillan.

Mishra, S. K., Tyagi, J. V., & Singh, V. P. (2003). Comparison of infiltration models. Hydrological Processes, 17(13), 2629–2652. https://doi.org/10.1002/hyp.1257

Pal, M., & Deswal, S. (2009). M5 model tree based modelling of reference evapotranspiration. Hydrological Processes: An International Journal, 23(10), 1437–1443. https://doi.org/10.1002/hyp.7266

Pal, M., Singh, N. K., & Tiwari, N. K. (2012). M5 model tree for pier scour prediction using field dataset. KSCE Journal of Civil Engineering, 16(6), 1079–1084. https://doi.org/10.1007/s12205-012-1472-1

Pandey, P. K., & Pandey, V. (2019). Estimation of infiltration rate from readily available soil properties (RASPs) in fallow cultivated land. Sustainable Water Resources Management, 5(2), 921–934. https://doi.org/10.1007/s40899-018-0268-y

Parsaie, A., & Haghiabi, A. (2014). Predicting the side weir discharge coefficient using the optimized neural network by genetic algorithm. Scientific Journal of Pure and Applied Sciences, 3(3), 103–112. https://doi.org/10.14196/sjpas.v3i3.1195

Parsaie, A., & Haghiabi, A. (2015). The effect of predicting discharge coefficient by neural network on increasing the numerical modeling accuracy of flow over side weir. Water Resources Management, 29(4), 973–985. https://doi.org/10.1007/s11269-014-0827-4

Parsaie, A., Haghiabi, A. H., Saneie, M., & Torabi, H. (2017). Predication of discharge coefficient of cylindrical weir-gate using adaptive neuro fuzzy inference systems (ANFIS). Frontiers of Structural and Civil Engineering, 11(1), 111–122. https://doi.org/10.1007/s11709-016-0354-x

Philip, J. R. (1957). The theory of infiltration: The infiltration equation and its solution. Soil Science, 83(5), 345–357. https://doi.org/10.1097/00010694-195705000-00002

Quinlan, J. R. (1992). Learning with continuous classes. 5th Australian joint conference on artificial intelligence (Vol. 92, pp. 343–348).

Richards, L. A. (1931). Capillary conduction of liquids through porous mediums. Physics, 1(5), 318–333. https://doi.org/10.1063/1.1745010

Sattari, M. T., Pal, M., Apaydin, H., & Ozturk, F. (2013). M5 model tree application in daily river flow forecasting in Sohu Stream, Turkey. Water Resources, 40(3), 233–242. https://doi.org/10.1134/S0097807813030123

Sattari, M. T., Pal, M., Mirabbasi, R., & Abraham, J. (2018). Ensemble of M5 model tree based modelling of sodium adsorption ratio. Journal of AI and Data Mining, 6(1), 69–78. https://doi.org/10.22044/JADM.2017.5540.1663

Sihag, P. (2018). Prediction of unsaturated hydraulic conductivity using fuzzy logic and artificial neural network. Modeling Earth Systems and Environment, 4(1), 189–198. https://doi.org/10.1007/s40808-018-0434-0

Sihag, P., Singh, B., Sepah Vand, A., & Mehdipour, V. (2018a). Modeling the infiltration process with soft computing techniques. ISH Journal of Hydraulic Engineering, 1–15. https://doi.org/10.1080/09715010.2018.1439776

Sihag, P., Tiwari, N. K., & Ranjan, S. (2017a). Prediction and inter-comparison of infiltration models. Water Science, 31(1), 34–43. https://doi.org/10.1016/j.wsj.2017.03.001

Sihag, P., Tiwari, N. K., & Ranjan, S. (2017b). Modelling of infiltration of sandy soil using gaussian process regression. Modeling Earth Systems and Environment, 3(3), 1091–1100. https://doi.org/10.1007/s40808-017-0357-1

Sihag, P., Tiwari, N. K., & Ranjan, S. (2017c). Prediction of unsaturated hydraulic conductivity using adaptive neuro-fuzzy inference system (ANFIS). ISH Journal of Hydraulic Engineering, 1–11. https://doi.org/10.1080/09715010.2017.1381861

Singh, B., Sihag, P., & Singh, K. (2017). Modelling of impact of water quality on infiltration rate of soil by random forest regression. Modeling Earth Systems and Environment, 3(3), 999–1004. https://doi.org/10.1007/s40808-017-0347-3

Singh, B., Sihag, P., & Singh, K. (2018). Comparison of infiltration models in NIT Kurukshetra campus. Applied Water Science, 8(2), 63. https://doi.org/10.1007/s13201-018-0708-8

Singh, V. P., & Yu, F. X. (1990). Derivation of infiltration equation using systems approach. Journal of Irrigation and Drainage Engineering, 116(6), 837–858. https://doi.org/10.1061/(ASCE)0733-9437(1990)116:6(837)

Tiwari, N. K., & Sihag, P. (2018). Prediction of oxygen transfer at modified Parshall flumes using regression models. ISH Journal of Hydraulic Engineering, 1–12. https://doi.org/10.1080/09715010.2018.1473058

Tiwari, N. K., Sihag, P., Kumar, S., & Ranjan, S. (2018). Prediction of trapping efficiency of vortex tube ejector. ISH Journal of Hydraulic Engineering, 1–9. https://doi.org/10.1080/09715010.2018.1441752

Tiwari, N. K., Sihag, P., & Ranjan, S. (2017). Modeling of infiltration of soil using adaptive neuro-fuzzy inference system (ANFIS). Journal of Engineering & Technology Education, 11(1), 13–21.

Vand, A. S., Sihag, P., Singh, B., & Zand, M. (2018). Comparative evaluation of infiltration models. KSCE Journal of Civil Engineering, 22(10), 4173–4184. https://doi.org/10.1007/s12205-018-1347-1
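Illustrative sketch: performance measures. The four comparing parameters of Section 2.6 (Equations 6 to 9) can be computed along the following lines. This is a minimal sketch and not the authors' code; the function name `performance_metrics` and the sample values are hypothetical and were chosen only to show that a model can reach R = 1 while its NSE stays well below 1 when its predictions carry a constant bias.

```python
import math


def performance_metrics(actual, predicted):
    """Return R, MSE, RMSE and NSE for paired actual (H) and predicted (F) values."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    n = len(actual)
    sum_h = sum(actual)
    sum_f = sum(predicted)
    sum_hf = sum(h * f for h, f in zip(actual, predicted))
    sum_h2 = sum(h * h for h in actual)
    sum_f2 = sum(f * f for f in predicted)

    # Equation 6: correlation coefficient
    r = (n * sum_hf - sum_h * sum_f) / (
        math.sqrt(n * sum_h2 - sum_h ** 2) * math.sqrt(n * sum_f2 - sum_f ** 2)
    )

    # Sum of squared errors shared by Equations 7 to 9
    squared_error = sum((h - f) ** 2 for h, f in zip(actual, predicted))
    mean_h = sum_h / n

    nse = 1.0 - squared_error / sum((h - mean_h) ** 2 for h in actual)  # Equation 7
    mse = squared_error / n                                             # Equation 8
    rmse = math.sqrt(mse)                                               # Equation 9
    return {"R": r, "MSE": mse, "RMSE": rmse, "NSE": nse}


# Hypothetical cumulative infiltration values in mm (not taken from the field data)
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
biased = [h + 1.0 for h in observed]  # every prediction is 1 mm too high

print(performance_metrics(observed, biased))
```

Because R only measures linear association, it cannot detect the constant offset in this example, whereas NSE, MSE, and RMSE all penalize it. This is one reason for reporting the four measures together, as in Table 5.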

Comparative analysis of artificial intelligence techniques for the prediction of infiltration process


Publisher
Taylor & Francis
Copyright
© 2020 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group on behalf of the International Water, Air & Soil Conservation Society(INWASCON).
ISSN
2474-9508
DOI
10.1080/24749508.2020.1833641

Abstract

GEOLOGY, ECOLOGY, AND LANDSCAPES 2021, VOL. 5, NO. 2, 109–118
INWASCON
https://doi.org/10.1080/24749508.2020.1833641

RESEARCH ARTICLE

Comparative analysis of artificial intelligence techniques for the prediction of infiltration process

Balraj Singh (a), Parveen Sihag (b), Abbas Parsaie (c) and Anastasia Angelaki (d)

(a) Civil Engineering Department, Panipat Institute of Engineering and Technology, Panipat, India
(b) Civil Engineering Department, Shoolini University, Solan, India
(c) Hydro-Structure Engineering, Shahid Chamran University of Ahvaz, Ahvaz, Iran
(d) Department of Agriculture Crop Production and Rural Environment, University of Thessaly, Volos, Greece

ARTICLE HISTORY
Received 21 January 2020
Accepted 4 October 2020

KEYWORDS
Infiltration process; artificial intelligence techniques; kostiakov model; nash-sutcliff efficiency; multi-linear regression

ABSTRACT
Knowledge of the infiltration process is beneficial in the design and planning of irrigation networks, soil erosion control, hydrologic design, and watershed management. In this study, the infiltration process was analyzed using predictive models based on artificial neural network (ANN), multi-linear regression (MLR), Random Forest regression (RF), and M5P tree, and their performances were compared with the empirical Kostiakov model. Field experimental data were used for training and testing the above models, and their outcomes were assessed with suitable performance assessment parameters. The models were assessed using a field dataset containing 340 observations, of which 70% were used for training and the remainder for testing. The RF-based model performs better than the other models, with Nash-Sutcliffe model efficiency (NSE) equal to 0.9963 and 0.9904 for the training and testing stages, respectively. The ANN, MLR, and M5P models also give good prediction performance, but the Kostiakov model's performance is inferior.
Sensitivity investigation suggests that cumulative time and soil moisture content are the most influential parameters for assessing the cumulative infiltration of soil.

1. Introduction

Infiltration is the movement of water into the subsurface from surface sources such as snowfall, irrigation, and precipitation. The soil-water relationship plays a crucial role in modeling for water management, control of droughts and floods, rainfall-runoff, risk evaluation, design and scheduling of irrigation systems, development of water resources, and drainage design. Various physical properties of the soil affect the infiltration characteristics. Soil texture, soil moisture, and density have considerable influence on the infiltration process (Angelaki et al., 2013), and texture is one of the most crucial of these factors. Water accessibility in the soil depends on the soil's water-holding ability, which is affected by the texture and structure of the soil (Al-Azawi, 1985). The infiltration rate is high in unsaturated soil. It reduces gradually and finally reaches a constant infiltration rate. Knowledge of infiltration is necessary for any valuable and durable water resources management project (Sihag et al., 2018a). The design and scheduling of an irrigation system rely on the soil's infiltration because it affects various design considerations of agriculture and canal systems. Infiltration characteristics vary with scale due to variation in the texture and type of the soil and other soil conditions. The experimental measurement of the infiltration process is laborious, tedious, and time-consuming (Vand et al., 2018). Assessment of the infiltration process is very complex due to spatial and temporal variation (Pandey & Pandey, 2018). Numerous studies (Mishra et al., 2003; Singh et al., 2018) proposed implementing conventional infiltration models as a substitute for experimental observation. The use of any specific model needs complete knowledge of the boundary conditions and assumptions of that model. Soil water scientists have introduced several infiltration models, such as Kostiakov, Horton, Philip, Holton, Green-Ampt, Novel, and Modified Kostiakov, for estimating infiltration (Richards, 1931; Philips, 1957; Singh & Yu, 1990; Mishra et al., 2003; Sihag et al., 2017a). Mishra et al. (2003) divided these models into three groups: physical, semi-empirical, and empirical models. Most of these models are based on the basic assumptions of homogeneous water absorption, ponding head, and constant infiltration rate. These assumptions are hardly ever met under real field conditions, which may lead to inaccurate prediction of the infiltration process.

Some researchers used an alternative method for estimating the infiltration process: soft computing based infiltration models built on soil properties. Several successful applications of soft computing based infiltration prediction reported in the literature (Tiwari et al., 2017; Singh et al., 2017; Sihag et al., 2017b) concluded that soil physical properties and elapsed time can be effectively used to estimate the infiltration process with higher precision.

In recent years, soft computing approaches such as Random Forest, M5P, SVM, GEP, Gaussian process, ANFIS, and many more have been successfully implemented in water resources problems (Azamathulla et al., 2016; Parsaie & Haghiabi, 2015, 2014; Parsaie et al., 2017; Sihag, 2018; Sihag et al., 2017c; Tiwari et al., 2018). This paper uses a model based on RF, as proposed by Breiman (2001). It is a powerful tool for nonlinear regression and classification. Examples using the RF capability include infiltration process modeling (Singh et al., 2017, 2018). ANN works on the principle of the nerve cells of the brain. ANN has been widely applied in the field of engineering and has been observed to perform better than conventional models, e.g., Sihag (2018), Tiwari and Sihag (2018), Haghiabi et al. (2017), and Ghorbani et al. (2016). The M5P tree, initially proposed by Quinlan (1992), is a decision tree learner for regression problems. An M5P tree-based model places linear regression functions at the terminal nodes and fits a multivariate linear regression model to each subspace by classifying or separating the entire data space into multiple subspaces. M5P has been successfully used in water resources related studies, e.g., Sattari et al. (2018), Sattari et al. (2013), Pal et al. (2012), and Pal and Deswal (2009).

RF, M5P, and ANN-based models extract knowledge from the data itself. The best performing model is identified using appropriate performance assessment parameters such as Nash-Sutcliffe model efficiency, root mean square error, mean square error, and correlation coefficient. In this study, models are developed to predict the cumulative infiltration of soil, and the performances of soft computing-based models are compared with empirical models (Kostiakov model and multi-linear regression (MLR)).

2. Materials and methodology

2.1. Study area

Kurukshetra district lies in the north-east part of Haryana state, India, and is bounded by North latitudes 29°53′00″ and 30°15′02″ and East longitudes 76°26′27″ and 77°07′57″. Thanesar Tehsil of Kurukshetra district is selected as the study area. The total area of the Kurukshetra district is 1530 km². The site map of the study area is given in Figure 1. The study area (Thanesar) is a division of the Ghaggar basin. A total of 20 different sites were selected for experimentation in the study area. The coordinates of all sampling sites are listed in Table 1. The texture of the soil is listed in Table 2.

Figure 1. The map of the selected study area.

Table 1. The details of the coordinates of the sampling sites.
Site No. | Site | Latitude | Longitude
1 | Dayalpur | 29.939648 N | 76.814545 E
2 | Samshipur | 29.925980 N | 76.803795 E
3 | Kirmach (SKS) | 29.911368 N | 76.794275 E
4 | Alampur | 29.938222 N | 76.824080 E
5 | Sanheri Khalsa | 29.918557 N | 76.826591 E
6 | Mirzapur | 29.950163 N | 76.781358 E
7 | Khanpur Roran | 29.939504 N | 76.757209 E
8 | Barna | 29.924569 N | 76.733358 E
9 | Pindarsi | 29.919078 N | 76.702227 E
10 | Kamoda | 29.936836 N | 76.736818 E
11 | Lohar Majra | 29.958742 N | 76.727137 E
12 | Jyotisar | 29.960166 N | 76.760195 E
13 | Narkatari | 29.962200 N | 76.797872 E
14 | Kurukshetra University | 29.95.5052 N | 76.815767 E
15 | Thim Park | 29.967055 N | 76.832005 E
16 | Darra Khera | 29.981300 N | 76.822550 E
17 | Bhiwani Khera | 29.994305 N | 76.826474 E
18 | Bahadur Pura | 30.008150 N | 76.834262 E
19 | Hansala | 30.011900 N | 76.811639 E
20 | Durala | 30.025939 N | 76.809048 E

Table 2. The texture of the soil.
Site No. | Location | Texture | Sand (%) | Clay (%) | Silt (%)
1 | Dayalpur | Loamy Sand | 78.73 | 7.4445 | 13.8255
2 | Samshipur | Clay | 39.84 | 55.3472 | 4.8128
3 | Kirmach (SKS) | Clay | 37.14 | 43.3734 | 19.4866
4 | Alampur | Sandy clay Loam | 47.5 | 25.2 | 27.3
5 | Sanheri Khalsa | Sandy clay Loam | 52.11 | 24.9028 | 22.9872
6 | Mirzapur | Clay | 26.63 | 41.8209 | 31.5491
7 | Khanpur Roran | Clay loam | 32.94 | 29.5064 | 37.5536
8 | Barna | Clay loam | 31.52 | 35.7133 | 32.7667
9 | Pindarsi | Sandy clay Loam | 47.6 | 27.248 | 25.152
10 | Kamoda | Loam | 42.85 | 24.003 | 33.147
11 | Lohar Majra | Clay loam | 24.6 | 39.962 | 35.438
12 | Jyotisar | Sandy clay Loam | 52.71 | 34.5217 | 12.7683
13 | Narkatari | Clay loam | 22.93 | 32.3694 | 44.7006
14 | KUK | Clay | 52.74 | 19.85 | 27.41
15 | Thim Park | clay | 36.7 | 26.586 | 36.714
16 | Dara kheda | Sandy clay Loam | 35.31 | 59.5148 | 5.1752
17 | bhiwani kheds | Sandy clay Loam | 59.58 | 30.7192 | 9.7008
18 | bhaderpura | Clay | 50.78 | 23.6256 | 25.5994
19 | Singhpura | Loam | 19.74 | 62.6028 | 17.6572
20 | Durala | Sandy Loam | 39.13 | 46.2612 | 14.6088

2.2. Dataset

The whole dataset, containing 340 observations from field infiltration experiments, was separated into two groups: training and testing. The training data comprise 70% of the total data, chosen randomly from the whole data set, while the testing data consist of the remaining 30%. The features of the training and testing data sets are given in Table 3. Time, sand, clay, silt, bulk density, and moisture content are the input parameters, and the soil's cumulative infiltration is the target.

Table 3. Features of the data set.
Parameter | Unit | Training: Lower | Higher | Mean | Std. deviation | Testing: Lower | Higher | Mean | Std. deviation
Time (t) | min | 1.00 | 17.00 | 9.08 | 4.98 | 1.00 | 17.00 | 8.80 | 4.75
Sand (S) | % | 19.74 | 78.73 | 41.32 | 13.81 | 19.74 | 78.73 | 40.88 | 13.84
Clay (C) | % | 7.44 | 62.60 | 33.58 | 13.35 | 7.44 | 62.60 | 34.87 | 14.38
Silt (Si) | % | 4.81 | 44.70 | 25.09 | 10.99 | 4.81 | 44.70 | 24.25 | 10.96
Bulk density (ρ) | gm/cc | 1.39 | 1.90 | 1.67 | 0.13 | 1.39 | 1.90 | 1.66 | 0.13
Moisture content (MC) | % | 1.49 | 14.19 | 7.72 | 3.14 | 1.49 | 14.19 | 7.72 | 3.07
Cumulative infiltration (F(t)) | mm | 0.63 | 25.90 | 6.95 | 4.86 | 0.94 | 23.89 | 6.82 | 4.55

2.3. Observation of cumulative infiltration

Experiments were performed to measure the cumulative infiltration of soil at the study area's locations using a mini-disk infiltrometer (Decagon Devices Inc.; Devices, 2014). A mini-disk infiltrometer has two chambers: a water reservoir and a bubble chamber. Both are connected via a Mariotte tube, which provides a steady water pressure head of 0.05 to 0.7 kPa. The instrument's bottom part contains a porous sintered steel disk with a diameter of 4.5 cm and a thickness of 3 mm. Both chambers are filled with water and the instrument is placed on a flat soil surface (Figure 2), so that water moves into the soil. During the measurement, the quantity of water in the reservoir chamber was recorded at specific intervals. The flowchart of the current investigation is given in Figure 3. The first step was designing the experiments, followed by the collection and analysis of data, the comparison of the artificial intelligence techniques and empirical models, the selection of the best-fitted model for prediction of the infiltration process, and the conclusion.

Figure 2. Mini disk infiltrometer.

Figure 3. Overview of the investigation.

2.4. Modeling approaches

2.4.1. Artificial neural networks (ANN)

ANN is a data mining approach that is widely implemented in several engineering fields. The idea of the ANN model is inspired by the nerve cells of the human brain. ANN is a parallel knowledge processing system containing a set of neurons arranged in layers. In this study, the ANN model includes three layers: input, hidden, and output. The input layer receives the data, the hidden layer processes them, and the output layer gives the model's target result. Each input into a neuron in the hidden and output layers is multiplied by a corresponding interconnection weight (X_ij) and totalled with a threshold steady value called bias (y_j). The addition and multiplication functions in every neuron are shown in equation 1. P_j is the output achieved by the activation function to generate an output for unit j. Complete information about ANN is provided by Haykin (1994).

P_j = X_ij � y_j    (1)

2.4.2. M5P model (M5P)

The M5P tree, proposed by Quinlan (1992), is a decision tree learner for regression problems. This tree algorithm assigns linear regression functions at the terminal nodes. It fits a multivariate linear regression model to each subspace by classifying or dividing the whole data space into several subspaces. The M5 tree model develops conditional linear models for the nonlinear behavior of the data set. The splitting criterion for the M5 tree model is based on an assessment of the error at every node. The error is calculated from the standard deviation of the class values that arrive at a node. The standard deviation reduction (SDR) is defined as follows:

$$\mathrm{SDR} = sd(Z) - \sum_i \frac{|Z_i|}{|Z|}\, sd(Z_i) \tag{2}$$

Z denotes the set of instances that arrive at the node, Z_i denotes the subset of instances that have the i-th outcome of the possible set, and sd denotes the standard deviation. The splitting process ends if the target values of all instances that arrive at the node differ only very slightly.

2.4.3. Random forest regression (RF)

Random forest, introduced by Breiman (2001), is a classification and regression method that comprises a collection of regression trees trained using various bootstrap samples (bagging) of the training data. Each tree acts as a regression function on its own, and the final target is taken as the average of the individual tree outputs (Adusumilli et al., 2013). In the case of bagging, the training set for each tree contains about 67% of the data from the actual training set; thus, about one-third of the data are left out from each tree developed (Singh et al., 2017). These left-out training data, termed out-of-bag (out of the bootstrap sampling), were used to estimate prediction error and variable importance. The number of trees to be grown (k) in the forest and the number of features or variables selected (m) at each node to generate a tree are the two standard primary parameters necessary for random forest regression (Breiman, 2001).

2.5. Empirical models

The Kostiakov and multi-linear regression models are empirical models. The least-squares technique was used to derive the regression coefficients of the Kostiakov and MLR models with the training data set.

2.5.1. Multi-linear regression (MLR)

The following general form of the multi-linear regression model is considered to develop a nonlinear regression model:

$$F(t) = a\, t^{b_1} S^{b_2} C^{b_3} Si^{b_4} \rho^{b_5} MC^{b_6} \tag{3}$$

where F(t) is the dependent variable representing the cumulative infiltration of soil; t, S, C, Si, ρ, and MC are the explanatory variables; a is the constant; and the parameter estimates (regression coefficients) b_1, b_2, b_3, b_4, b_5, and b_6 are found by minimizing the sum of squares of the prediction error (least squares). Based on the above equation, the following relationship is developed from the training data set:

$$F(t) = 1.5646\, t^{0.6867} S^{0.4358} C^{0.3779} Si^{0.0865} \rho^{1.458} MC^{0.4447} \tag{4}$$

2.5.2. Kostiakov model

$$F(t) = 1.57\, t^{0.6909} \tag{5}$$

2.6. Comparing parameters

The correlation coefficient (R), Nash-Sutcliffe model efficiency (NSE), mean square error (MSE), and root mean square error (RMSE) were used as statistical parameters to assess the precision of the RF, M5P, ANN, MLR, and Kostiakov models. R, NSE, MSE, and RMSE are computed as:

$$R = \frac{n\sum HF - \left(\sum H\right)\left(\sum F\right)}{\sqrt{n\sum H^2 - \left(\sum H\right)^2}\,\sqrt{n\sum F^2 - \left(\sum F\right)^2}} \tag{6}$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}(H - F)^2}{\sum_{i=1}^{n}(H - \bar{H})^2} \tag{7}$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(H - F)^2 \tag{8}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(H - F)^2} \tag{9}$$

where H is the actual value, F is the estimated/forecasted value, H̄ is the mean of the actual values, and n is the number of observations.

2.7. Implementation of machine learning methods

Four standard statistical measures, R, MSE, RMSE, and NSE, were selected as performance evaluation parameters to judge the accuracy of the machine learning models and the Kostiakov model. Several manual trials were carried out to find the optimum values of the primary parameters. Higher values of R and NSE and lower values of MSE and RMSE indicate better prediction accuracy. The number of trees to be grown (k) in the forest and the number of features or variables selected (m) at every node to generate a tree are the two standard primary parameters essential for random forest regression. In M5P, the models were calibrated by changing the number of instances allowed at each node (m). In the ANN, the number of hidden layers, the number of neurons, iterations, learning rate, and momentum are the primary parameters. The selected primary parameters of the modeling methods are presented in Table 4.

Table 4. The details of the primary parameters.
Machine learning approach | Primary parameters
RF | k = 10, m = 1, Iterations = 100
ANN (6-5-1) | Neurons = 5, learning rate = 0.2, momentum = 0.1, I = 1500
M5P | m = 4

3. Results and discussion

Performance of empirical models: Table 5 presents the outcomes of estimated F(t) for the Kostiakov and MLR models based on R, MSE, RMSE, and NSE. Based on MSE and RMSE, the MLR model shows lower error values than the Kostiakov model. R, MSE, RMSE, and NSE, the most common assessment criteria for prediction or forecast models, show that the MLR model is more accurate than the Kostiakov model. MLR has higher NSE and R than the Kostiakov model: on the testing data set, NSE = 0.7500 and 0.3301 and R = 0.8698 and 0.5757 were found for the MLR and Kostiakov models, respectively. The scatter plot of actual against predicted F(t) values is shown in Figure 4 for the MLR and Kostiakov models for the training and testing stages. Figure 4 allows a closer comparison of F(t) between the two models and indicates that the MLR model performs better than the Kostiakov model across the data.

Table 5. The performance of ANN, M5P, RF, MLR, and Kostiakov models.
Approaches | Training: R | MSE | RMSE | NSE | Testing: R | MSE | RMSE | NSE
ANN | 0.9926 | 0.5221 | 0.7226 | 0.9778 | 0.9911 | 0.5483 | 0.7405 | 0.9732
M5P | 0.9445 | 2.7125 | 1.6470 | 0.8848 | 0.9282 | 2.9190 | 1.7085 | 0.8573
RF | 0.9985 | 0.0874 | 0.2957 | 0.9963 | 0.9953 | 0.1969 | 0.4437 | 0.9904
MLR | 0.9003 | 4.4840 | 2.1176 | 0.8095 | 0.8698 | 5.1131 | 2.2612 | 0.7500
Kostiakov Model | 0.5844 | 15.5020 | 3.9373 | 0.3415 | 0.5757 | 13.7021 | 3.7016 | 0.3301

Figure 4. Scatter plot of empirical models for both training and testing stages.

3.1. Performance of soft computing based models

The preparation of M5P, ANN, and RF-based models is a trial and error process. A number of manual trials were carried out to find the optimum values of the primary parameters. The optimum values of the user-defined parameters of M5P, ANN, and RF are listed in Table 4. The accuracy of predicted F(t) for M5P, ANN, and RF in terms of R, MSE, RMSE, and NSE is reported in Table 5. The best model was taken to be the one with higher R and NSE and lower MSE and RMSE. Taken together, R, MSE, RMSE, and NSE indicate that RF is more accurate than the ANN and M5P models in predicting the cumulative infiltration of soil.

The scatter plots of actual against predicted F(t) values are shown in Figures 5, 6, and 7 for M5P, ANN, and RF, respectively, for both the training and testing stages. The distribution of the data around the agreement line shows that the actual values are well correlated with the predicted values of cumulative infiltration F(t). RF is more strongly correlated with the actual data than ANN and M5P, since R = 0.9953, 0.9282, and 0.9911 were obtained for the RF, M5P, and ANN models with the testing dataset, respectively. Overall, Figures 5, 6, and 7 allow a closer comparison of F(t) among the RF, M5P, and ANN-based models and show good performance of all three across the data points. The figures indicate that the RF, M5P, and ANN models are suitable for predicting the values of F(t).

Figure 5. The performance of the M5P model (training and testing stages).

Figure 6. The performance of the ANN model (training and testing stages).

Figure 7. The performance of the RF model (training and testing stages).

3.2. Comparison of empirical and soft computing based models

The performance of predicted F(t) using RF, M5P, and ANN is compared with the predicted F(t) using the empirical models. The same data set was used to assess the empirical models as was selected for the RF, M5P, and ANN-based models. Figure 8 shows the scatter plot and performance plot for the soft computing based models and empirical models, for both training and testing stages. A performance plot is also shown in Figure 8 using the testing data set for the Kostiakov, MLR, M5P, ANN, and RF models. The statistical parameters obtained from the soft computing based models and the empirical models are listed in Table 5. Overall, RF is the better prediction method, having the highest testing NSE of 0.9904. Predicted values of F(t) using RF (as represented in Figure 8) lie closer to the perfect fit line and follow the same path as the actual values, compared to those estimated using the empirical models.

Figure 8. The performance of soft computing models and empirical models.

3.4. Sensitivity investigation

A sensitivity investigation was done to find the most influential input factor in F(t) prediction. Table 5 suggests that RF is more suitable than the other soft computing and empirical models, so the RF method was used. Seven sets of training data were developed: one including all input parameters and six developed by eliminating one input factor at a time. Results are listed in terms of R and RMSE with the testing data set. Table 6 shows that elapsed time and moisture content were the most significant factors for predicting the soil's cumulative infiltration.

Table 6. Sensitivity investigation using RF.
Input arrangement | Input factor eliminated | R | RMSE (mm)
t, S, C, Si, ρ, Mc | (none) | 0.9953 | 0.4437
S, C, Si, ρ, Mc | t | 0.7619 | 2.9312
t, C, Si, ρ, Mc | S | 0.9956 | 0.4323
t, S, Si, ρ, Mc | C | 0.9952 | 0.4506
t, S, C, ρ, Mc | Si | 0.9957 | 0.4259
t, S, C, Si, Mc | ρ | 0.995 | 0.4582
t, S, C, Si, ρ | Mc | 0.8063 | 2.758

4. Conclusion

Infiltration plays a crucial role in rainfall-runoff modeling and in the design and scheduling of irrigation systems. The performance of soft computing-based models in predicting the soil's cumulative infiltration over varying sand, silt, clay, density, and moisture content was investigated. Three popular techniques, RF, M5P, and ANN, were utilized as soft computing models. The RF predictions of cumulative infiltration were superior to those from the ANN, M5P, MLR, and Kostiakov models. The ANN model also showed improvement compared to M5P and the empirical models. Sensitivity results show that elapsed time and moisture content of the soil samples are the most significant factors when the RF-based modeling method is used to predict the cumulative infiltration of soil for the given dataset. This investigation extends the use and capabilities of artificial intelligence techniques and provides a general model for predicting the infiltration process that could be implemented in other study areas.

Disclosure statement

No potential conflict of interest was reported by the authors.

ORCID

Balraj Singh http://orcid.org/0000-0002-0381-4363
Parveen Sihag http://orcid.org/0000-0002-7761-0603

CONTACT Balraj Singh, balrajzinder@gmail.com, Civil Engineering Department, Panipat Institute of Engineering and Technology, Panipat, Haryana, India

© 2020 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group on behalf of the International Water, Air & Soil Conservation Society (INWASCON). This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

References

Adusumilli, S., Bhatt, D., Wang, H., Bhattacharya, P., & Devabhaktuni, V. (2013). A low-cost INS/GPS integration methodology based on random forest regression. Expert Systems with Applications, 40(11), 4653–4659. https://doi.org/10.1016/j.eswa.2013.02.002
Al-Azawi, S. A. (1985). Experimental evaluation of infiltration models. Journal of Hydrology (New Zealand), 24(2), 77–88. https://www.jstor.org/stable/43944562?seq=1
Angelaki, A., Sakellariou-Makrantonaki, M., & Tzimopoulos, C. (2013). Theoretical and experimental research of cumulative infiltration. Transport in Porous Media, 100(2), 247–257. https://doi.org/10.1007/s11242-013-0214-2
Azamathulla, H. M., Haghiabi, A. H., & Parsaie, A. (2016). Prediction of side weir discharge coefficient by support vector machine technique. Water Science and Technology: Water Supply, 16(4), 1002–1016. https://doi.org/10.2166/ws.2016.014
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
Devices, D. (2014). Mini disk infiltrometer user's manual. Decagon Devices. http://www.decagon.com/products/hydrology/hydraulic-conductivity/mini-disk-portable-tension-infiltrometer
Ghorbani, M. A., Zadeh, H. A., Isazadeh, M., & Terzi, O. (2016). A comparative study of artificial neural network (MLP, RBF) and support vector machine models for river flow prediction. Environmental Earth Sciences, 75(6), 476. https://doi.org/10.1007/s12665-015-5096-x
Haghiabi, A. H., Azamathulla, H. M., & Parsaie, A. (2017). Prediction of head loss on cascade weir using ANN and SVM. ISH Journal of Hydraulic Engineering, 23(1), 102–110. https://doi.org/10.1080/09715010.2016.1241724
Haykin, S. (1994). Neural networks, a comprehensive foundation. Macmillan.
Mishra, S. K., Tyagi, J. V., & Singh, V. P. (2003). Comparison of infiltration models. Hydrological Processes, 17(13), 2629–2652. https://doi.org/10.1002/hyp.1257
Pal, M., & Deswal, S. (2009). M5 model tree based modelling of reference evapotranspiration. Hydrological Processes: An International Journal, 23(10), 1437–1443. https://doi.org/10.1002/hyp.7266
Pal, M., Singh, N. K., & Tiwari, N. K. (2012). M5 model tree for pier scour prediction using field dataset. KSCE Journal of Civil Engineering, 16(6), 1079–1084. https://doi.org/10.1007/s12205-012-1472-1
Pandey, P. K., & Pandey, V. (2019). Estimation of infiltration rate from readily available soil properties (RASPs) in fallow cultivated land. Sustainable Water Resources Management, 5(2), 921–934. https://doi.org/10.1007/s40899-018-0268-y
Parsaie, A., & Haghiabi, A. (2014). Predicting the side weir discharge coefficient using the optimized neural network by genetic algorithm. Scientific Journal of Pure and Applied Sciences, 3(3), 103–112. https://doi.org/10.14196/sjpas.v3i3.1195
Parsaie, A., & Haghiabi, A. (2015). The effect of predicting discharge coefficient by neural network on increasing the numerical modeling accuracy of flow over side weir. Water Resources Management, 29(4), 973–985. https://doi.org/10.1007/s11269-014-0827-4
Parsaie, A., Haghiabi, A. H., Saneie, M., & Torabi, H. (2017). Predication of discharge coefficient of cylindrical weir-gate using adaptive neuro fuzzy inference systems (ANFIS). Frontiers of Structural and Civil Engineering, 11(1), 111–122. https://doi.org/10.1007/s11709-016-0354-x
Philips, J. R. (1957). The theory of infiltration: The infiltration equation and its solution. Soil Science, 83(5), 345–357. https://doi.org/10.1097/00010694-195705000-00002
Quinlan, J. R. (1992). Learning with continuous classes. 5th Australian joint conference on artificial intelligence (vol. 92, pp. 343–348).
Richards, L. A. (1931). Capillary conduction of liquids through porous mediums. Physics, 1(5), 318–333. https://doi.org/10.1063/1.1745010
Sattari, M. T., Pal, M., Apaydin, H., & Ozturk, F. (2013). M5 model tree application in daily river flow forecasting in Sohu Stream, Turkey. Water Resources, 40(3), 233–242. https://doi.org/10.1134/S0097807813030123
Sattari, M. T., Pal, M., Mirabbasi, R., & Abraham, J. (2018). Ensemble of M5 model tree based modelling of sodium adsorption ratio. Journal of AI and Data Mining, 6(1), 69–78. https://doi.org/10.22044/JADM.2017.5540.1663
Sihag, P. (2018). Prediction of unsaturated hydraulic conductivity using fuzzy logic and artificial neural network. Modeling Earth Systems and Environment, 4(1), 189–198. https://doi.org/10.1007/s40808-018-0434-0
Sihag, P., Singh, B., Sepah Vand, A., & Mehdipour, V. (2018a). Modeling the infiltration process with soft computing techniques. ISH Journal of Hydraulic Engineering, 1–15. https://doi.org/10.1080/09715010.2018.1439776
Sihag, P., Tiwari, N. K., & Ranjan, S. (2017a). Prediction and inter-comparison of infiltration models. Water Science, 31(1), 34–43. https://doi.org/10.1016/j.wsj.2017.03.001
Sihag, P., Tiwari, N. K., & Ranjan, S. (2017b). Modelling of infiltration of sandy soil using gaussian process regression. Modeling Earth Systems and Environment, 3(3), 1091–1100. https://doi.org/10.1007/s40808-017-0357-1
Sihag, P., Tiwari, N. K., & Ranjan, S. (2017c). Prediction of unsaturated hydraulic conductivity using adaptive neuro-fuzzy inference system (ANFIS). ISH Journal of Hydraulic Engineering, 1–11. https://doi.org/10.1080/09715010.2017.1381861
Singh, B., Sihag, P., & Singh, K. (2017). Modelling of impact of water quality on infiltration rate of soil by random forest regression. Modeling Earth Systems and Environment, 3(3), 999–1004. https://doi.org/10.1007/s40808-017-0347-3
Singh, B., Sihag, P., & Singh, K. (2018). Comparison of infiltration models in NIT Kurukshetra campus. Applied Water Science, 8(2), 63. https://doi.org/10.1007/s13201-018-0708-8
Singh, V. P., & Yu, F. X. (1990). Derivation of infiltration equation using systems approach. Journal of Irrigation and Drainage Engineering, 116(6), 837–858. https://doi.org/10.1061/(ASCE)0733-9437(1990)116:6(837)
Tiwari, N. K., & Sihag, P. (2018). Prediction of oxygen transfer at modified Parshall flumes using regression models. ISH Journal of Hydraulic Engineering, 1–12. https://doi.org/10.1080/09715010.2018.1473058
Tiwari, N. K., Sihag, P., Kumar, S., & Ranjan, S. (2018). Prediction of trapping efficiency of vortex tube ejector. ISH Journal of Hydraulic Engineering, 1–9. https://doi.org/10.1080/09715010.2018.1441752
Tiwari, N. K., Sihag, P., & Ranjan, S. (2017). Modeling of infiltration of soil using adaptive neuro-fuzzy inference system (ANFIS). Journal of Engineering & Technology Education, 11(1), 13–21.
Vand, A. S., Sihag, P., Singh, B., & Zand, M. (2018). Comparative evaluation of infiltration models. KSCE Journal of Civil Engineering, 22(10), 4173–4184. https://doi.org/10.1007/s12205-018-1347-1
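
Worked sketch (not part of the published article). The Kostiakov relationship of Eq. (5) and the four performance measures of Section 2.6, Eqs. (6) to (9), can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the coefficients 1.57 and 0.6909 are the fitted values quoted in Eq. (5), and the sample observations are invented for demonstration and are not taken from the paper's 340-observation dataset.

```python
import math


def kostiakov(t_min, a=1.57, b=0.6909):
    """Cumulative infiltration F(t) in mm after t_min minutes, Eq. (5): F = a * t**b."""
    return a * t_min ** b


def mse(actual, predicted):
    """Mean square error, Eq. (8)."""
    n = len(actual)
    return sum((h - f) ** 2 for h, f in zip(actual, predicted)) / n


def rmse(actual, predicted):
    """Root mean square error, Eq. (9)."""
    return math.sqrt(mse(actual, predicted))


def nse(actual, predicted):
    """Nash-Sutcliffe model efficiency, Eq. (7)."""
    h_mean = sum(actual) / len(actual)
    residual = sum((h - f) ** 2 for h, f in zip(actual, predicted))
    spread = sum((h - h_mean) ** 2 for h in actual)
    return 1.0 - residual / spread


def correlation(actual, predicted):
    """Correlation coefficient R, Eq. (6)."""
    n = len(actual)
    s_h, s_f = sum(actual), sum(predicted)
    s_hf = sum(h * f for h, f in zip(actual, predicted))
    s_hh = sum(h * h for h in actual)
    s_ff = sum(f * f for f in predicted)
    numerator = n * s_hf - s_h * s_f
    denominator = math.sqrt(n * s_hh - s_h ** 2) * math.sqrt(n * s_ff - s_f ** 2)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical readings (minutes, mm), invented purely for illustration.
    times = [1.0, 5.0, 9.0, 13.0, 17.0]
    observed = [1.4, 5.0, 7.5, 9.0, 11.2]
    predicted = [kostiakov(t) for t in times]
    print("R    =", round(correlation(observed, predicted), 4))
    print("MSE  =", round(mse(observed, predicted), 4))
    print("RMSE =", round(rmse(observed, predicted), 4))
    print("NSE  =", round(nse(observed, predicted), 4))
```

With the quoted coefficients, `kostiakov(9.08)` evaluates to about 7.2 mm at the mean training time from Table 3. The metric functions take plain lists so that the same calls could be applied to the output of any of the compared models.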

Journal

Geology Ecology and Landscapes, Taylor & Francis

Published: Apr 3, 2021

Keywords: Infiltration process; artificial intelligence techniques; kostiakov model; nash-sutcliff efficiency; multi-linear regression

References