Nutrient pollution such as nitrate (NO3-) can degrade water quality in rivers used as a source of drinking water. This raises the question of how nutrients move through a watershed depending on factors such as land use and anthropogenic sources. Researchers have developed several nutrient export coefficient models based on these factors. For this purpose, statistical data including historical water quality and land use data for the Melen Watershed were used. Nitrate export coefficients are estimates of the total load or mass of nitrate (NO3-) exported from a watershed, standardized to unit area and unit time (e.g. kg/km2/day). In this study, nitrate export coefficients for the Melen Watershed were determined using a model that covers both the Frequentist and the Bayesian approaches. The river retention coefficient was determined and introduced into the model as an important variable.

Introduction

Protection of water sources requires prior research to identify possible sources of pollution. To this end, this research deals with nitrate export coefficient modeling of the Melen Watershed using the Frequentist and Bayesian approaches. Modelling the export coefficients is a convenient way to analyze the effects of diffuse pollution in a research area. Moreover, following Fu (2012), retention of nutrients in the water body was also taken into consideration. Export coefficients are usually determined with the help of load measurements at the outlet of a subwatershed where there is a single dominant land use (Brigault and Ruban 2000, Zobrist and Reichert 2006). In order to estimate the export coefficients, it is assumed that the export coefficients for the same land use category are the same in all subwatersheds. Notable researchers on export coefficients, such as Rast and Lee (1983) and Dillon and Kirchner (1975), used the same or a similar approach. They observed concentrations of nutrients in streams (e.g.
mg/L) and then converted that value to a load by multiplying it by the discharge (e.g. m3/s). Usually, they had daily flow rates and monthly or quarterly nutrient concentrations, which were interpolated to obtain daily values. De Klein and Koelmans (2011) calculated inputs to surface water and exports for 13 lowland river catchments in Western Europe on a monthly basis. The catchments varied in size (21 to 486 km2), while annual in-stream retention ranged from 23 to 84% for N. They presented a novel calculation method that quantifies monthly exports from lowland rivers based on an annual load to the river system. The agreement between calculated values and calibration data was high (N: r2 = 0.93; p < 0.001). Validation of the model also showed good results, with model efficiencies for the separate catchments ranging from 31 to 95% (average 76%). This indicates that exports of nitrogen on a monthly basis can be calculated with few input data for a range of West European lowland rivers (De Klein et al. 2011). Wickham et al. (2005) found that land cover is the main driver of nutrient export and that regional variation is insignificant. Wickham et al. (2008) noticed that the variances of N and P concentrations among different land uses within ecological regions were, respectively, six and three times greater than the variance among different ecological regions. Thus, the effect of different ecological regions is less significant than that of different land use compositions. Even so, studies in distinct geographical regions lead to different results. Most of the studies that deal with nutrient export were conducted in the USA. The current research could extend this work to Europe. There are also some analytical models to estimate the pollutant removal efficiency for surface waters. Yang et al. (2014) successfully applied a screening-level modeling approach to estimate nitrogen loading in the Tippecanoe River watershed.
The screening-level model is not statistical but an advanced analytical model that consists of both hydrological and water quality parameters. Although the screening-level model is a very comprehensive approach, it is not well suited to cases that lack a wide range of spatial and temporal observations for a variety of hydrological and water quality parameters. Statistical methods, however, may provide reliable results close to the analytical ones without requiring such extensive data. A typical example is the research of Zobrist and Reichert (2006) in Swiss watersheds, where they successfully applied Bayesian approaches to estimate nitrate export coefficients.

In this study, the Frequentist and the Bayesian techniques were applied in order to estimate export coefficients for the Melen Watershed in Turkey. Instead of calculating the contributions of subwatersheds individually, the whole watershed was considered for the estimation of the total load at the outlet of the Melen Watershed using the calculated nutrient export coefficients. The Frequentist approach has the goal of extracting information from data only, without relying on prior knowledge (Ramirez and Sanz, 2013). In contrast, the Bayesian approach has the goal of combining prior knowledge with data to make optimal use of both sources of information. The success of the Bayesian approach depends directly on whether sufficient data are available to acquire prior information about the estimands (Essahale et al. 2010, Lee 2012). The Bayesian analysis was conducted using the IBM SPSS AMOS software, and posterior information about land use based export coefficients was obtained through the MCMC (Markov Chain Monte Carlo) method. Estimated land use based nutrient export coefficients are expressed in kg/km2/day. In addition, monthly river retention values of nitrate in all subwatersheds of the Melen Watershed were estimated. This information was used in order to predict nitrate export coefficients appropriately.
The results show that the Frequentist approach gives estimates closer to the observed values than the Bayesian approach.

According to the Ministry of Forestry and the Water Works of Turkey, the Melen River Basin should be given priority with regard to its pollution status. The Melen River Basin is currently under the threat of land based pollution. Sumer et al. (2001) found that its water can be classified as class 2 out of 4. Since 2001, settlements and the population in the watershed have increased. As far as is known, no agricultural or urban best management practices are applied in the region. Therefore, a significant decrease in the water quality of the river is expected in the future. Two main rivers are located in the Melen Watershed: the Buyuk Melen and the Kucuk Melen (Figure 1). The government constructed a water regulator close to the outlet of the Buyuk Melen River. Fresh water is pumped to Istanbul through a 150 km long pipe. Protection of water quality in the Melen Watershed is therefore also vital for Istanbul's drinking water quality. In Figure 1, red bullets indicate sampling sites for data gathered from the State Hydraulic Works of Turkey (DSI, 2011). These data cover crucial information about the historical trend of pollution in the Melen Watershed. Pink bullets indicate sampling sites for data measured by Istanbul Technical University (Ozturk et al. 2008) at different locations in the Melen Watershed, including headwater subwatersheds.

Materials and Methods

The simplest export coefficient model assumes that average diffuse loads can be estimated by a sum of export coefficient terms for the different land use types (Zobrist and Reichert 2006). Measurements in the Melen Watershed were conducted weekly, sometimes daily when weather conditions were suitable, or twice a month. Hence, monthly average values were calculated as the mean of these measurements.
These values are the "monthly average daily values". Temporally averaged loads (monthly average daily values) from a number of subwatersheds can be quantified using Equation (1):

$$L_{i,j} = \sum_{j=1}^{m} \sum_{i=1}^{n} \left(E_{i,j}\, A_{i,j}\right)\left(1 - R_j\right) \tag{1}$$

where $j$ is the index labelling the subwatershed (with $m$ subwatersheds), $i$ is the index labelling the land use types (with $n$ types; 10.66% meadows and pastures, 0.66% lakes and rivers, 36.02% agricultural area, 51.25% forest, and 1.46% urban area), $L_{i,j}$ is the average load at the outlet of subwatershed $j$ as predicted by the model, $E_{i,j}$ is the nutrient export coefficient of land use type $i$, $A_{i,j}$ is the area of land use $i$ in the $j$th subwatershed (Figure 2), and $R_j$ is the percent river nutrient retention coefficient.

Study Area. The Melen Watershed is located in the Western Black Sea region of Turkey (Figure 1). It covers an area of 2437 km2 (Ozturk et al. 2008). The Melen Watershed is bounded by the Bolu Mountains to the east, the Sakarya Province to the west, the Orhan Mountains to the north, and the Abant Mountains to the south. The Melen Watershed provides fresh drinking water to most of Istanbul.

Fig. 1. The Melen Watershed, its rivers, and coordinates of sampling points in WGS84 Datum UTM coordinate system 36N

The watershed is delineated into discrete subwatersheds so that the model can represent the spatial heterogeneity of the catchment. The delineation of the Melen Watershed was carried out based on a Digital Elevation Model (DEM) created at 10 m × 10 m resolution by both digitizing topographical map sheets and modifying the available vector maps. The DEM was imported into ArcView grid format with the proper projection (UTM Zone 36N, WGS84 Datum). Each subwatershed contributes to the total load at the outlet of the Melen Watershed. In order to quantify these contributions, the flow path of the whole watershed has to be specified. Figure 3 shows the flow path, that is, the direction of flow in the watershed.
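The load model of Equation (1) can be illustrated with a minimal Python sketch. It assumes, as the text does, that the export coefficient of a land use is the same in every subwatershed, and it treats the retention coefficient as a fraction between 0 and 1. The function name `outlet_load` and all input numbers are hypothetical and are not values from the study.

```python
from typing import Mapping, Sequence


def outlet_load(
    export_coeff: Mapping[str, float],
    areas: Sequence[Mapping[str, float]],
    retention: Sequence[float],
) -> float:
    """Sum E_i * A_ij * (1 - R_j) over land uses i and subwatersheds j.

    export_coeff: kg/km2/day per land use (assumed equal in all subwatersheds)
    areas:        km2 of each land use, one mapping per subwatershed
    retention:    retention fraction R_j in [0, 1], one per subwatershed
    """
    if len(areas) != len(retention):
        raise ValueError("need one retention value per subwatershed")
    total = 0.0
    for land_use_areas, r_j in zip(areas, retention):
        # Load generated on the land before in-stream retention, in kg/day.
        generated = sum(export_coeff[lu] * a for lu, a in land_use_areas.items())
        total += generated * (1.0 - r_j)
    return total


# Hypothetical inputs, for illustration only.
coeff = {"forest": 0.5, "agricultural": 2.0}
areas = [
    {"forest": 100.0, "agricultural": 50.0},
    {"forest": 20.0, "agricultural": 80.0},
]
retention = [0.2, 0.5]
load = outlet_load(coeff, areas, retention)  # kg/day
```

Because the coefficient in kg/km2/day is multiplied by an area in km2, the result is a load in kg/day.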
The retention and loss of nutrients in river systems can have significant detrimental consequences on downstream water quality (Donohue et al. 2005, Hogan et al. 2012, Vsetickova et al. 2012, Izagirre et al. 2013). Peterson et al. (2001) examined the nitrogen removal efficiencies of headwater streams from all over the United States. They found that the smaller the stream (the lower the order), the higher the nitrogen (N) removal efficiency. This is because the water is in greater contact with various biofilm surfaces in smaller streams. On average, dissolved inorganic nitrogen (both NH4+ and NO3-) is removed at a rate of 64% per kilometer of stream. The small size of the stream ensures a large amount of water-sediment contact, which removes nitrogen from runoff via nitrification and denitrification by bacteria in the sediments (Peterson et al. 2001).

Fig. 2. Land use map for the Melen Watershed

Fig. 3. Subwatersheds, rivers, and flow path of the Melen Watershed

According to De Klein and Koelmans (2011), monthly retention of nitrogen is estimated from the surface water area specific runoff (Equation (2)). Nitrate retention was calculated using Equation (2), assuming that the percent retention of NO3- is almost equal to the percent retention of total N in the rivers of the Melen Watershed.

$$R_i = 0.0246 \left(\frac{Q_i}{SW}\right)^{-0.57} \tag{2}$$

where $Q_i$ is the average (monthly) discharge (m3/s), $SW$ is the total area of surface water in the catchment (ha), $R_i$ is the retention fraction (-), and $i$ is the index for month (-).

Statistical methods are used to make predictions that are as close as possible to the observed values. One of the most widely used methods of parameter estimation for distribution fitting is the Maximum Likelihood Method (Law and Kelton 1991). Maximum Likelihood Estimation (MLE) seeks the parameter values that are most likely to produce the observed distribution (Gardner 2012, Meer et al. 2013, Nichols et al. 2013). The basic goal of MLE is to determine the parameters that maximize the probability or likelihood of the sample data. MLE consists of three steps: specifying the likelihood function, taking derivatives of the likelihood with respect to the parameters, and setting the derivatives equal to zero and solving for the parameters. Assuming normally and independently distributed stochastic errors that add to the results of the deterministic function given by Equation (1), the likelihood function of the export coefficient model for loads becomes Equation (3):

$$f_x(x \mid E, \sigma_x^2) = \frac{1}{(2\pi)^{n/2}\,\sigma_x^{n}} \exp\left(-\frac{1}{2}\sum_{j=1}^{n} \frac{\left(x_j - L_j(E)\right)^2}{\sigma_x^2}\right) \tag{3}$$

where $x = (x_1, \ldots, x_n)$ is the vector of loads of the watershed, $\sigma_x$ is the standard deviation of the normal distribution of loads or concentrations around the deterministic model results, and $L_j$ is the average load at the outlet of the watershed according to Equation (1). In the Frequentist approach, the parameter estimates are determined by maximizing the likelihood function in Equation (3), into which the measurements are substituted for the argument describing the outcomes (Equation (4)):

$$\left(\hat{E}, \hat{\sigma}_x^2\right) = \underset{E,\ \sigma_x^2}{\arg\max}\ f_x\left(x \mid E, \sigma_x^2\right) \tag{4}$$

where argmax stands for the argument of the maximum, that is, the set of points of the given argument for which the given expression attains its maximum value. In this equation, $\hat{E}$ are the estimates of the export coefficients $E$ introduced in Equation (1), $\hat{\sigma}_x^2$ is the estimate of the variance $\sigma_x^2$ of the additive stochastic error term $X$, $f_x$ is the likelihood function of the model, and the $x$ values are the measured loads at the watershed outlet. The Frequentist approach requires a well-defined maximum of the likelihood function in order to provide unique results (Cowan 2012).

For the normal distribution $N(\mu, \sigma^2)$, written here as $N(E, \sigma^2)$, the probability density function and the likelihood are given in Equations (5) and (6), respectively. In Equation (6), $\bar{x}$ is the sample mean.

$$f(x \mid E, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\left(x - L(E)\right)^2}{2\sigma^2}\right) \tag{5}$$

$$f(x_1, \ldots, x_n \mid E, \sigma^2) = \prod_{i=1}^{n} f(x_i \mid E, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left(-\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2 + n\left(\bar{x} - L(E)\right)^2}{2\sigma^2}\right) \tag{6}$$

This distribution has two parameters $(E, \sigma^2)$, so the likelihood can be maximized over both parameters, $\mathcal{L}(E, \sigma) = f(x_1, \ldots, x_n \mid E, \sigma)$, where the likelihood is written $\mathcal{L}$ to distinguish it from the load $L(E)$. The logarithm is a continuously increasing function over the range of the likelihood, so the values that maximize the likelihood also maximize its logarithm (Cowan 2012). Maximizing the logarithm requires less algebra. The log likelihood is differentiated with respect to $E$ and set to zero (Equation (7)). Similarly, the log likelihood is differentiated with respect to $\sigma$ and set to zero (Equation (8)).

$$\frac{\partial \ln \mathcal{L}}{\partial E} = \frac{\partial}{\partial E}\left[\frac{n}{2}\ln\left(\frac{1}{2\pi\sigma^2}\right) - \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2 + n\left(\bar{x} - L(E)\right)^2}{2\sigma^2}\right] = 0 \tag{7}$$

$$\frac{\partial \ln \mathcal{L}}{\partial \sigma} = \frac{\partial}{\partial \sigma}\left[\frac{n}{2}\ln\left(\frac{1}{2\pi\sigma^2}\right) - \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2 + n\left(\bar{x} - L(E)\right)^2}{2\sigma^2}\right] = 0 \tag{8}$$

The maximum likelihood estimator obtained from the above two derivatives for $\theta = (E, \sigma^2)$ is written as in Equation (9):

$$\hat{\theta} = \left(\hat{E}, \hat{\sigma}^2\right) \tag{9}$$

In contrast to the Frequentist approach, the Bayesian approach has the goal of combining prior knowledge with data to use both sources of information optimally (O'Reilly et al., 2012). Prior knowledge on parameter values has to be formulated as a prior probability density, $f_{prior}(E, \sigma_x^2)$, and is then updated to the posterior density by applying Equation (10):

$$f_{post}\left(E, \sigma_x^2 \mid x\right) \propto f_x\left(x \mid E, \sigma_x^2\right) f_{prior}\left(E, \sigma_x^2\right) \tag{10}$$

The constant of proportionality is calculated by normalization of the posterior density. This technique has the advantage of still being applicable if the parameters are not identifiable from data. In the case of poor identifiability, the posterior distribution is not much different from the prior (Gelman 2006, Morris et al. 2012). In the case of high information content of data, it is typically much narrower. The disadvantage of this technique is that the use of prior information introduces a subjective element into the data evaluation procedure.

In Frequentist inference, any given experiment is considered as one of an infinite sequence of possible repetitions of the same experiment with statistically independent results (Everitt, 2006). The independent and identically distributed observations $(x_1, \ldots, x_n)$ come from the sampling distribution $f(X \mid \theta)$, where $\theta$ is the fixed parameter value. The MCMC is a simulation technique that computes posterior values of interest by sampling from posterior distributions (Huang and Yu 2010, Konomi et al. 2013). The Bayesian posterior distribution is obtained by the MCMC method. This is beneficial for multi-parameter models where it is hard to obtain an algebraic solution. The MCMC algorithms are computational tools that allow for the generation of random numbers from the posterior distribution $\pi^*(\theta \mid X)$ using the numerator of the expression in Equation (11) without calculating the integral in the denominator (Lele et al. 2007).

$$\pi^*(\theta \mid X) = \frac{f(x_1, x_2, \ldots, x_n \mid \theta)\, \pi(\theta)}{\int f(x_1, x_2, \ldots, x_n \mid \theta)\, \pi(\theta)\, d\theta} \tag{11}$$

where $f(x_1, x_2, \ldots, x_n \mid \theta)$ is the likelihood, $\pi^*(\theta \mid X)$ is the posterior, and $\pi(\theta)$ is the prior distribution.

Parameter estimation depends on the frequency distribution of the data. The main goal is to find which distribution fits the data best. For this purpose, the Kolmogorov-Smirnov goodness of fit test was performed, since it is applicable even to a sample consisting of a small number of observations. According to the results of this test, the best-fitting distributions were identified. Additionally, there should be a single term (e.g. the mean $\mu$) that contains all desired parameters (forest EFor, agricultural EAgr, meadows EMea and residential area ERes export coefficients) so that its derivative can be taken. Hence, it is not necessary to deal with a distribution that has complex multiplicative terms. This was also taken into consideration when selecting the appropriate distribution for the Frequentist approach. Setting the derivative of the likelihood function with respect to a single term to zero and solving for the unknown term leads to four equations with the aforementioned four unknown parameters. In this study, solutions for four linearly independent equations for four unknowns were obtained using the Direct Search optimization package of Maple 15 Pro.

Results and Discussion

According to the Kolmogorov-Smirnov goodness of fit test and the consideration above, the frequency distribution of the nitrate load data at the outlet of the Melen Watershed is defined as the Inverse Gaussian. The Inverse Gaussian distribution of $x_i$ is described by two characteristics, a mean $\mu > 0$ and a precision $\lambda > 0$. See Equation (12) for the probability density function.

$$f(x; \mu, \lambda) = \left[\frac{\lambda}{2\pi x^3}\right]^{1/2} \exp\left(-\frac{\lambda (x - \mu)^2}{2\mu^2 x}\right), \quad 0 < x < \infty \tag{12}$$

where $\mu > 0$ is the mean and $\lambda > 0$ is the shape parameter; the variance is given by $\mu^3/\lambda$. Maximum likelihood belongs to the Frequentist framework, but the likelihood is also a part of Bayesian inference.

Table 1 shows estimates of the NO3- export coefficients (kg/km2/day) using the Frequentist approach as the mean monthly average daily value for each year. By applying the procedure shown in Equations (5)-(9), with the inverse Gaussian as the distribution, the unknown export coefficient parameters can be calculated. For each year between 1995 and 2006, export coefficients for the different land use types were calculated. Estimated nutrient export coefficients are expressed in kg/km2/day. In other words, each km2 of a given land use emits, per day, a nitrate load in kg equal to its export coefficient. Retention in the water body was taken into account in the estimation of export coefficients. Briefly, the values in Table 1 multiplied by the corresponding land use area and by (1 - retention coefficient) give the daily nitrate load estimate in kg per day. The overall estimate of the nitrate load was calculated using the mean values of the predicted export coefficients (Mea=0.759; Agr=2.749; For=0.606; Res=1.678).
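To make the inverse Gaussian fit more concrete, the following Python sketch computes the standard closed-form maximum likelihood estimates of the mean and shape parameter from a sample, together with the log likelihood built from the density in Equation (12). It is only an illustration of a plain two-parameter fit: in the study the mean is tied to the export coefficients through Equation (1) and the resulting equations were solved numerically in Maple, which is not reproduced here. The sample values and function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def inverse_gaussian_mle(x: Sequence[float]) -> Tuple[float, float]:
    """Closed-form MLE (mu_hat, lambda_hat) for an inverse Gaussian sample."""
    n = len(x)
    if n < 2 or any(v <= 0.0 for v in x):
        raise ValueError("need at least two strictly positive observations")
    mu_hat = sum(x) / n
    # 1 / lambda_hat = mean of (1/x_i - 1/mu_hat)
    inv_lambda = sum(1.0 / v - 1.0 / mu_hat for v in x) / n
    if inv_lambda <= 0.0:
        raise ValueError("observations are all equal; shape is unbounded")
    return mu_hat, 1.0 / inv_lambda


def inverse_gaussian_loglik(x: Sequence[float], mu: float, lam: float) -> float:
    """Log likelihood of the sample under the inverse Gaussian density."""
    return sum(
        0.5 * math.log(lam / (2.0 * math.pi * v ** 3))
        - lam * (v - mu) ** 2 / (2.0 * mu ** 2 * v)
        for v in x
    )


# Hypothetical daily loads, for illustration only.
loads = [2.0, 3.0, 4.0, 6.0]
mu_hat, lam_hat = inverse_gaussian_mle(loads)
best = inverse_gaussian_loglik(loads, mu_hat, lam_hat)
```

Evaluating `inverse_gaussian_loglik` at parameter values slightly away from `(mu_hat, lam_hat)` gives a lower value than `best`, which is a quick sanity check that the estimates sit at the maximum.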
The key issue that distinguishes Bayesian estimation from the Frequentist approach is the use of prior information about the estimands. Headwater subwatersheds are not affected by other subwatersheds. The headwater subwatersheds of the Melen Watershed are subwatersheds 2, 5, 6, 8, 9 and 10 (see Figure 3). The observed data from headwater subwatersheds are needed in order to define prior distributions of the land use based export coefficients. This prior information reveals the distribution of the export coefficients, including their mean and standard deviation. Sometimes the use of highly informative priors is crucial. The usual method of obtaining this prior information is to place sampling stations in an area where a single land use dominates. More precisely, to obtain a prior distribution for the agricultural area nutrient export coefficient (Agr or EAgr), samples are needed from an agriculturally dominated area. First, the observed data from headwater subwatershed 6 were analyzed, since forest dominates this subwatershed (91.15%). After obtaining information on the forest area nutrient export coefficient (EFor), data from headwater subwatersheds 10, 2 and 8 were analyzed consecutively in order to specify the agricultural (EAgr), meadows (EMea) and residential area (ERes) nutrient export coefficients, respectively. Note that the observed data available from these subwatersheds were sufficient only for the nitrate parameter. Thus, prior distributions were created for the nitrate export coefficients of each type of land use (Mea, Agr, For, Res), after which the Bayesian estimation could start.

Table 1. Estimated NO3- export coefficients (kg/km2/day) using the Frequentist approach. Whole watershed NO3- export coefficient (kg/km2/day): mean "monthly average daily" values for each year

Year    Meadows, Brush and Pasture    Agricultural    Forest    Residential
1995    0.595                         2.687           0.505     1.901
1996    0.508                         2.084           0.405     1.206
1997    0.708                         3.544           0.612     1.995
1998    0.710                         3.140           0.554     1.926
1999    0.608                         2.033           0.400     1.257
2000    1.220                         3.331           0.950     1.968
2001    0.705                         2.117           0.605     1.106
2002    0.826                         3.077           0.700     1.966
2003    0.804                         3.000           0.615     1.903
2004    1.101                         3.125           0.815     1.844
2005    0.603                         2.182           0.505     1.410
2006    0.726                         2.673           0.600     1.653

Using the IBM SPSS AMOS software (Arbuckle 2009), the Bayesian analysis was conducted and posterior information about land use based nitrate export coefficients was obtained using the MCMC method. A single value for each land use based nitrate export coefficient was estimated using the whole monthly data set from January 1995 to December 2006. In AMOS, the parameter configuration for the analysis is important (Chenini and Khemiri 2009, Loehlin 2013). During the Bayesian analysis, nitrate loads exported from each type of land use (Mea, Agr, For, Res) were selected as independent variables and the total nitrate exported (NO3-) at the outlet in kg/km2/day was selected as the dependent variable. Dagum, Gamma, Kumaraswamy and Wakeby distributions were encountered during the Bayesian analysis phase of this study. Prior information is not always very informative. With a non-informative prior, the Bayesian approach must agree well with the Frequentist approach. Figure 4 shows that the assigned priors are highly informative for the Bayesian estimation.
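The idea of drawing from a posterior that is proportional to likelihood times prior, without computing the normalizing integral, can be sketched with a random-walk Metropolis sampler in Python. This is a generic textbook sampler and not the algorithm implemented in AMOS. It assumes a single export coefficient with a normal likelihood of known standard deviation and a normal prior, so that the result can be checked against the analytic posterior mean. All numbers and names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def log_posterior(e: float, data: Sequence[float], noise_sd: float,
                  prior_mean: float, prior_sd: float) -> float:
    """Unnormalized log posterior: log likelihood plus log prior."""
    log_lik = -0.5 * sum(((x - e) / noise_sd) ** 2 for x in data)
    log_prior = -0.5 * ((e - prior_mean) / prior_sd) ** 2
    return log_lik + log_prior


def metropolis(data: Sequence[float], noise_sd: float, prior_mean: float,
               prior_sd: float, n_steps: int = 20000, step_sd: float = 0.3,
               seed: int = 0) -> List[float]:
    """Random-walk Metropolis draws of the coefficient, burn-in removed."""
    rng = random.Random(seed)
    current = prior_mean
    current_lp = log_posterior(current, data, noise_sd, prior_mean, prior_sd)
    samples = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step_sd)
        proposal_lp = log_posterior(proposal, data, noise_sd, prior_mean, prior_sd)
        # Only the ratio of posteriors is needed, so the integral cancels.
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples[n_steps // 4:]  # discard the first quarter as burn-in


# Hypothetical observations of an export coefficient (kg/km2/day).
observations = [2.5, 3.0, 2.8, 3.2, 2.6]
draws = metropolis(observations, noise_sd=0.5, prior_mean=3.8, prior_sd=0.5)
posterior_mean = sum(draws) / len(draws)
# Analytic posterior mean for this conjugate setup: 71.6 / 24, about 2.98.
```

With these inputs the sample mean of the observations is 2.82 and the prior mean is 3.8, so the posterior mean lies between the two. This mirrors the point made above that an informative prior pulls the Bayesian estimate away from the maximum likelihood value.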
The Bayesian approach does not give estimates close to the Frequentist MLE solution. The posterior distribution is significantly different from both the prior and the likelihood. The Bayesian approach gives different estimates for the land use based nitrate export coefficients. On the other hand, the predicted yearly average nitrate loads (kg/day) from the two approaches have coefficients of determination (R2) close to each other (R2=0.75 for the Frequentist approach and R2=0.74 for the Bayesian approach). The Frequentist approach gives estimates closer to the observed values than the Bayesian approach (Figure 4). A sample application of both the Frequentist and the Bayesian approaches for land use based nitrate export coefficients was shown in detail. The predicted nitrate export coefficients are listed in Table 2.

Conclusion

The primary objective of this research is to create a unique nutrient export coefficient model for the Melen River Basin, which has a wide range of land use characteristics. In doing so, the retention coefficient and the effect of the draining upper subwatersheds were considered. Two different but related techniques were used for modeling nutrient export coefficients: the Frequentist approach, that is, maximum likelihood estimation, and Bayesian estimation using the MCMC algorithm. For the latter technique, the AMOS software was used. Bayesian estimation differs from the Frequentist approach in that it uses prior information about the estimands. Based on the results, we can conclude that the Frequentist approach gives better estimates than the Bayesian approach. The reliability of the results depends on the quality of the data used.
Field work, especially sampling in areas dominated by a single land use, helps to specify a more reliable prior distribution for each land use based nutrient export coefficient and thus to obtain more precise estimates, particularly with the Bayesian approach. The Frequentist approach gives convincing results. Results from the Bayesian approach would have been better if sufficiently long time series had been available for the headwater subwatersheds. Further studies that take this issue into account will need to be undertaken. This study has important findings for developing export coefficient models and is intended to guide researchers on the subject.

Fig. 4. Observed nitrate loads (kg/day), Frequentist and Bayesian estimations for each year between 1995 and 2006

Table 2. Estimated nutrient export coefficients (kg/km2/day) using Frequentist and Bayesian approaches. Nitrate (NO3-) export coefficients (kg/km2/day)

Land use                      The Frequentist Approach    The Bayesian Approach
Meadows, Brush and Pasture    0.759                       1.611
Agricultural                  2.749                       3.832
Forest                        0.606                       1.288
Residential                   1.678                       2.462
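As a worked illustration of how the coefficients in Table 2 translate into whole-watershed loads, the Python sketch below combines them with the reported watershed area (2437 km2) and land use shares, and with the retention relation of Equation (2), read here as R = 0.0246 (Q/SW)^-0.57. Several points are assumptions of this sketch rather than statements from the study: the urban share is paired with the residential coefficient, the meadows and pastures share with the "Meadows, Brush and Pasture" coefficient, the surface water area is derived from the 0.66% lakes and rivers share, the retention fraction is capped at 1, and the discharge value is hypothetical.

```python
from typing import Mapping

WATERSHED_AREA_KM2 = 2437.0

# Land use shares reported for the watershed (lakes and rivers excluded).
LAND_USE_SHARE = {
    "meadows": 0.1066,
    "agricultural": 0.3602,
    "forest": 0.5125,
    "residential": 0.0146,  # assumption: urban share paired with residential
}

# Export coefficients from Table 2, in kg/km2/day.
FREQUENTIST = {"meadows": 0.759, "agricultural": 2.749, "forest": 0.606, "residential": 1.678}
BAYESIAN = {"meadows": 1.611, "agricultural": 3.832, "forest": 1.288, "residential": 2.462}

# Assumption: surface water area from the 0.66% share; 1 km2 = 100 ha.
SURFACE_WATER_HA = 0.0066 * WATERSHED_AREA_KM2 * 100.0


def retention_fraction(discharge_m3s: float, surface_water_ha: float) -> float:
    """Retention fraction from discharge and surface water area."""
    if discharge_m3s <= 0.0 or surface_water_ha <= 0.0:
        raise ValueError("discharge and surface water area must be positive")
    r = 0.0246 * (discharge_m3s / surface_water_ha) ** -0.57
    return min(1.0, r)  # cap at 1 as a guard (an assumption of this sketch)


def watershed_load(coeff: Mapping[str, float], retention: float) -> float:
    """Daily nitrate load at the outlet in kg/day for the whole watershed."""
    generated = sum(
        coeff[lu] * share * WATERSHED_AREA_KM2
        for lu, share in LAND_USE_SHARE.items()
    )
    return generated * (1.0 - retention)


# Hypothetical monthly mean discharge in m3/s, for illustration only.
r_month = retention_fraction(10.0, SURFACE_WATER_HA)
load_frequentist = watershed_load(FREQUENTIST, r_month)
load_bayesian = watershed_load(BAYESIAN, r_month)
```

Because the same land use areas and retention are applied to both coefficient sets, the difference between the two loads comes entirely from the coefficients. Every Bayesian coefficient in Table 2 is larger than its Frequentist counterpart, so the Bayesian load is higher for any retention value below 1.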
Archives of Environmental Protection – de Gruyter
Published: Jun 1, 2016