Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

A Hybrid Deep Learning Model for Link Dynamic Vehicle Count Forecasting with Bayesian Optimization

A Hybrid Deep Learning Model for Link Dynamic Vehicle Count Forecasting with Bayesian Optimization Hindawi Journal of Advanced Transportation Volume 2023, Article ID 5070504, 15 pages https://doi.org/10.1155/2023/5070504 Research Article A Hybrid Deep Learning Model for Link Dynamic Vehicle Count Forecasting with Bayesian Optimization 1,2 1 1,3 1 Chunguang He , Dianhai Wang , Yi Yu , and Zhengyi Cai College of Civil Engineering and Architecture, Zhejiang University, Hangzhou, China School of Transportation and Logistics Engineering, Xinjiang Agricultural University, Urumqi, China Shanghai AI Laboratory, Shanghai 200232, China Correspondence should be addressed to Zhengyi Cai; caizhengyi@zju.edu.cn Received 30 March 2022; Revised 7 September 2022; Accepted 24 November 2022; Published 7 February 2023 Academic Editor: Young-Jae Lee Copyright © 2023 Chunguang He et al. Tis is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Te link dynamic vehicle count is a spatial variable that measures the trafc state of road sections, which refects the actual trafc demand. Tis paper presents a hybrid deep learning method that combines the gated recurrent unit (GRU) neural network model with automatic hyperparameter tuning based on Bayesian optimization (BO) and the improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) model. Tere are four steps in this hybrid approach. First, the ICE- EMDAN is employed to decompose the link dynamic vehicle count time series data into several intrinsic components. Second, the components are predicted by the GRU model. At the same time, the Bayesian optimization method is utilized to automatically optimize the hyperparameters of the GRU model. Finally, the predicted subcomponents are reconstructed to obtain the fnal prediction results. Te proposed hybrid deep learning method is tested on two roads of Hangzhou, China. Results show that, compared with the 12 benchmark models, the proposed hybrid deep learning model achieves the best performance in link dynamic vehicle count forecasting. on a specifc road that describes the space occupancy rate of 1. Introduction roads [4–6] and can refect trafc demand more precisely With the development of the social economy and urbani- [7, 8]. Accurate and real-time prediction of the link dynamic zation, travellers’ trafc demands increased rapidly. Trafc vehicle count can also provide a reliable basis for the net- problems such as trafc congestion, environmental pollu- work-wide trafc signal control strategy and optimization tion, and economic losses have brought challenges to urban [1, 2, 9]. In recent years, thanks to continuous investments in transportation management. Intelligent transportation intelligent transportation systems, various sensors have been technology promises to deal with these problems, and ac- deployed and a large amount of real-time trafc data can be curate and efective forecasting of trafc demands is a key collected. Te link dynamic vehicle count data are efectively calculated under the new data environment with mature step. Traditionally, indicators such as trafc fow are used to represent trafc demands. While in seriously congested technology [10, 11]. However, the critical issues and chal- areas, those indicators can hardly refect actual trafc de- lenges unaddressed in forecasting link dynamic vehicle mands [1]. Trafc signal control systems, such as SCOOT count in the following aspects: (a) LDVC is often disturbed (Split Cycle Ofset Optimization Technique) and SCATS by stochastic factors. For instance, LDVC experiences a (Sydney Coordinated Adaptive Trafc System), failed to sudden increase and decrease when trafc fow becomes work properly due to the inaccurate trafc demand esti- congested or trafc incidents occur, such as accidents and mation in the congested period [2, 3]. temporary trafc control measures. (b) Although selecting Compared with the indicator of trafc fow, the link the “best” model among a set of baselines is signifcant, a dynamic vehicle count (LDVC) refers to the vehicle number better alternative is to consider the strength and robustness 2 Journal of Advanced Transportation consuming and laborious in hyperparameter tuning, the of the prediction results. By decomposing trafc data into subsequences, it can help capture both the common ten- hyperparameters in our model are automatically and ef- ciently tuned using BO. (4) Te test results show that the dency and some changes in trafc fow to improve the prediction accuracy. (c) Most machine learning-based proposed hybrid deep learning framework achieved the best methods can mine the nonlinear profle of link vehicle count, performance in the aspect of improvement in prediction but overftting issues may occur. It indicates that the model performance and reduction in training time compared with extracts noise in the training data as a feature of the data a variety of benchmark models. Considering the LDVC is itself, which in turn degrades performance in the test dataset. one of the best inputs for real-time control applications in Machine learning approaches are widely used in trafc urban areas, the proposed model could provide the accurate real-time LDVC forecast data for real-time trafc control. forecasting. Researchers obtained some progress in pre- diction algorithms, model fusion, and temporal and spatial Te rest of the paper is organized as follows: Section 2 reviews the relevant literature about trafc prediction model, characteristics of trafc data. However, there are still some crucial issues and challenges. (a) In trafc research feld, hybrid model, and hyperparameter optimization. Section 3 proposes the framework of the prediction model, including most forecasting models focus on trafc fow [12–15], trafc speed [16–18], or travel time [19–21]. Few forecasting the logical relationship between data decomposition, model models focus on the prediction of the LDVC, let alone training, parameter optimization, and prediction compo- considering the data characteristics of the LDVC in the nent reconstruction. Te fundamental methods are prediction model. (b) Te LDVC has nonlinear and sto- explained, including the sequence data decomposition chastic characteristics, such as the random change due to method ICEEMDAN, the basic model GRU, and the trafc congestion and weather factors. Simultaneously, due hyperparameter tuning algorithm BO. Section 4 validates the proposed model using actual link dynamic vehicle count to the signal light control, weekday commuting, the LDVC shows the characteristics of long short-term periodic data collected in Hangzhou, China. We further study the infuence of hyperparameter tuning and data decomposition changes in signal cycle, day, and week pattern. (c) Te excellent performance of deep learning methods in trafc on model performance. Besides, we compare the prediction accuracy and computing time of our model with 12 forecasting is inseparable from the model hyperparameter tuning efciently and appropriately. Te prediction model benchmark models. Section 5 concludes the paper. represented by deep learning requires many hyperparameter tuning, which is time consuming and laborious but hard to 2. Literature Review obtain suitable hyperparameters. In response to these challenges, this paper proposes a In the past few decades, there has been a lot of research on hybrid ICEEMDAN-GRU-BO forecasting model for the link trafc system forecasting. Tis paper focus on the trafc dynamic vehicle count. Te hybrid model fuses the ICE- prediction model, hybrid model, and hyperparameter op- EMDAN based on hyperparameter tuning with Bayesian timization. Te literature review of the three research lines is optimization (BO) and the improved complete ensemble summarized as follows. empirical mode decomposition with adaptive noise (ICE- Tis section frst reviews the literature on trafc infor- EMDAN). First, the ICEEMDAN is utilized to decompose mation prediction models, such as parametric models the link dynamic vehicle count data into subcomponents. [12, 13, 22] and nonparametric models [20, 23–28]. Tus, the decomposed components reduce the stochastic A parametric model mainly considers some unsteady characteristics and become more regular and suitable for time series data to establish prediction models with limited prediction. Second, the GRU models are applied to predict parameters. Autoregressive integrated moving average those components by considering the feature of long short- (ARIMA) or seasonal autoregressive integrated moving term periods. Simultaneously, the BO is used to automati- average (SARIMA) is commonly used in time series data cally optimize the hyperparameters of the GRU models to forecasting in transportation. For example, Williams et al. deal with the challenge problem of the hyperparameter developed a SARIMA model to identify seasonal patterns to tuning. Finally, the predicted subcomponents are recon- capture periodic changes in trafc states [13]. Van Der Voort structed to obtain the fnal prediction results. et al. used a self-organizing neural network graph as the Te contributions of this paper can be summarized as initial classifer associated with an individually ARIMA follows: (1) For the frst time, a hybrid ensemble decom- model to predict the half hour trafc fow on French position deep learning prediction framework is proposed to highways [12]. Kumar et al. selected a three-lane arterial predict the link dynamic vehicle count to improve the road with limited three days of trafc fow data in Chennai, prediction accuracy and reduce the prediction time. (2) India, to establish a SARIMA model for trafc fow fore- Aiming at tackling the challenges mentioned above, we casting [14]. present a novel approach by integrating the GRU with Te nonparametric models mainly include machine ICEEMDAN. Te data decomposition method ICEEMDAN learning and deep learning approaches. Machine learning is used to reveal the nonlinearity and stochastic character- has been widely used to predict trafc information. For istic of the link dynamic vehicle count. We propose a short- example, researchers used the k-nearest neighbor (KNN) term prediction method based on GRU model to efectively model to predict short-term trafc fow [23, 26, 29]. Support capture the long short-term period features. (3) To avoid vector regression (SVR) was utilized in trafc fow predic- overftting issues and address the considerable time tion [27, 29, 30] and travel time forecasting [20]. Random Journal of Advanced Transportation 3 and Chen combined the EMD and back propagation neural forest regression (RF) was applied for trafc fow prediction [31]. Early neural network modeling such as multilayer network (BPN) model to predict short-term passenger fow in the subway system [33]. Yang and Chen combined EMD perceptron (MLP) [24, 32], back propagation neural net- work (BPN) [33], and artifcial neutral networks (ANN) and stacked autoencoder (SAE) for passenger fow predic- were widely employed in the prediction of trafc systems. tion in urban rail transit [42]. Jiang et al. combined the For example, Kumar et al. operated ANN for the short-term EEMD and grey support vector machine model to develop a prediction of trafc volume [14]. Ruiz Aguilar et al. proposed hybrid short-term demand forecasting method for short- a hybrid prediction method based on the combination of the term high-speed rail passenger fow forecasting [43]. Te ARIMA and ANN models to predict the number of goods particle swarm optimization algorithm was used to optimize the grey support vector machine, and the results show that inspected at European border checkpoints [34]. However, the traditional artifcial neural network cannot the model performs well in terms of prediction accuracy. Zhang et al. proposed a hybrid deep learning prediction capture time-series data features because it does not consider time dependence. To overcome this shortcoming, re- model that combined 3D convolutional neural network (3D CNN) and EEMD to predict the network-wide speed of searchers have explored a large number of novel neural network models. Te deep learning models are the fastest Beijing, and the results showed that the EEMD method growing algorithms in recent years. In terms of sequence efectively improves the input data, and 3D CNN can data modeling, RNN (Recurrent Neural Network) is one of consider the temporal and spatial characteristics of the road the representatives. Van Lint et al. proposed a nonlinear network [40]. As advantages in data decomposition, other state space method using RNN to predict short-term improved algorithms were proposed based on EMD and highway travel time [19]. EEMD, such as complementary EEMD (CEEMD) [44], a complete EEMD with adaptive noise (CEEMDAN) [45], and RNN’s variant LSTM (Long-Short Term Memory) solves the shortcoming that RNN cannot store long-term memory of ICEEMDAN [46]. Te excellent performance of the neural network pre- information. LSTM has successful applications in trafc system prediction. For example, Ma et al. wielded remote microwave diction model is inseparable from the parameter optimi- zation. Te following literature reviews focus on the relevant sensor data to establish LSTM models to predict trafc speed [35]. Zhao et al. proposed an LSTM prediction model for short- parameter optimization methods. Suitable parameter setting term trafc fow prediction [36]. Yang et al. applied LSTM to shows an enormous impact on the performance of the neural predict urban rail transit passenger fow [37]. network prediction model [43]. For parameter optimization As an improved algorithm of LSTM, GRU [38] was frst in machine learning and deep learning models, manual proposed by Cho et al. in 2014. In most cases, the prediction tuning relies on experience and vulnerable to bias, and the performance of GRU is similar to LSTM, but the training tuning process is very time-consuming [47, 48]. Commonly used automatic parameter tuning algorithms [49] include time is reduced. Zhang et al. predict network-wide trafc speed with a deep learning model, and the results show that grid search, random search, and Bayesian optimization. Grid and random search have shortcomings, in which the new GRU obtains even better performance than LSTM [39]. Te hybrid model that combines data decomposition search may separate from the previous search information and machine learning or deep learning approach can ef- and cannot make full use of prior knowledge. Bayesian ciently improve the prediction performance. Te following optimization utilizes the prior distribution information of part reviews the literature on hybrid models that combine parameters [50]. It can auto efectively search for hyper- the data decomposition methods, machine learning, and parameters with fewer iteration steps. Bayesian optimization deep learning methods. Both machine learning and deep has become the most practical tool for parameter optimi- learning models require stable inputs, and data decompo- zation in predictive systems, which is successfully applied in sition methods can efectively improve the quality of model the deep learning model hyperparameter tuning recently, such as references [51–53]. input data and make the decomposed data more regular [40]. Choosing a reliable data sequence decomposition In response to these challenges, we propose a hybrid deep learning model for link dynamic vehicle count fore- method is critical for the stable and efective input required in the forecasting model. Traditional data decomposition casting. Te data decomposition method ICEEMDAN is methods, such as wavelet transform (WT), have been suc- adopted to decompose the irregular trafc demand data to cessfully applied in transportation. For example, Wang and more simple IMFs components. Te GRU model is used to Shi established a short-term trafc speed prediction model predict the IMFs’ components considering long- and short- based on chaotic wavelet transform and support vector term periodic characteristics of trafc demand, and Bayesian machine [16]. However, the traditional wavelet transforms optimization is utilized for automatically tuning multiple hyperparameters of the deep learning models. and Fourier transform techniques have disadvantages. For example, it is difcult to choose the mother wavelet, while empirical mode decomposition (EMD) and ensemble em- 3. Link Dynamic Vehicle Count pirical mode decomposition (EEMD) are more efective. Te Forecasting Model decomposition method can decompose the data into in- trinsic mode components. Many researchers have greatly Tis paper proposes a hybrid deep learning model that improved the accuracy of the prediction model based on the combines ICEEMDAN and GRU with Bayesian optimiza- decomposition of EMD and EEMD [41]. For example, Wei tion for link dynamic vehicle count forecasting, called 4 Journal of Advanced Transportation ICEEMDAN-GRU-BO. Figure 1 shows the framework and IMFs. Mode mixing reduces the EMD’s ability to recognize the main steps are as follows: diferent amplitudes in the actual data of the IMF compo- nents and afects the prediction accuracy of the hybrid model (1) Data processing, including data cleaning, normali- [43]. To overcome the problem of mode mixing, researchers zation, and completion. For example, we fll in the have proposed a new data decomposition method that adds missing data according to the average value of the Gaussian white noise, called “Ensemble Empirical Mode previous three steps. Decomposition” (EEMD) [55–58]. EEMD is a noise-assisted (2) Data decomposition. ICEEMDAN is adopted to data analysis method that aims to overcome the short- decompose the link dynamic vehicle count into comings of the EMD method. Te steps of EEMD are as several intrinsic mode functions (IMFs) and a re- follows: sidual. Tese mode components are simpler and Step (1): Before EMD decomposition, Gaussian white more regular, which can improve the accuracy of the noise is added to the original sequence data each time, deep learning model. and the construction sequence after addition is as (3) Subcomponents prediction and hyperparameters follows: optimization. GRU is used to predict subcompo- i i x (t) � x(t) + w (t), (3) nents of diferent frequencies as the basic prediction model. Tis framework employs a Bayesian opti- where x (t) is the construction sequence data, x(t) is mization algorithm to optimize the hyperparameters i 2 the original sequence data, and w (t) ∼ N(0, σ ) is the of each GRU model. Tese hyperparameters include added white noise sequence data. the initial learning rate, number of hidden units, L2 Step (2): EMD is adopted to decompose the con- regularization coefcient, and number of GRU struction sequence x (t) into n IMFs. layers. (4) Mode reconstruction and results evaluation. Te i i i x (t) � 􏽘 c (t) + r (t), (4) j n fnal prediction result can be obtained by summing j�1 up the predicted subcomponents and evaluated by i i the test dataset. where c (t) is the ith decompose of jth IMF and r (t) is j n the ith residual data. Te following describes the model details. Step (3): Repeat steps (1) and (2) M times, and add diferent white noise each time to obtain M groups of 3.1. Data Normalization. Te data normalization is corresponding IMFs. adopted in data processing to reduce data redundancy and Step (4): Calculate the average value of the corre- improve data usability. After normalizing, the original sponding IMFs of the M groups as the fnal IMFs. data are converted into a pure dimensionless values. Te training data are transformed into standardized data with c (t) � 􏽘 c (t), (5) j j zero mean and unit variance to better ft and prevent i�1 training divergence. Te standardized formula is as follows: where c (t) is ith decompose of jth IMF. X − X When the EEMD decomposition is completed, the (1) X′ � , original sequence data can be expressed as n IMFs and a s(X) residual. where X′ is the normalized data, X is the original data, X is the mean of the original data, and s(X) is the standard x(t) � 􏽘 c (t) + r (t), (6) j n deviation of the original data. j�1 In the prediction stage, the mean and variance param- eters are denormalized for the predicted data. where c (t), (t � 1, 2, . . . , T) is the jth IMF component decomposed at time t, r (t) is the fnal residual, and n is the 􏽢 􏽢 X � s(X)X′ + X, (2) number of IMFs. 􏽢 􏽢 Te main problem of the EEMD is the high computing where X′ is the predicted value after normalization and X is time and the residue of added noise present in the IMFs. In the fnal predicted value. the EEMD, it can be recognized that every x (t) is decomposed independently from the other realizations, and 3.2. Data Decomposition Method. Te EMD [54] is an the reconstructed signal contains residual noise and diferent adaptive method for the analysis of nonstationary and realizations of signal plus noise that may produce a diferent nonlinear signals. EMD can decompose the original signal number of modes. To overcome this limitation, the into the sum of amplitude and frequency modulation CEEMDAN algorithm was frst proposed by Torres et al. in functions, called “Intrinsic Mode Function” (IMF), and the 2011 [45]. Te main idea of the CEEMDAN is to add white fnal monotonic trend. However, EMD has the problem of noise at each phase of decomposition and calculate a unique “mode mixing,” which is very similar oscillations in diferent residue to obtain each mode. Journal of Advanced Transportation 5 Link vehicle count data Normalization Test data Train data ICEEMDAN Bayesian optimization decomposition Model initialization IMF1 IMF2 IMFn R Gaussian Process Regressor GRU GRU GRU GRU Run the model to calculate the RMSE of F1 F2 Fn Fn+1 validation set Fulfill Mode requirements? reconstruction Denormalization Output (x, y) Forecast results Evaluation results Figure 1: Framework of the link vehicle dynamic count forecasting model. Te resulting dissociation is complemented by a nu- where ⟨·⟩ is the action of averaging throughout the merically negligible error. However, there are still problems realizations. with some residual noise and “spurious” modes. Te ICE- Step 3: Compute the frst mode at the frst stage (k � 1) EMDAN technique is developed to improve the problems as d � x − R . 1 1 with some residual noise and “spurious” modes by Colo- Step 4: Estimate the second residue as the average of minas et al. [46]. local means of the realizations R + β E (w ) and de- 1 2 Given a composite signal x(t), where t is the sampling fne the second mode as follows: sequence of the signal, and let E (·) be the kth IMF obtained by EMD, and defne M(·) as the operator to calculate the d � R − R � R −⟨M􏼐R + β E 􏼐w 􏼑􏼑⟩. (9) 2 1 2 1 1 1 2 local mean of the signal, then, the ICEEMDAN algorithm is described as follows: Step 5: For k � 3, . . ., K; calculate the kth residue: Step 1: Calculate the local means of I realizations using R �⟨M􏼐R + β E 􏼐w 􏼑􏼑⟩, k k−1 k−1 k the EMD algorithm: (10) β � ε std r , k ≥ 1. i i k 0 k x � x + β E 􏼐w 􏼑, i � 1, . . . , I, (7) 0 1 Step 6: Compute the kth mode: where β � ε std(x)/E (w ) and ε is the reciprocal of 0 0 1 0 the desired signal-to-noise ratio between the frst added d � R − R . (11) k k−1 k noise and the analyzed signal. Step 7: Go back to step 4 for the next k. Step 2: Calculate the frst residue R1: Compared with EEMD and CEEMD, the ICEEMDAN R �⟨M􏼐x 􏼑⟩, (8) can not only reduce the noise in the mode but also decrease 6 Journal of Advanced Transportation the residual spurious pattern problems caused by signal Reset Gate overlap, providing an accurate reconstruction of the original signal. h h t-1 t × + 3.3. GRU Model. As a special RNN structure, LSTM solves 1- the problems of vanishing gradient and explosive gradient by changing the cell structure and adding storage cells to r u y t t t determine whether it is necessary to remember information. σ σ tanh GRU [38] improved LSTM by reducing the number of gates to decrease the training time. As shown in Figure 2, the GRU units transfer the input vector x to the output vector h t t Update Gate through time t iteration. GRUs consist of two gates: the reset gate and the update gate. Te main process in a GRU unit Figure 2: Te structure of GRU. can be described as follows: r � σ W x + W h + b 􏼁, t xr t hr t−1 r problem of the unknown objective function. Te model used u � σ W x + W h + b 􏼁, to approximate the objective function is called the surrogate t xu t hu t−1 u (12) model. Te surrogate model commonly used in Bayesian y � tanh􏼐W x + W r ⊙ h 􏼁 + b 􏼑, t xh t hy t t−1 y optimization is the Gaussian process to obtain its posterior distribution. h � 1 − u􏼁 ⊙ h + u ⊙ y , t t t−1 t t Bayesian optimization also applies an acquisition where u and r represent the update and reset gates of the function to direct sampling to an area that may improve the t t GRU, respectively, y means the candidate activation, h current best observation for searching for the next suitable t t represents the current activation, and h is the previous sampling point. Te carefully designed acquisition functions t−1 activation. W , W , W , W , W , W is the corre- balance the exploration of the search space and existing xr hr xu hu xh hy sponding weight parameter matrices; b , b , b is the cor- felds [48]. Te types of acquisition functions include PI r u y responding bias vector; and σ and tanh are the activation (Probability of Improvement) and EI (Expected Improve- functions. For more GRU information, please see the work ment) [50, 60]. of Chung et al. [59]. Te steps of Bayesian optimization of hyperparameter tuning include: 3.4. Bayesian Optimization Hyperparameters. (1) Model initialization. Prepare variables, such as the Hyperparameters are the parameters of the training algo- initial learning rate, the number of hidden units, and rithm itself, not directly learned from the training process. the L2 regularization coefcient. Each model has diferent hyperparameters, and a good (2) Use the Gaussian process to optimize the objective choice of hyperparameters can get the best performance. For function. example, there are four key hyperparameters in the GRU (3) Perform Bayesian optimization and calculate the model, such as the number of hidden units, the learning rate, RMSE of the test set. the number of GRU layers, and the number of hidden units. (4) Check the optimization results. If the results meet the However, manual tuning is inefcient and often afected by requirements, Bayesian optimization will output the human subjective factors. hyperparameters. Otherwise, the Bayesian optimi- Te basic idea of Bayesian optimization is to use Bayes’ zation will be restarted or modifed for the opti- theorem to estimate the posterior distribution of the ob- mization options to continue. jective function based on the data and then select the hyperparameter combination of the following samples according to the distribution. Te Bayesian optimization 4. Experiments and Results algorithm makes full use of the information of the previous sampling points. Te algorithm optimizes by learning the 4.1. Data Description. Te license plate recognition (LPR) shape of the objective function. It will fnd the hyper- data collected in Jinji Road and Airport City Avenue in parameters that maximize the result to the global optimal. Hangzhou, China, are used to verify the proposed model. Bayesian optimization is generally used to minimize the Te selected section of Jinji Road is about 280 m long with objective function f(x) as follows: three northbound lanes without entrance and exit in the middle. Te selected section of Airport Avenue is about x � arg min f(x), (13) 560 m long. Figure 3 shows the location of the license plate x∈χ recognition detector on the Jinji Road. Te data were col- where x is a decision variable, χ is the decision space. In lected from December 1, 2018 to December 24, 2018. Te general, the objective function f(x) is unknown, so it cannot original LPR record data contains the key information, such use gradient descent to solve f(x). Bayesian optimization as license plate of the car, the timestamp of the car passing utilizes a surrogate model to deal with the optimization the detected line, lane number, location information, and Journal of Advanced Transportation 7 Figure 3: Detectors’ location in Jinji road. Data #1 0 1000 2000 3000 4000 5000 6000 7000 Time (5 min) Data #2 0 1000 2000 3000 4000 5000 6000 7000 Time (5 min) Figure 4: Link dynamic vehicle count data. other information, such as car type, car color, and car length. improves the quality of input data by decomposing the Ten, the raw LPR data are employed to extract the cor- original data into more regular pattern components. Te responding LDVC data with the 5 min time window by the calculation of the decomposition quantity m is determined by m � fix(log 2(N)) − 1 [56], where N is the length of the cumulative curve model of upstream and downstream ve- hicles [10, 11, 61]. Tus, the model provides 288 LDVC data input data. We used ICEEMDAN to decompose the original points each day. We divide the data into training set and test Data #1 into 11 IMF components and a residual, as illus- set with the ratio of 90 per cent training set and 10 per cent trated in Figure 5. Te periods of these components range test set. Te obtained LDVC data of Jinji Road (Data 1) and from short to long periods and have diferent amplitudes. Airport Avenue (Data 2) are shown in Figure 4. Te Data 1 Te frst subgraph represents the original LDVC data with on the Jinji Road refects the periodic characteristics of the noise and the residual shows the trend term of the LDVC data in days and weeks. Data 1 has more obvious morning data. Te ICEEMDAN algorithm overcomes the mode peak characteristics, while Data 2 has a slightly higher mixing problem of EMD. Taking IMF7 as an example, after evening peak trafc demand. Te proposed model is applied decomposing by ICEEMDAN, the periodic characteristics of to the one-step ahead LDVC forecasting problem. Te length Data #1 in days and weeks are more obvious and more regular. of historical time window is set as 288. Te prediction horizon is set as 1, that is to say, we use one day historical data to predict 5-min LDVC. 4.3. Benchmarks and Measures of Efectiveness. Te real- world link dynamic vehicle data were divided into training 4.2. IMF Components Extraction. Te quality of input data data set and test data set. We selected 12 benchmark models, will afect the prediction performance, and ICEEMDAN including the SVR (support vector regression) [25], RF Link vehicle number Link vehicle number 8 Journal of Advanced Transportation 0 1000 2000 3000 4000 5000 6000 7000 -5 0 1000 2000 3000 4000 5000 6000 7000 -2 0 1000 2000 3000 4000 5000 6000 7000 -2 0 1000 2000 3000 4000 5000 6000 7000 -5 0 1000 2000 3000 4000 5000 6000 7000 -5 0 1000 2000 3000 4000 5000 6000 7000 -5 0 1000 2000 3000 4000 5000 6000 7000 -5 0 1000 2000 3000 4000 5000 6000 7000 -2 0 1000 2000 3000 4000 5000 6000 7000 -1 0 1000 2000 3000 4000 5000 6000 7000 0.5 -0.5 0 1000 2000 3000 4000 5000 6000 7000 0.2 -0.2 0 1000 2000 3000 4000 5000 6000 7000 4.15 4.1 0 1000 2000 3000 4000 5000 6000 7000 Time (5 min) Figure 5: IMF components extraction by ICEEMDAN. (random forest) [62], XGBoost (extreme gradient boosting ensemble empirical mode decomposition) [44], ICE- tree regression) [63], GRU-BO, LSTM-BO, EMD-GRU, EMDAN-GRU, and ICEEMDAN-LSTM-BO for compari- son. ICEEMDAN-GRU-BO is the proposed prediction VMD-GRU (variational mode decomposition) [64], MVMD-GRU (multivariate variational mode decomposi- method, which considers the data decomposition and tion) [65], EEMD-GRU, CEEMD-GRU (a complete Bayesian optimization parameters. We choose the baseline Residual IMF11 IMF10 IMF9 IMF8 IMF7 IMF6 IMF5 IMF4 IMF3 IMF2 IMF1 Orignal Journal of Advanced Transportation 9 Table 1: Predictive performance comparison with diferent model for the following reasons: the frst reason is to number of hidden units. compare with commonly used machine learning and deep learning models. Te second reason is to obtain the pre- HS MAE RMSE R diction performance using Bayesian optimization alone. Te Data #1 third reason compares the performance of diferent data 40 0.598 0.968 0.897 decomposition methods. LR � 0.005 80 0.588 0.959 0.899 Tis paper applied 3 evaluation indicators to evaluate the NL � 1 120 0.601 0.975 0.895 model, namely, mean absolute error (MAE), root mean L2 � 0.001 160 0.601 0.969 0.897 square error (RMSE), and coefcient of determination (R ). 200 0.598 0.969 0.897 Te calculation formulas are as follows: Data #2 􏼌 􏼌 40 1.189 1.585 0.862 1 􏼌 􏼌 (i) (i) 􏼌 􏼌 􏼌 􏼌 MAE � 􏽘 x − x 􏽢 , 􏼌 􏼌 LR � 0.005 80 1.171 1.582 0.863 i�1 NL � 1 120 1.171 1.559 0.873 􏽶�������������� � L2 � 0.001 160 1.183 1.587 0.860 n 200 1.185 1.592 0.858 (i) (i) RMSE � x − x 􏽢 , 􏽘 􏼐 􏼑 Te bold values means the best results in the data set. (14) i�1 Table 2: Predictive performance comparison with diferent initial learning rates. n (i) (i) 􏽐 􏼐x − x 􏼑 i�1 R � 1 − , LR MAE RMSE R n (i) 􏽐 􏼐x − x􏼑 i�1 Data #1 0.001 0.609 0.972 0.896 (i) where x is the observed value of link dynamic vehicle 0.003 0.601 0.969 0.897 (i) count, x 􏽢 is the predicted value of vehicle count, and x is the NL � 1 0.005 0.599 0.977 0.895 mean value of vehicle count. HS � 80 0.007 0.586 0.957 0.899 A well-performing deep learning model is inseparable L2 � 0.001 0.010 0.583 0.959 0.899 from an efective hyperparameter tuning process. In the 0.015 0.634 1.011 0.887 following section, we frst analyze the impact of model 0.1 NaN NaN NaN hyperparameters on the prediction performance. Ten, we Data #2 compare the prediction performance of the proposed hybrid 0.001 1.158 1.524 0.871 depth model and the benchmark method. 0.003 1.156 1.516 0.875 NL � 1 0.005 1.203 1.565 0.852 HS � 80 0.007 1.151 1.525 0.871 4.4. Hyperparameter Tuning with Bayesian Optimization. L2 � 0.001 0.010 1.169 1.535 0.866 0.015 1.254 1.676 0.798 Tis section verifes the superiority of the BO in tuning 0.1 NaN NaN NaN hyperparameters by the efect analysis of hyperparameters on the prediction performance. Te hyperparameters of the Te bold values means the best results in the data set. GRU model mainly include the number of hidden units (HS), the initial learning rate (LR), the regularization co- shown in Table 2, when the initial learning rate is set to 0.1, efcient (L2), and the number of GRU layers (NL) [51, 53]. the model cannot converge. In addition, the optimal initial When we compare the prediction performance of one LR for Data #1 is 0.007, and for Data #2 is 0.003, showing hyperparameter, we fx the other hyperparameters. that it is hard to manually tune an appropriate initial LR for Te number of hidden units has a strong infuence on the diferent data. Terefore, it is essential to determine a model. If the number of hidden units is too small, the net will suitable initial learning rate automatically. not learn well, while too many hidden units will afect the Te Bayesian optimization model based on the Gaussian efciency and increase the risk of overftting. Te efects of process can efectively search the candidate hyperparameter hidden layer number on the GRU model are shown in interval and determine the appropriate HS and initial LR, as Table 1. As the HS approaches the optimal value 80, the shown in Figure 6(a). Moreover, there exist interactions prediction accuracy of the model gradually increases. When between diferent hyperparameters. Te superiority of the HS continues to increase, the RMSE and R gradually Bayesian optimization is optimizing multiple hyper- parameters at the same time, as illustrated in Figure 6(b). Te decrease, refecting the possibility of overftting. Te same rule also fts in the Data #2. Furthermore, we could conclude Bayesian optimization can simultaneously optimize the that the optimal number of hidden units for diferent data is initial LR and HS and select the appropriate combination. diferent. In terms of the L2 regularization coefcient, it helps to Ten, we tested the efects of diferent initial learning improve the overftting issues and the generalization level of rates on the GRU model. Te initial learning rates have the model. However, a too large regularization coefcient signifcant efects on the model: a too high learning rate may may lead to the underftting of the model. We manually cause the model to fail to converge; a too small learning rate adjust the L2 regularization coefcient to examine the will cause the model to converge slowly or fail to learn. As prediction performance of the model, as shown in Table 3. Inital Learning Rate 10 Journal of Advanced Transportation 1.35 0.7 1.3 0.65 1.25 0.6 0.55 1.2 0.5 1.15 0.45 1.1 0.4 1.05 0.35 -1 200 -2 -3 0.95 0.9 -3 -2 -1 10 10 10 Observed points Next point Inital Learning Rate Model mean Model minimum feasible Observed points Noise error bars Model mean Next point Model error bars Model minimum feasible (a) (b) Figure 6: Adjusting model hyperparameters with Bayesian optimization. (a) Bayesian optimization adjusts the initial learning rate. (b) Bayesian optimization adjusts the initial learning rate and hidden unit size. Table 3: Infuence of diferent L2 regularization coefcients on the Te results in Table 3 show that diferent regularization model. coefcients impact the model prediction accuracy. Data #1 corresponds to the optimal regularization coefcient equal L2 MAE RMSE R to 0.007 and Data #2 corresponds to the optimal L2 reg- Data #1 ularization coefcient equal to 0.005, showing that the 0.001 0.614 0.989 0.892 model of diferent data suits its own optimal L2 regulari- LR � 0.005 0.003 0.593 0.962 0.898 NL � 1 zation coefcients. 0.005 0.599 0.980 0.894 HS � 80 In Table 4, we look into the impact of multilayer GRU on 0.007 0.582 0.955 0.900 the prediction performance of the model. In a multilayer Data #2 GRU, the number of hidden units in each layer is equally 0.001 1.180 1.599 0.840 distributed. Second, a dropout layer with a dropout prob- LR � 0.00 0.003 1.174 1.585 0.846 NL � 1 ability equal to 0.2 is added after each GRU layer to avoid 0.005 1.173 1.580 0.848 HS � 80 overftting. 0.007 1.183 1.606 0.837 Te results also show RMSE increases as the number of GRU layers increases. Te optimal number of GRU layers corresponding to Data #1 and Data #2 is one layer. It shows that as the number of layers increases, the model may Table 4: Infuence of diferent GRU layers on the model. overft. NL MAE RMSE R In Table 5, we use the Bayesian optimization algorithm to Data #1 optimize the four hyperparameters of the model. Compared with the manually tuning method, Bayesian optimization 1 0.653 0.931 0.905 LR � 0.005 could obtain the best-ftted hyperparameters when all 2 0.657 0.941 0.902 HS � 80/NL 3 0.675 0.946 0.901 evaluation indicator results are optimal. L2 � 0.001 4 0.677 0.950 0.901 Tis section shows the hyperparameters have a signif- Data #2 cant impact on the performance of the GRU model. Te setting of various hyperparameters also presents the phe- 1 1.122 1.603 0.833 LR � 0.005 2 1.129 1.600 0.830 nomenon of trade-ofs. Manual tuning is not only time- HS � 80/NL 3 1.125 1.601 0.831 consuming and labor-intensive but also arduous to obtain L2 � 0.001 4 1.133 1.601 0.831 better results. Bayesian optimization could achieve the best Te bold values means the best results in the data set. performance in tuning hyperparameters. Number of Hidden Units RMSE RMSE Journal of Advanced Transportation 11 Table 5: Efects of Bayesian optimization super parameters on the model. MAE RMSE R Data #1 LR � 6.281e − 04; S � 173 0.580 0.835 0.923 L2 � 1.548e − 05; NL � 1 Data #2 LR � 0.0054; HS � 184 1.089 1.297 0.882 L2 � 0.046; NL � 1 Table 6: Performance comparison of diferent prediction models. Data #1 Data #2 Training time Prediction time Training time Prediction time 2 2 MAE RMSE R MAE RMSE R (min) (s) (min) (s) SVR 0.724 1.048 0.879 3.4 0.02 1.182 1.683 0.842 2.7 0.02 RF 0.683 1.028 0.884 0.5 0.06 1.158 1.661 0.846 0.2 0.02 XGBoost 0.684 1.030 0.883 0.2 0.02 1.158 1.660 0.846 0.1 0.01 GRU-BO 0.586 0.835 0.923 12.3 3.94 1.089 1.297 0.882 11.3 3.47 LSTM-BO 0.528 0.749 0.938 19.2 4.67 1.022 1.390 0.892 17.1 4.32 EMD-GRU 0.537 0.705 0.945 11.4 5.17 0.524 0.712 0.906 9.0 5.80 VMD-GRU 0.383 0.536 0.968 4.8 3.26 0.407 0.550 0.944 4.8 2.85 MVMD-GRU 0.601 0.833 0.924 4.0 2.74 0.531 0.721 0.904 4.4 3.19 EEMD-GRU 0.335 0.457 0.977 9.6 5.48 0.374 0.505 0.953 8.9 5.69 CEEMD-GRU 0.322 0.453 0.977 9.7 5.68 0.363 0.510 0.952 9.1 5.61 ICEEMDAN-GRU 0.257 0.352 0.986 10.4 6.74 0.289 0.420 0.967 10.8 5.86 ICEEMDAN- 0.217 0.307 0.990 221.4 7.47 0.223 0.311 0.982 201.2 6.14 LSTM-BO ICEEMDAN-GRU- 0.203 0.298 0.990 146.0 6.13 0.233 0.324 0.981 136.6 6.43 BO 4.5. Benchmarks and Model Comparisons. Tis part studies Table 6 shows the performance comparison of diferent the efect of data decomposition and Bayesian optimization models and Figure 7 shows the prediction error box plot. Te on model performance. We compared the performance of 12 results of Data #1 show that the performance of the ICE- benchmark models with the proposed ICEEMDAN-GRU- EMDAN-GRU-BO model is the best. Te evaluation indi- BO model. We used a computer with a 2.9 Ghz, dual-core cators of the proposed model are similar to the ICEEMDAN- processor and 8G memory in our experiments. LSTM-BO model in Data #2. However, the training time of Te parameters setting of the models are set as follows. the proposed model is drastically reduced. XGBoost and RF show similar predictive performance, Te parameters of SVR, RF, and XGBoost are optimized by the grid search method. According to the results of grid and both perform better than SVR. By comparing with search, the kernel coefcient, regularization parameter, and XGBoost and RF models, the GRU-BO and the LSTM-BO epsilon are three essential parameters in the SVR model, models improves prediction accuracy. By comparing with which are set as 288.83, 29.31, and 0.0017 in Data #1 and GRU-BO, EMD-GRU achieves better prediction accuracy 60.2, 775.97, and 0.0061 in Data #2, respectively. Te number due to the efect of EMD, which shows that data decom- of trees and max depth are two essential parameters in the position obtains a better outcome. RF model, which are set as 138 and 15 in Data #1 and 22, 9 in Te efect of diferent data decomposition on model Data #2, respectively. Te number of estimators, learning performance shows that, compared with EMD, VMD, rate, and max depth are three key parameters in XGBoost, MVDM, EEMD, and CEEMD, the ICEEMDAN data de- which are set as 117 and 0.3, 6 in Data #1 and 102 and 0.4, 5 composition has the most signifcant improvement in model prediction. in Data #2, respectively. Te hyperparameters of the GRU model such as the number of hidden units, the initial Although Bayesian optimization increases the training learning rate, the regularization coefcient, and the number time of ICEEMDAN-GRU-BO, the training time is within of GRU layers are tuning with Bayesian optimization. an acceptable range for the improvement of prediction According to the results of section 4.4, we set the initial performance. learning rate to 0.005, the number of hidden units to 80, the Bayesian optimization efectively and simultaneously L2 regularization coefcient to 0.001, and the number of optimizes the four hyperparameters of the model; GRU layers to one. avoiding manual tuning that only relies on the empirical 12 Journal of Advanced Transportation -5 -5 Figure 7: Box plots of prediction errors of diferent models. methods. In addition, with the development of com- and BO all contribute to the improvement of prediction puting technology, such as the technology represented by accuracy and efciency. Combining ICEEMDAN-GRU- cloud computing, higher-performance computing re- BO can obtain the best performance and least calculation sources will become cheaper and more convenient, and complexity. the training time of the model will not be a limiting Te prediction model is based on the ICEEMDAN-GRU- factor. BO for link dynamic vehicle count. It is necessary to establish diferent prediction models for diferent road section sce- narios. In the future, we will further explore whether we can 5. Conclusions and Discussion make full use of the training information of the previous In this paper, we propose a hybrid deep learning model that models and train the deep learning prediction model by combines the data decomposition method, ICEEMDAN, transfer learning to improve the modeling efciency and and GRU model with Bayesian optimization for link dy- prediction accuracy. Besides, LDVC is a crucial parameter in namic vehicle count forecasting. ICEEMDAN is used to developing efcient adaptive trafc signal controllers as process and decompose the original data into specifc IMFs. trafc-responsive control systems require reliable real-time Considering the nonlinear characteristics of IMFs of link demand information on prevailing trafc conditions to make dynamic vehicle data, GRU is used as the basic forecast sensible control decisions. Te proposed model can be used to model of each IMF. BO is adapted to auto-tune hyper- predict the basic inputs for real-time control applications in parameters of the deep learning approach. Te presented urban areas and further extended to network-wide vehicle model is compared with 12 benchmark models including counts forecasting for network-wide trafc signal control SVR, RF, XGBoost, GRU-BO, LSTM-BO, EMD-GRU, optimization in urban trafc management. VMD-GRU, MVMD-GRU, EEMD-GRU, CEEMD-GRU, ICEEMDAN-GRU, and ICEEMD-LSTM-BO. Models are Data Availability validated with the real-world data collected in Hangzhou, China. Tree evaluation indicators (MAE, RMSE, and R ), Te link vehicle count data used to support the fndings of training time, and prediction time are used to measure the this study are available from the corresponding author upon performance. Results indicated that ICEEMDAN, GRU, request. Error Error SVR SVR RF RF XGBoost XGBoost GRU-BO GRU-BO LSTM-BO LSTM-BO EMD-GRU EMD-GRU VMD-GRU VMD-GRU MVMD-GRU MVMD-GRU EEMD-GRU EEMD-GRU CEEMD-GRU CEEMD-GRU ICEEMDAN-GRU ICEEMDAN-GRU ICEEMDAN-LSTM-BO ICEEMDAN-LSTM-BO ICEEMDAN-GRU-BO ICEEMDAN-GRU-BO Journal of Advanced Transportation 13 [12] M. Van Der Voort, M. Dougherty, and S. Watson, “Com- Conflicts of Interest bining kohonen maps with arima time series models to Te authors declare that there are no conficts of interest forecast trafc fow,” Transportation Research Part C: regarding the publication of this article. Emerging Technologies, vol. 4, no. 5, pp. 307–318, 1996. [13] B. M. Williams, P. K. Durvasula, and D. E. Brown, “Urban freeway trafc fow prediction: application of seasonal Acknowledgments autoregressive integrated moving average and exponential smoothing models,” Transportation Research Record, Tis research study was fnancially supported by National vol. 1644, no. 1, pp. 132–141, 1998. Natural Science Foundation of China under Grant nos. [14] K. Kumar, M. Parida, and V. K. Katiyar, “Short term trafc 52131202, 71901193, and 52072340; National Key Research fow prediction in heterogeneous condition using artifcial and Development Program of China under Grant no. neural network,” Transport, vol. 30, no. 4, pp. 397–405, 2013. 2019YFB1600303; and China Postdoctoral Science Foun- [15] H. Zheng, F. Lin, X. Feng, and Y. Chen, “A hybrid deep dation under Grant no. 2020M671724. learning model with attention-based conv-LSTM networks for short-term trafc fow prediction,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 11, pp. 6910– References 6920, 2021. [16] J. Wang and Q. Shi, “Short-term trafc speed forecasting [1] C. Diakaki, M. Papageorgiou, and K. Aboudolas, “A multi- hybrid model based on Chaos–Wavelet Analysis-Support variable regulator approach to trafc-responsive network- Vector Machine theory,” Transportation Research Part C: wide signal control,” Control Engineering Practice, vol. 10, no. 2, pp. 183–195, 2002. Emerging Technologies, vol. 27, pp. 219–232, 2013. [17] L. Zhou, S. Zhang, J. Yu, and X. Chen, “Spatial–temporal deep [2] K. M. A. E. Aboudolas, M. Papageorgiou, A. Kouvelas, and E. Kosmatopoulos, “A rolling-horizon quadratic-program- tensor neural networks for large-scale urban network speed ming approach to the signal control problem in large-scale prediction,” IEEE Transactions on Intelligent Transportation congested urban road networks,” Transportation Research Systems, vol. 21, no. 9, pp. 3718–3729, 2020. Part C: Emerging Technologies, vol. 18, no. 5, pp. 680–694, [18] Y. Chen, C. Tao, Q. Bai, F. Liu, X. Qi, and R. Zhuo, “Short- term speed prediction for expressway considering adaptive [3] S. Chen and D. J. Sun, “An improved adaptive signal control selection of spatiotemporal dimensions and similar trafc method for isolated signalized intersection based on dynamic features,” Journal of Transportation Engineering, Part A: programming,” IEEE Intelligent Transportation Systems Systems, vol. 146, no. 10, Article ID 04020114, 2020. Magazine, vol. 8, no. 4, pp. 4–14, 2016. [19] J. W. C. Van Lint, S. P. Hooqendoorn, and H. J. Van Zuvlen, [4] G. Vigos, M. Papageorgiou, and Y. Wang, “Real-time esti- “Freeway travel time prediction with state-space neural net- mation of vehicle-count within signalized links,” Trans- works modeling state-space dynamics with recurrent neural portation Research Part C: Emerging Technologies, vol. 16, networks,” Transportation Research Record, vol. 1811, no. 1, pp. 18–35, 2008. pp. 30–39, 2002. [5] K. Kwong, R. Kavaler, R. Rajagopal, and P. Varaiya, “Real- [20] C.-H. Wu, J. M. Ho, and D. T. Lee, “Travel-time prediction time measurement of link vehicle count and travel time in a with support vector regression,” IEEE Transactions on In- road network,” IEEE Transactions on Intelligent Trans- telligent Transportation Systems, vol. 5, no. 4, pp. 276–281, portation Systems, vol. 11, no. 4, pp. 814–825, 2010. [6] G. Vigos and M. Papageorgiou, “A simplifed estimation [21] J. van Lint, S. Hoogendoorn, and H. van Zuylen, “Accurate scheme for the number of vehicles in signalized links,” IEEE freeway travel time prediction with state-space neural net- Transactions on Intelligent Transportation Systems, vol. 11, works under missing data,” Transportation Research Part C: no. 2, pp. 312–321, 2010. Emerging Technologies, vol. 13, no. 5-6, pp. 347–369, 2005. [7] M. Papageorgiou and G. Vigos, “Relating time-occupancy [22] S. V. Kumar and L. Vanajakshi, “Short-term trafc fow measurements to space-occupancy and link vehicle-count,” prediction using seasonal ARIMA model with limited input Transportation Research Part C: Emerging Technologies, data,” European Transport Research Review, vol. 7, no. 3, p. 21, vol. 16, no. 1, pp. 1–17, 2008. [8] M. Rostami Shahrbabaki, A. A. Safavi, M. Papageorgiou, and [23] G. A. Davis and N. L. Nihan, “Nonparametric regression and I. Papamichail, “A data fusion approach for real-time trafc short-term freeway trafc forecasting,” Journal of Trans- state estimation in urban signalized links,” Transportation portation Engineering, vol. 117, no. 2, pp. 178–188, 1991. Research Part C: Emerging Technologies, vol. 92, pp. 525–548, [24] B. L. Smith and M. J. Demetsky, “Short-term trafc fow prediction models-a comparison of neural network and [9] S. Lin, B. De Schutter, Y. Xi, and H. Hellendoorn, “Efcient nonparametric regression approaches,” in Proceedings of the network-wide model-based predictive control for urban IEEE international conference on systems, man and trafc networks,” Transportation Research Part C: Emerging cybernetics, San Antonio, TX, USA, October 1994. Technologies, vol. 24, pp. 122–140, 2012. [25] H. Drucker, C. J. C. Burges, L. Kaufman, S. Alex, and [10] X. Zhan, R. Li, and S. V. Ukkusuri, “Lane-based real-time V. Vapnik, “Support vector regression machines,” Advances queue length estimation using license plate recognition data,” in Neural Information Processing Systems, vol. 9, pp. 155–161, Transportation Research Part C: Emerging Technologies, vol. 57, pp. 85–102, 2015. [26] P. Cai, Y. Wang, G. Lu, P. Chen, C. Ding, and J. Sun, “A [11] C. He, D. Wang, M. Chen, G. Qian, and Z. Cai, “Link dynamic vehicle count estimation based on travel time distribution spatiotemporal correlative k-nearest neighbor model for short-term trafc multistep forecasting,” Transportation Re- using license plate recognition data,” Transportmetrica: Transportation Science, pp. 1–22, 2021. search Part C: Emerging Technologies, vol. 62, pp. 21–34, 2016. 14 Journal of Advanced Transportation [27] X. Feng, X. Ling, H. Zheng, Z. Chen, and Y. Xu, “Adaptive [42] H. F. Yang and Y. P. P. Chen, “Hybrid deep learning and multi-kernel SVM with spatial–temporal correlation for empirical mode decomposition model for time series appli- short-term trafc fow prediction,” IEEE Transactions on cations,” Expert Systems with Applications, vol. 120, Intelligent Transportation Systems, vol. 20, no. 6, pp. 2001– pp. 128–138, 2019. 2013, 2019. [43] X. Jiang, L. Zhang, and X. Michael Chen, “Short-term fore- [28] Y. Hua, Z. Zhao, R. Li, X. Chen, Z. Liu, and H. Zhang, “Deep casting of high-speed rail demand: a hybrid approach com- learning with long short-term memory for time series pre- bining ensemble empirical mode decomposition and gray diction,” IEEE Communications Magazine, vol. 57, no. 6, support vector machine with real-world applications in pp. 114–119, 2019. China,” Transportation Research Part C: Emerging Technol- [29] Y. S. Jeong, Y. J. Byon, M. M. Castro-Neto, and S. M. Easa, ogies, vol. 44, pp. 110–127, 2014. “Supervised weighting-online learning algorithm for short- [44] J.-R. Yeh, J. S. Shieh, and N. E. Huang, “Complementary term trafc fow prediction,” IEEE Transactions on Intelligent ensemble empirical mode decomposition: a novel noise en- Transportation Systems, vol. 14, no. 4, pp. 1700–1707, 2013. hanced data analysis method,” Advances in Adaptive Data [30] W.-C. Hong, Y. Dong, F. Zheng, and S. Y. Wei, “Hybrid Analysis, vol. 02, no. 02, pp. 135–156, 2010. evolutionary algorithms in a SVR trafc fow forecasting [45] M. E. Torres, M. A. Colominas, G. Schlotthauer, and model,” Applied Mathematics and Computation, vol. 217, P. Flandrin, “A complete ensemble empirical mode decom- no. 15, pp. 6733–6747, 2011. position with adaptive noise,” in Proceedings of the 2011 IEEE [31] G. Leshem and Y. Ritov, “Trafc fow prediction using ada- international conference on acoustics, speech and signal pro- boost algorithm with random forests as a weak learner,” cessing (ICASSP), Prague, Czech Republic, May 2011. International Journal of Mathematical and Computational [46] M. A. Colominas, G. Schlotthauer, and M. E. Torres, “Im- Sciences, vol. 1, no. 1, pp. 1–6, 2007. proved complete ensemble EMD: a suitable tool for bio- [32] H. Chen, S. Grant-Muller, L. Mussone, and F. Montgomery, medical signal processing,” Biomedical Signal Processing and “A study of hybrid neural network approaches and the efects Control, vol. 14, pp. 19–29, 2014. of missing data on trafc forecasting,” Neural Computing & [47] F. Hutter, J. Lucke, ¨ and L. Schmidt-Tieme, “Beyond manual Applications, vol. 10, no. 3, pp. 277–286, 2001. tuning of hyperparameters,” KI-Kunstliche ¨ Intelligenz, vol. 29, [33] Yu Wei and Mu-C. Chen, “Forecasting the short-term metro no. 4, pp. 329–337, 2015. passenger fow with empirical mode decomposition and [48] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and neural networks,” Transportation Research Part C: Emerging N. de Freitas, “Taking the human out of the loop: a review of Technologies, vol. 21, no. 1, pp. 148–162, 2012. bayesian optimization,” Proceedings of the IEEE, vol. 104, [34] J. J. Ruiz-Aguilar, I. J. Turias, and M. J. Jimenez-Come, ´ no. 1, pp. 148–175, 2016. “Hybrid approaches based on SARIMA and artifcial neural [49] J. Bergstra, R. Bardenet, Y. Bengio, and B. K´egl, “Algorithms networks for inspection time series forecasting,” Trans- for hyper-parameter optimization,” in Proceedings of the 25th portation Research Part E: Logistics and Transportation Re- annual conference on neural information processing systems view, vol. 67, pp. 1–13, 2014. (NIPS 2011), Granada, Spain, January 2011. [35] X. Ma, Z. Tao, Y. Wang, H. Yu, and Y. Wang, “Long short- [50] J. Snoek, H. Larochelle, and R. P. Adams, “Practical bayesian term memory neural network for trafc speed prediction optimization of machine learning algorithms,” Advances in using remote microwave sensor data,” Transportation Re- Neural Information Processing Systems, vol. 25, 2012. search Part C: Emerging Technologies, vol. 54, pp. 187–197, [51] H. Cheng, X. Ding, W. Zhou, and R. Ding, “A hybrid elec- 2015. tricity price forecasting model with Bayesian optimization for [36] Z. Zhao, W. H. Chen, X. M. Wu, P. C. Y. Chen, and J. M. Liu, German energy exchange,” International Journal of Electrical “LSTM network: a deep learning approach for short-term Power & Energy Systems, vol. 110, pp. 653–666, 2019. trafc forecast,” IET Intelligent Transport Systems, vol. 11, [52] F. He, J. Zhou, Z.-K. Feng, G. Liu, and Y. Yang, “A hybrid short-term load forecasting model based on variational mode no. 2, pp. 68–75, 2017. [37] D. Yang, K. Chen, M. Yang, and X. Zhao, “Urban rail transit decomposition and long short-term memory networks con- passenger fow forecast based on LSTM with enhanced long- sidering relevant factors with Bayesian optimization algo- term features,” IET Intelligent Transport Systems, vol. 13, rithm,” Applied Energy, vol. 237, pp. 103–116, 2019. [53] H. Yi and K. H. N. Bui, “An automated hyperparameter no. 10, pp. 1475–1482, 2019. [38] K. Cho, B. Van Merrienboer, ¨ C. Gulcehre et al., “Learning search-based deep learning model for highway trafc pre- phrase representations using RNN encoder-decoder for sta- diction,” IEEE Transactions on Intelligent Transportation tistical machine translation,” 2014, https://arxiv.org/abs/1406. Systems, vol. 22, no. 9, pp. 5486–5495, 2021. [54] N. E. Huang, Z. Shen, S. R. Long et al., “Te empirical mode [39] K. Zhang, L. Zheng, Z. Liu, and N. Jia, “A deep learning based decomposition and the Hilbert spectrum for nonlinear and multitask model for network-wide trafc speed prediction,” non-stationary time series analysis,” Proceedings of the Royal Neurocomputing, vol. 396, pp. 438–450, 2020. Society of London. Series A: Mathematical, Physical and En- [40] S. Zhang, L. Zhou, X. M. Chen, L. Zhang, L. Li, and M. Li, gineering Sciences, vol. 454, no. 1971, pp. 903–995, 1998. “Network-wide trafc speed forecasting: 3D convolutional [55] N. E. Huang, M. L. C. Wu, S. R. Long et al., “A confdence neural network with ensemble empirical mode decomposi- limit for the empirical mode decomposition and hilbert tion,” Computer-Aided Civil and Infrastructure Engineering, spectral analysis,” Proceedings of the Royal Society of London. vol. 35, no. 10, pp. 1132–1147, 2020. Series A: Mathematical, Physical and Engineering Sciences, [41] L. Li, X. Qu, J. Zhang, H. Li, and B. Ran, “Travel time pre- vol. 459, pp. 2317–2345, 2003. diction for highway network based on the ensemble empirical [56] Z. Wu and N. E. Huang, “Ensemble empirical mode de- mode decomposition and random vector functional link composition: a noise-assisted data analysis method,” Ad- network,” Applied Soft Computing, vol. 73, pp. 921–932, 2018. vances in Adaptive Data Analysis, vol. 1, no. 1, pp. 1–41, 2009. Journal of Advanced Transportation 15 [57] Y. X. Wu, Q.-B. Wu, and J.-Q. Zhu, “Improved EEMD-based crude oil price forecasting using LSTM networks,” Physica A: Statistical Mechanics and Its Applications, vol. 516, pp. 114– 124, 2019. [58] Y. Shrivastava and B. Singh, “A comparative study of EMD and EEMD approaches for identifying chatter frequency in CNC turning,” European Journal of Mechanics - A: Solids, vol. 73, pp. 381–393, 2019. [59] J. Chung, C. Gulcehre, K. H. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” 2014, https://arxiv.org/abs/1412.3555. [60] M. Pelikan, D. E. Goldberg, and E. Cantu-Paz, ´ “BOA: the Bayesian optimization algorithm,” in Proceedings of the ge- netic and evolutionary computation conference GECCO-99, Orlando, Florida, USA, July 1999. [61] A. Bhaskar, T. Tsubota, L. M. Kieu, and E. Chung, “Urban trafc state estimation: fusing point and zone based data,” Transportation Research Part C: Emerging Technologies, vol. 48, pp. 120–142, 2014. [62] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001. [63] T. Chen and C. Guestrin, “Xgboost: a scalable tree boosting system,” in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, San Francisco, CA, USA, August 2016. [64] K. Dragomiretskiy and D. Zosso, “Variational mode de- composition,” IEEE Transactions on Signal Processing, vol. 62, no. 3, pp. 531–544, 2014. [65] N. U. Rehman and H. Aftab, “Multivariate variational mode decomposition,” IEEE Transactions on Signal Processing, vol. 67, no. 23, pp. 6039–6052, 2019. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Journal of Advanced Transportation Hindawi Publishing Corporation

A Hybrid Deep Learning Model for Link Dynamic Vehicle Count Forecasting with Bayesian Optimization

Loading next page...
 
/lp/hindawi-publishing-corporation/a-hybrid-deep-learning-model-for-link-dynamic-vehicle-count-t4nCVV00D0

References

References for this paper are not available at this time. We will be adding them shortly, thank you for your patience.

Publisher
Hindawi Publishing Corporation
ISSN
0197-6729
eISSN
2042-3195
DOI
10.1155/2023/5070504
Publisher site
See Article on Publisher Site

Abstract

Hindawi Journal of Advanced Transportation Volume 2023, Article ID 5070504, 15 pages https://doi.org/10.1155/2023/5070504 Research Article A Hybrid Deep Learning Model for Link Dynamic Vehicle Count Forecasting with Bayesian Optimization 1,2 1 1,3 1 Chunguang He , Dianhai Wang , Yi Yu , and Zhengyi Cai College of Civil Engineering and Architecture, Zhejiang University, Hangzhou, China School of Transportation and Logistics Engineering, Xinjiang Agricultural University, Urumqi, China Shanghai AI Laboratory, Shanghai 200232, China Correspondence should be addressed to Zhengyi Cai; caizhengyi@zju.edu.cn Received 30 March 2022; Revised 7 September 2022; Accepted 24 November 2022; Published 7 February 2023 Academic Editor: Young-Jae Lee Copyright © 2023 Chunguang He et al. Tis is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Te link dynamic vehicle count is a spatial variable that measures the trafc state of road sections, which refects the actual trafc demand. Tis paper presents a hybrid deep learning method that combines the gated recurrent unit (GRU) neural network model with automatic hyperparameter tuning based on Bayesian optimization (BO) and the improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) model. Tere are four steps in this hybrid approach. First, the ICE- EMDAN is employed to decompose the link dynamic vehicle count time series data into several intrinsic components. Second, the components are predicted by the GRU model. At the same time, the Bayesian optimization method is utilized to automatically optimize the hyperparameters of the GRU model. Finally, the predicted subcomponents are reconstructed to obtain the fnal prediction results. Te proposed hybrid deep learning method is tested on two roads of Hangzhou, China. Results show that, compared with the 12 benchmark models, the proposed hybrid deep learning model achieves the best performance in link dynamic vehicle count forecasting. on a specifc road that describes the space occupancy rate of 1. Introduction roads [4–6] and can refect trafc demand more precisely With the development of the social economy and urbani- [7, 8]. Accurate and real-time prediction of the link dynamic zation, travellers’ trafc demands increased rapidly. Trafc vehicle count can also provide a reliable basis for the net- problems such as trafc congestion, environmental pollu- work-wide trafc signal control strategy and optimization tion, and economic losses have brought challenges to urban [1, 2, 9]. In recent years, thanks to continuous investments in transportation management. Intelligent transportation intelligent transportation systems, various sensors have been technology promises to deal with these problems, and ac- deployed and a large amount of real-time trafc data can be curate and efective forecasting of trafc demands is a key collected. Te link dynamic vehicle count data are efectively calculated under the new data environment with mature step. Traditionally, indicators such as trafc fow are used to represent trafc demands. While in seriously congested technology [10, 11]. However, the critical issues and chal- areas, those indicators can hardly refect actual trafc de- lenges unaddressed in forecasting link dynamic vehicle mands [1]. Trafc signal control systems, such as SCOOT count in the following aspects: (a) LDVC is often disturbed (Split Cycle Ofset Optimization Technique) and SCATS by stochastic factors. For instance, LDVC experiences a (Sydney Coordinated Adaptive Trafc System), failed to sudden increase and decrease when trafc fow becomes work properly due to the inaccurate trafc demand esti- congested or trafc incidents occur, such as accidents and mation in the congested period [2, 3]. temporary trafc control measures. (b) Although selecting Compared with the indicator of trafc fow, the link the “best” model among a set of baselines is signifcant, a dynamic vehicle count (LDVC) refers to the vehicle number better alternative is to consider the strength and robustness 2 Journal of Advanced Transportation consuming and laborious in hyperparameter tuning, the of the prediction results. By decomposing trafc data into subsequences, it can help capture both the common ten- hyperparameters in our model are automatically and ef- ciently tuned using BO. (4) Te test results show that the dency and some changes in trafc fow to improve the prediction accuracy. (c) Most machine learning-based proposed hybrid deep learning framework achieved the best methods can mine the nonlinear profle of link vehicle count, performance in the aspect of improvement in prediction but overftting issues may occur. It indicates that the model performance and reduction in training time compared with extracts noise in the training data as a feature of the data a variety of benchmark models. Considering the LDVC is itself, which in turn degrades performance in the test dataset. one of the best inputs for real-time control applications in Machine learning approaches are widely used in trafc urban areas, the proposed model could provide the accurate real-time LDVC forecast data for real-time trafc control. forecasting. Researchers obtained some progress in pre- diction algorithms, model fusion, and temporal and spatial Te rest of the paper is organized as follows: Section 2 reviews the relevant literature about trafc prediction model, characteristics of trafc data. However, there are still some crucial issues and challenges. (a) In trafc research feld, hybrid model, and hyperparameter optimization. Section 3 proposes the framework of the prediction model, including most forecasting models focus on trafc fow [12–15], trafc speed [16–18], or travel time [19–21]. Few forecasting the logical relationship between data decomposition, model models focus on the prediction of the LDVC, let alone training, parameter optimization, and prediction compo- considering the data characteristics of the LDVC in the nent reconstruction. Te fundamental methods are prediction model. (b) Te LDVC has nonlinear and sto- explained, including the sequence data decomposition chastic characteristics, such as the random change due to method ICEEMDAN, the basic model GRU, and the trafc congestion and weather factors. Simultaneously, due hyperparameter tuning algorithm BO. Section 4 validates the proposed model using actual link dynamic vehicle count to the signal light control, weekday commuting, the LDVC shows the characteristics of long short-term periodic data collected in Hangzhou, China. We further study the infuence of hyperparameter tuning and data decomposition changes in signal cycle, day, and week pattern. (c) Te excellent performance of deep learning methods in trafc on model performance. Besides, we compare the prediction accuracy and computing time of our model with 12 forecasting is inseparable from the model hyperparameter tuning efciently and appropriately. Te prediction model benchmark models. Section 5 concludes the paper. represented by deep learning requires many hyperparameter tuning, which is time consuming and laborious but hard to 2. Literature Review obtain suitable hyperparameters. In response to these challenges, this paper proposes a In the past few decades, there has been a lot of research on hybrid ICEEMDAN-GRU-BO forecasting model for the link trafc system forecasting. Tis paper focus on the trafc dynamic vehicle count. Te hybrid model fuses the ICE- prediction model, hybrid model, and hyperparameter op- EMDAN based on hyperparameter tuning with Bayesian timization. Te literature review of the three research lines is optimization (BO) and the improved complete ensemble summarized as follows. empirical mode decomposition with adaptive noise (ICE- Tis section frst reviews the literature on trafc infor- EMDAN). First, the ICEEMDAN is utilized to decompose mation prediction models, such as parametric models the link dynamic vehicle count data into subcomponents. [12, 13, 22] and nonparametric models [20, 23–28]. Tus, the decomposed components reduce the stochastic A parametric model mainly considers some unsteady characteristics and become more regular and suitable for time series data to establish prediction models with limited prediction. Second, the GRU models are applied to predict parameters. Autoregressive integrated moving average those components by considering the feature of long short- (ARIMA) or seasonal autoregressive integrated moving term periods. Simultaneously, the BO is used to automati- average (SARIMA) is commonly used in time series data cally optimize the hyperparameters of the GRU models to forecasting in transportation. For example, Williams et al. deal with the challenge problem of the hyperparameter developed a SARIMA model to identify seasonal patterns to tuning. Finally, the predicted subcomponents are recon- capture periodic changes in trafc states [13]. Van Der Voort structed to obtain the fnal prediction results. et al. used a self-organizing neural network graph as the Te contributions of this paper can be summarized as initial classifer associated with an individually ARIMA follows: (1) For the frst time, a hybrid ensemble decom- model to predict the half hour trafc fow on French position deep learning prediction framework is proposed to highways [12]. Kumar et al. selected a three-lane arterial predict the link dynamic vehicle count to improve the road with limited three days of trafc fow data in Chennai, prediction accuracy and reduce the prediction time. (2) India, to establish a SARIMA model for trafc fow fore- Aiming at tackling the challenges mentioned above, we casting [14]. present a novel approach by integrating the GRU with Te nonparametric models mainly include machine ICEEMDAN. Te data decomposition method ICEEMDAN learning and deep learning approaches. Machine learning is used to reveal the nonlinearity and stochastic character- has been widely used to predict trafc information. For istic of the link dynamic vehicle count. We propose a short- example, researchers used the k-nearest neighbor (KNN) term prediction method based on GRU model to efectively model to predict short-term trafc fow [23, 26, 29]. Support capture the long short-term period features. (3) To avoid vector regression (SVR) was utilized in trafc fow predic- overftting issues and address the considerable time tion [27, 29, 30] and travel time forecasting [20]. Random Journal of Advanced Transportation 3 and Chen combined the EMD and back propagation neural forest regression (RF) was applied for trafc fow prediction [31]. Early neural network modeling such as multilayer network (BPN) model to predict short-term passenger fow in the subway system [33]. Yang and Chen combined EMD perceptron (MLP) [24, 32], back propagation neural net- work (BPN) [33], and artifcial neutral networks (ANN) and stacked autoencoder (SAE) for passenger fow predic- were widely employed in the prediction of trafc systems. tion in urban rail transit [42]. Jiang et al. combined the For example, Kumar et al. operated ANN for the short-term EEMD and grey support vector machine model to develop a prediction of trafc volume [14]. Ruiz Aguilar et al. proposed hybrid short-term demand forecasting method for short- a hybrid prediction method based on the combination of the term high-speed rail passenger fow forecasting [43]. Te ARIMA and ANN models to predict the number of goods particle swarm optimization algorithm was used to optimize the grey support vector machine, and the results show that inspected at European border checkpoints [34]. However, the traditional artifcial neural network cannot the model performs well in terms of prediction accuracy. Zhang et al. proposed a hybrid deep learning prediction capture time-series data features because it does not consider time dependence. To overcome this shortcoming, re- model that combined 3D convolutional neural network (3D CNN) and EEMD to predict the network-wide speed of searchers have explored a large number of novel neural network models. Te deep learning models are the fastest Beijing, and the results showed that the EEMD method growing algorithms in recent years. In terms of sequence efectively improves the input data, and 3D CNN can data modeling, RNN (Recurrent Neural Network) is one of consider the temporal and spatial characteristics of the road the representatives. Van Lint et al. proposed a nonlinear network [40]. As advantages in data decomposition, other state space method using RNN to predict short-term improved algorithms were proposed based on EMD and highway travel time [19]. EEMD, such as complementary EEMD (CEEMD) [44], a complete EEMD with adaptive noise (CEEMDAN) [45], and RNN’s variant LSTM (Long-Short Term Memory) solves the shortcoming that RNN cannot store long-term memory of ICEEMDAN [46]. Te excellent performance of the neural network pre- information. LSTM has successful applications in trafc system prediction. For example, Ma et al. wielded remote microwave diction model is inseparable from the parameter optimi- zation. Te following literature reviews focus on the relevant sensor data to establish LSTM models to predict trafc speed [35]. Zhao et al. proposed an LSTM prediction model for short- parameter optimization methods. Suitable parameter setting term trafc fow prediction [36]. Yang et al. applied LSTM to shows an enormous impact on the performance of the neural predict urban rail transit passenger fow [37]. network prediction model [43]. For parameter optimization As an improved algorithm of LSTM, GRU [38] was frst in machine learning and deep learning models, manual proposed by Cho et al. in 2014. In most cases, the prediction tuning relies on experience and vulnerable to bias, and the performance of GRU is similar to LSTM, but the training tuning process is very time-consuming [47, 48]. Commonly used automatic parameter tuning algorithms [49] include time is reduced. Zhang et al. predict network-wide trafc speed with a deep learning model, and the results show that grid search, random search, and Bayesian optimization. Grid and random search have shortcomings, in which the new GRU obtains even better performance than LSTM [39]. Te hybrid model that combines data decomposition search may separate from the previous search information and machine learning or deep learning approach can ef- and cannot make full use of prior knowledge. Bayesian ciently improve the prediction performance. Te following optimization utilizes the prior distribution information of part reviews the literature on hybrid models that combine parameters [50]. It can auto efectively search for hyper- the data decomposition methods, machine learning, and parameters with fewer iteration steps. Bayesian optimization deep learning methods. Both machine learning and deep has become the most practical tool for parameter optimi- learning models require stable inputs, and data decompo- zation in predictive systems, which is successfully applied in sition methods can efectively improve the quality of model the deep learning model hyperparameter tuning recently, such as references [51–53]. input data and make the decomposed data more regular [40]. Choosing a reliable data sequence decomposition In response to these challenges, we propose a hybrid deep learning model for link dynamic vehicle count fore- method is critical for the stable and efective input required in the forecasting model. Traditional data decomposition casting. Te data decomposition method ICEEMDAN is methods, such as wavelet transform (WT), have been suc- adopted to decompose the irregular trafc demand data to cessfully applied in transportation. For example, Wang and more simple IMFs components. Te GRU model is used to Shi established a short-term trafc speed prediction model predict the IMFs’ components considering long- and short- based on chaotic wavelet transform and support vector term periodic characteristics of trafc demand, and Bayesian machine [16]. However, the traditional wavelet transforms optimization is utilized for automatically tuning multiple hyperparameters of the deep learning models. and Fourier transform techniques have disadvantages. For example, it is difcult to choose the mother wavelet, while empirical mode decomposition (EMD) and ensemble em- 3. Link Dynamic Vehicle Count pirical mode decomposition (EEMD) are more efective. Te Forecasting Model decomposition method can decompose the data into in- trinsic mode components. Many researchers have greatly Tis paper proposes a hybrid deep learning model that improved the accuracy of the prediction model based on the combines ICEEMDAN and GRU with Bayesian optimiza- decomposition of EMD and EEMD [41]. For example, Wei tion for link dynamic vehicle count forecasting, called 4 Journal of Advanced Transportation ICEEMDAN-GRU-BO. Figure 1 shows the framework and IMFs. Mode mixing reduces the EMD’s ability to recognize the main steps are as follows: diferent amplitudes in the actual data of the IMF compo- nents and afects the prediction accuracy of the hybrid model (1) Data processing, including data cleaning, normali- [43]. To overcome the problem of mode mixing, researchers zation, and completion. For example, we fll in the have proposed a new data decomposition method that adds missing data according to the average value of the Gaussian white noise, called “Ensemble Empirical Mode previous three steps. Decomposition” (EEMD) [55–58]. EEMD is a noise-assisted (2) Data decomposition. ICEEMDAN is adopted to data analysis method that aims to overcome the short- decompose the link dynamic vehicle count into comings of the EMD method. Te steps of EEMD are as several intrinsic mode functions (IMFs) and a re- follows: sidual. Tese mode components are simpler and Step (1): Before EMD decomposition, Gaussian white more regular, which can improve the accuracy of the noise is added to the original sequence data each time, deep learning model. and the construction sequence after addition is as (3) Subcomponents prediction and hyperparameters follows: optimization. GRU is used to predict subcompo- i i x (t) � x(t) + w (t), (3) nents of diferent frequencies as the basic prediction model. Tis framework employs a Bayesian opti- where x (t) is the construction sequence data, x(t) is mization algorithm to optimize the hyperparameters i 2 the original sequence data, and w (t) ∼ N(0, σ ) is the of each GRU model. Tese hyperparameters include added white noise sequence data. the initial learning rate, number of hidden units, L2 Step (2): EMD is adopted to decompose the con- regularization coefcient, and number of GRU struction sequence x (t) into n IMFs. layers. (4) Mode reconstruction and results evaluation. Te i i i x (t) � 􏽘 c (t) + r (t), (4) j n fnal prediction result can be obtained by summing j�1 up the predicted subcomponents and evaluated by i i the test dataset. where c (t) is the ith decompose of jth IMF and r (t) is j n the ith residual data. Te following describes the model details. Step (3): Repeat steps (1) and (2) M times, and add diferent white noise each time to obtain M groups of 3.1. Data Normalization. Te data normalization is corresponding IMFs. adopted in data processing to reduce data redundancy and Step (4): Calculate the average value of the corre- improve data usability. After normalizing, the original sponding IMFs of the M groups as the fnal IMFs. data are converted into a pure dimensionless values. Te training data are transformed into standardized data with c (t) � 􏽘 c (t), (5) j j zero mean and unit variance to better ft and prevent i�1 training divergence. Te standardized formula is as follows: where c (t) is ith decompose of jth IMF. X − X When the EEMD decomposition is completed, the (1) X′ � , original sequence data can be expressed as n IMFs and a s(X) residual. where X′ is the normalized data, X is the original data, X is the mean of the original data, and s(X) is the standard x(t) � 􏽘 c (t) + r (t), (6) j n deviation of the original data. j�1 In the prediction stage, the mean and variance param- eters are denormalized for the predicted data. where c (t), (t � 1, 2, . . . , T) is the jth IMF component decomposed at time t, r (t) is the fnal residual, and n is the 􏽢 􏽢 X � s(X)X′ + X, (2) number of IMFs. 􏽢 􏽢 Te main problem of the EEMD is the high computing where X′ is the predicted value after normalization and X is time and the residue of added noise present in the IMFs. In the fnal predicted value. the EEMD, it can be recognized that every x (t) is decomposed independently from the other realizations, and 3.2. Data Decomposition Method. Te EMD [54] is an the reconstructed signal contains residual noise and diferent adaptive method for the analysis of nonstationary and realizations of signal plus noise that may produce a diferent nonlinear signals. EMD can decompose the original signal number of modes. To overcome this limitation, the into the sum of amplitude and frequency modulation CEEMDAN algorithm was frst proposed by Torres et al. in functions, called “Intrinsic Mode Function” (IMF), and the 2011 [45]. Te main idea of the CEEMDAN is to add white fnal monotonic trend. However, EMD has the problem of noise at each phase of decomposition and calculate a unique “mode mixing,” which is very similar oscillations in diferent residue to obtain each mode. Journal of Advanced Transportation 5 Link vehicle count data Normalization Test data Train data ICEEMDAN Bayesian optimization decomposition Model initialization IMF1 IMF2 IMFn R Gaussian Process Regressor GRU GRU GRU GRU Run the model to calculate the RMSE of F1 F2 Fn Fn+1 validation set Fulfill Mode requirements? reconstruction Denormalization Output (x, y) Forecast results Evaluation results Figure 1: Framework of the link vehicle dynamic count forecasting model. Te resulting dissociation is complemented by a nu- where ⟨·⟩ is the action of averaging throughout the merically negligible error. However, there are still problems realizations. with some residual noise and “spurious” modes. Te ICE- Step 3: Compute the frst mode at the frst stage (k � 1) EMDAN technique is developed to improve the problems as d � x − R . 1 1 with some residual noise and “spurious” modes by Colo- Step 4: Estimate the second residue as the average of minas et al. [46]. local means of the realizations R + β E (w ) and de- 1 2 Given a composite signal x(t), where t is the sampling fne the second mode as follows: sequence of the signal, and let E (·) be the kth IMF obtained by EMD, and defne M(·) as the operator to calculate the d � R − R � R −⟨M􏼐R + β E 􏼐w 􏼑􏼑⟩. (9) 2 1 2 1 1 1 2 local mean of the signal, then, the ICEEMDAN algorithm is described as follows: Step 5: For k � 3, . . ., K; calculate the kth residue: Step 1: Calculate the local means of I realizations using R �⟨M􏼐R + β E 􏼐w 􏼑􏼑⟩, k k−1 k−1 k the EMD algorithm: (10) β � ε std r , k ≥ 1. i i k 0 k x � x + β E 􏼐w 􏼑, i � 1, . . . , I, (7) 0 1 Step 6: Compute the kth mode: where β � ε std(x)/E (w ) and ε is the reciprocal of 0 0 1 0 the desired signal-to-noise ratio between the frst added d � R − R . (11) k k−1 k noise and the analyzed signal. Step 7: Go back to step 4 for the next k. Step 2: Calculate the frst residue R1: Compared with EEMD and CEEMD, the ICEEMDAN R �⟨M􏼐x 􏼑⟩, (8) can not only reduce the noise in the mode but also decrease 6 Journal of Advanced Transportation the residual spurious pattern problems caused by signal Reset Gate overlap, providing an accurate reconstruction of the original signal. h h t-1 t × + 3.3. GRU Model. As a special RNN structure, LSTM solves 1- the problems of vanishing gradient and explosive gradient by changing the cell structure and adding storage cells to r u y t t t determine whether it is necessary to remember information. σ σ tanh GRU [38] improved LSTM by reducing the number of gates to decrease the training time. As shown in Figure 2, the GRU units transfer the input vector x to the output vector h t t Update Gate through time t iteration. GRUs consist of two gates: the reset gate and the update gate. Te main process in a GRU unit Figure 2: Te structure of GRU. can be described as follows: r � σ W x + W h + b 􏼁, t xr t hr t−1 r problem of the unknown objective function. Te model used u � σ W x + W h + b 􏼁, to approximate the objective function is called the surrogate t xu t hu t−1 u (12) model. Te surrogate model commonly used in Bayesian y � tanh􏼐W x + W r ⊙ h 􏼁 + b 􏼑, t xh t hy t t−1 y optimization is the Gaussian process to obtain its posterior distribution. h � 1 − u􏼁 ⊙ h + u ⊙ y , t t t−1 t t Bayesian optimization also applies an acquisition where u and r represent the update and reset gates of the function to direct sampling to an area that may improve the t t GRU, respectively, y means the candidate activation, h current best observation for searching for the next suitable t t represents the current activation, and h is the previous sampling point. Te carefully designed acquisition functions t−1 activation. W , W , W , W , W , W is the corre- balance the exploration of the search space and existing xr hr xu hu xh hy sponding weight parameter matrices; b , b , b is the cor- felds [48]. Te types of acquisition functions include PI r u y responding bias vector; and σ and tanh are the activation (Probability of Improvement) and EI (Expected Improve- functions. For more GRU information, please see the work ment) [50, 60]. of Chung et al. [59]. Te steps of Bayesian optimization of hyperparameter tuning include: 3.4. Bayesian Optimization Hyperparameters. (1) Model initialization. Prepare variables, such as the Hyperparameters are the parameters of the training algo- initial learning rate, the number of hidden units, and rithm itself, not directly learned from the training process. the L2 regularization coefcient. Each model has diferent hyperparameters, and a good (2) Use the Gaussian process to optimize the objective choice of hyperparameters can get the best performance. For function. example, there are four key hyperparameters in the GRU (3) Perform Bayesian optimization and calculate the model, such as the number of hidden units, the learning rate, RMSE of the test set. the number of GRU layers, and the number of hidden units. (4) Check the optimization results. If the results meet the However, manual tuning is inefcient and often afected by requirements, Bayesian optimization will output the human subjective factors. hyperparameters. Otherwise, the Bayesian optimi- Te basic idea of Bayesian optimization is to use Bayes’ zation will be restarted or modifed for the opti- theorem to estimate the posterior distribution of the ob- mization options to continue. jective function based on the data and then select the hyperparameter combination of the following samples according to the distribution. Te Bayesian optimization 4. Experiments and Results algorithm makes full use of the information of the previous sampling points. Te algorithm optimizes by learning the 4.1. Data Description. Te license plate recognition (LPR) shape of the objective function. It will fnd the hyper- data collected in Jinji Road and Airport City Avenue in parameters that maximize the result to the global optimal. Hangzhou, China, are used to verify the proposed model. Bayesian optimization is generally used to minimize the Te selected section of Jinji Road is about 280 m long with objective function f(x) as follows: three northbound lanes without entrance and exit in the middle. Te selected section of Airport Avenue is about x � arg min f(x), (13) 560 m long. Figure 3 shows the location of the license plate x∈χ recognition detector on the Jinji Road. Te data were col- where x is a decision variable, χ is the decision space. In lected from December 1, 2018 to December 24, 2018. Te general, the objective function f(x) is unknown, so it cannot original LPR record data contains the key information, such use gradient descent to solve f(x). Bayesian optimization as license plate of the car, the timestamp of the car passing utilizes a surrogate model to deal with the optimization the detected line, lane number, location information, and Journal of Advanced Transportation 7 Figure 3: Detectors’ location in Jinji road. Data #1 0 1000 2000 3000 4000 5000 6000 7000 Time (5 min) Data #2 0 1000 2000 3000 4000 5000 6000 7000 Time (5 min) Figure 4: Link dynamic vehicle count data. other information, such as car type, car color, and car length. improves the quality of input data by decomposing the Ten, the raw LPR data are employed to extract the cor- original data into more regular pattern components. Te responding LDVC data with the 5 min time window by the calculation of the decomposition quantity m is determined by m � fix(log 2(N)) − 1 [56], where N is the length of the cumulative curve model of upstream and downstream ve- hicles [10, 11, 61]. Tus, the model provides 288 LDVC data input data. We used ICEEMDAN to decompose the original points each day. We divide the data into training set and test Data #1 into 11 IMF components and a residual, as illus- set with the ratio of 90 per cent training set and 10 per cent trated in Figure 5. Te periods of these components range test set. Te obtained LDVC data of Jinji Road (Data 1) and from short to long periods and have diferent amplitudes. Airport Avenue (Data 2) are shown in Figure 4. Te Data 1 Te frst subgraph represents the original LDVC data with on the Jinji Road refects the periodic characteristics of the noise and the residual shows the trend term of the LDVC data in days and weeks. Data 1 has more obvious morning data. Te ICEEMDAN algorithm overcomes the mode peak characteristics, while Data 2 has a slightly higher mixing problem of EMD. Taking IMF7 as an example, after evening peak trafc demand. Te proposed model is applied decomposing by ICEEMDAN, the periodic characteristics of to the one-step ahead LDVC forecasting problem. Te length Data #1 in days and weeks are more obvious and more regular. of historical time window is set as 288. Te prediction horizon is set as 1, that is to say, we use one day historical data to predict 5-min LDVC. 4.3. Benchmarks and Measures of Efectiveness. Te real- world link dynamic vehicle data were divided into training 4.2. IMF Components Extraction. Te quality of input data data set and test data set. We selected 12 benchmark models, will afect the prediction performance, and ICEEMDAN including the SVR (support vector regression) [25], RF Link vehicle number Link vehicle number 8 Journal of Advanced Transportation 0 1000 2000 3000 4000 5000 6000 7000 -5 0 1000 2000 3000 4000 5000 6000 7000 -2 0 1000 2000 3000 4000 5000 6000 7000 -2 0 1000 2000 3000 4000 5000 6000 7000 -5 0 1000 2000 3000 4000 5000 6000 7000 -5 0 1000 2000 3000 4000 5000 6000 7000 -5 0 1000 2000 3000 4000 5000 6000 7000 -5 0 1000 2000 3000 4000 5000 6000 7000 -2 0 1000 2000 3000 4000 5000 6000 7000 -1 0 1000 2000 3000 4000 5000 6000 7000 0.5 -0.5 0 1000 2000 3000 4000 5000 6000 7000 0.2 -0.2 0 1000 2000 3000 4000 5000 6000 7000 4.15 4.1 0 1000 2000 3000 4000 5000 6000 7000 Time (5 min) Figure 5: IMF components extraction by ICEEMDAN. (random forest) [62], XGBoost (extreme gradient boosting ensemble empirical mode decomposition) [44], ICE- tree regression) [63], GRU-BO, LSTM-BO, EMD-GRU, EMDAN-GRU, and ICEEMDAN-LSTM-BO for compari- son. ICEEMDAN-GRU-BO is the proposed prediction VMD-GRU (variational mode decomposition) [64], MVMD-GRU (multivariate variational mode decomposi- method, which considers the data decomposition and tion) [65], EEMD-GRU, CEEMD-GRU (a complete Bayesian optimization parameters. We choose the baseline Residual IMF11 IMF10 IMF9 IMF8 IMF7 IMF6 IMF5 IMF4 IMF3 IMF2 IMF1 Orignal Journal of Advanced Transportation 9 Table 1: Predictive performance comparison with diferent model for the following reasons: the frst reason is to number of hidden units. compare with commonly used machine learning and deep learning models. Te second reason is to obtain the pre- HS MAE RMSE R diction performance using Bayesian optimization alone. Te Data #1 third reason compares the performance of diferent data 40 0.598 0.968 0.897 decomposition methods. LR � 0.005 80 0.588 0.959 0.899 Tis paper applied 3 evaluation indicators to evaluate the NL � 1 120 0.601 0.975 0.895 model, namely, mean absolute error (MAE), root mean L2 � 0.001 160 0.601 0.969 0.897 square error (RMSE), and coefcient of determination (R ). 200 0.598 0.969 0.897 Te calculation formulas are as follows: Data #2 􏼌 􏼌 40 1.189 1.585 0.862 1 􏼌 􏼌 (i) (i) 􏼌 􏼌 􏼌 􏼌 MAE � 􏽘 x − x 􏽢 , 􏼌 􏼌 LR � 0.005 80 1.171 1.582 0.863 i�1 NL � 1 120 1.171 1.559 0.873 􏽶�������������� � L2 � 0.001 160 1.183 1.587 0.860 n 200 1.185 1.592 0.858 (i) (i) RMSE � x − x 􏽢 , 􏽘 􏼐 􏼑 Te bold values means the best results in the data set. (14) i�1 Table 2: Predictive performance comparison with diferent initial learning rates. n (i) (i) 􏽐 􏼐x − x 􏼑 i�1 R � 1 − , LR MAE RMSE R n (i) 􏽐 􏼐x − x􏼑 i�1 Data #1 0.001 0.609 0.972 0.896 (i) where x is the observed value of link dynamic vehicle 0.003 0.601 0.969 0.897 (i) count, x 􏽢 is the predicted value of vehicle count, and x is the NL � 1 0.005 0.599 0.977 0.895 mean value of vehicle count. HS � 80 0.007 0.586 0.957 0.899 A well-performing deep learning model is inseparable L2 � 0.001 0.010 0.583 0.959 0.899 from an efective hyperparameter tuning process. In the 0.015 0.634 1.011 0.887 following section, we frst analyze the impact of model 0.1 NaN NaN NaN hyperparameters on the prediction performance. Ten, we Data #2 compare the prediction performance of the proposed hybrid 0.001 1.158 1.524 0.871 depth model and the benchmark method. 0.003 1.156 1.516 0.875 NL � 1 0.005 1.203 1.565 0.852 HS � 80 0.007 1.151 1.525 0.871 4.4. Hyperparameter Tuning with Bayesian Optimization. L2 � 0.001 0.010 1.169 1.535 0.866 0.015 1.254 1.676 0.798 Tis section verifes the superiority of the BO in tuning 0.1 NaN NaN NaN hyperparameters by the efect analysis of hyperparameters on the prediction performance. Te hyperparameters of the Te bold values means the best results in the data set. GRU model mainly include the number of hidden units (HS), the initial learning rate (LR), the regularization co- shown in Table 2, when the initial learning rate is set to 0.1, efcient (L2), and the number of GRU layers (NL) [51, 53]. the model cannot converge. In addition, the optimal initial When we compare the prediction performance of one LR for Data #1 is 0.007, and for Data #2 is 0.003, showing hyperparameter, we fx the other hyperparameters. that it is hard to manually tune an appropriate initial LR for Te number of hidden units has a strong infuence on the diferent data. Terefore, it is essential to determine a model. If the number of hidden units is too small, the net will suitable initial learning rate automatically. not learn well, while too many hidden units will afect the Te Bayesian optimization model based on the Gaussian efciency and increase the risk of overftting. Te efects of process can efectively search the candidate hyperparameter hidden layer number on the GRU model are shown in interval and determine the appropriate HS and initial LR, as Table 1. As the HS approaches the optimal value 80, the shown in Figure 6(a). Moreover, there exist interactions prediction accuracy of the model gradually increases. When between diferent hyperparameters. Te superiority of the HS continues to increase, the RMSE and R gradually Bayesian optimization is optimizing multiple hyper- parameters at the same time, as illustrated in Figure 6(b). Te decrease, refecting the possibility of overftting. Te same rule also fts in the Data #2. Furthermore, we could conclude Bayesian optimization can simultaneously optimize the that the optimal number of hidden units for diferent data is initial LR and HS and select the appropriate combination. diferent. In terms of the L2 regularization coefcient, it helps to Ten, we tested the efects of diferent initial learning improve the overftting issues and the generalization level of rates on the GRU model. Te initial learning rates have the model. However, a too large regularization coefcient signifcant efects on the model: a too high learning rate may may lead to the underftting of the model. We manually cause the model to fail to converge; a too small learning rate adjust the L2 regularization coefcient to examine the will cause the model to converge slowly or fail to learn. As prediction performance of the model, as shown in Table 3. Inital Learning Rate 10 Journal of Advanced Transportation 1.35 0.7 1.3 0.65 1.25 0.6 0.55 1.2 0.5 1.15 0.45 1.1 0.4 1.05 0.35 -1 200 -2 -3 0.95 0.9 -3 -2 -1 10 10 10 Observed points Next point Inital Learning Rate Model mean Model minimum feasible Observed points Noise error bars Model mean Next point Model error bars Model minimum feasible (a) (b) Figure 6: Adjusting model hyperparameters with Bayesian optimization. (a) Bayesian optimization adjusts the initial learning rate. (b) Bayesian optimization adjusts the initial learning rate and hidden unit size. Table 3: Infuence of diferent L2 regularization coefcients on the Te results in Table 3 show that diferent regularization model. coefcients impact the model prediction accuracy. Data #1 corresponds to the optimal regularization coefcient equal L2 MAE RMSE R to 0.007 and Data #2 corresponds to the optimal L2 reg- Data #1 ularization coefcient equal to 0.005, showing that the 0.001 0.614 0.989 0.892 model of diferent data suits its own optimal L2 regulari- LR � 0.005 0.003 0.593 0.962 0.898 NL � 1 zation coefcients. 0.005 0.599 0.980 0.894 HS � 80 In Table 4, we look into the impact of multilayer GRU on 0.007 0.582 0.955 0.900 the prediction performance of the model. In a multilayer Data #2 GRU, the number of hidden units in each layer is equally 0.001 1.180 1.599 0.840 distributed. Second, a dropout layer with a dropout prob- LR � 0.00 0.003 1.174 1.585 0.846 NL � 1 ability equal to 0.2 is added after each GRU layer to avoid 0.005 1.173 1.580 0.848 HS � 80 overftting. 0.007 1.183 1.606 0.837 Te results also show RMSE increases as the number of GRU layers increases. Te optimal number of GRU layers corresponding to Data #1 and Data #2 is one layer. It shows that as the number of layers increases, the model may Table 4: Infuence of diferent GRU layers on the model. overft. NL MAE RMSE R In Table 5, we use the Bayesian optimization algorithm to Data #1 optimize the four hyperparameters of the model. Compared with the manually tuning method, Bayesian optimization 1 0.653 0.931 0.905 LR � 0.005 could obtain the best-ftted hyperparameters when all 2 0.657 0.941 0.902 HS � 80/NL 3 0.675 0.946 0.901 evaluation indicator results are optimal. L2 � 0.001 4 0.677 0.950 0.901 Tis section shows the hyperparameters have a signif- Data #2 cant impact on the performance of the GRU model. Te setting of various hyperparameters also presents the phe- 1 1.122 1.603 0.833 LR � 0.005 2 1.129 1.600 0.830 nomenon of trade-ofs. Manual tuning is not only time- HS � 80/NL 3 1.125 1.601 0.831 consuming and labor-intensive but also arduous to obtain L2 � 0.001 4 1.133 1.601 0.831 better results. Bayesian optimization could achieve the best Te bold values means the best results in the data set. performance in tuning hyperparameters. Number of Hidden Units RMSE RMSE Journal of Advanced Transportation 11 Table 5: Efects of Bayesian optimization super parameters on the model. MAE RMSE R Data #1 LR � 6.281e − 04; S � 173 0.580 0.835 0.923 L2 � 1.548e − 05; NL � 1 Data #2 LR � 0.0054; HS � 184 1.089 1.297 0.882 L2 � 0.046; NL � 1 Table 6: Performance comparison of diferent prediction models. Data #1 Data #2 Training time Prediction time Training time Prediction time 2 2 MAE RMSE R MAE RMSE R (min) (s) (min) (s) SVR 0.724 1.048 0.879 3.4 0.02 1.182 1.683 0.842 2.7 0.02 RF 0.683 1.028 0.884 0.5 0.06 1.158 1.661 0.846 0.2 0.02 XGBoost 0.684 1.030 0.883 0.2 0.02 1.158 1.660 0.846 0.1 0.01 GRU-BO 0.586 0.835 0.923 12.3 3.94 1.089 1.297 0.882 11.3 3.47 LSTM-BO 0.528 0.749 0.938 19.2 4.67 1.022 1.390 0.892 17.1 4.32 EMD-GRU 0.537 0.705 0.945 11.4 5.17 0.524 0.712 0.906 9.0 5.80 VMD-GRU 0.383 0.536 0.968 4.8 3.26 0.407 0.550 0.944 4.8 2.85 MVMD-GRU 0.601 0.833 0.924 4.0 2.74 0.531 0.721 0.904 4.4 3.19 EEMD-GRU 0.335 0.457 0.977 9.6 5.48 0.374 0.505 0.953 8.9 5.69 CEEMD-GRU 0.322 0.453 0.977 9.7 5.68 0.363 0.510 0.952 9.1 5.61 ICEEMDAN-GRU 0.257 0.352 0.986 10.4 6.74 0.289 0.420 0.967 10.8 5.86 ICEEMDAN- 0.217 0.307 0.990 221.4 7.47 0.223 0.311 0.982 201.2 6.14 LSTM-BO ICEEMDAN-GRU- 0.203 0.298 0.990 146.0 6.13 0.233 0.324 0.981 136.6 6.43 BO 4.5. Benchmarks and Model Comparisons. Tis part studies Table 6 shows the performance comparison of diferent the efect of data decomposition and Bayesian optimization models and Figure 7 shows the prediction error box plot. Te on model performance. We compared the performance of 12 results of Data #1 show that the performance of the ICE- benchmark models with the proposed ICEEMDAN-GRU- EMDAN-GRU-BO model is the best. Te evaluation indi- BO model. We used a computer with a 2.9 Ghz, dual-core cators of the proposed model are similar to the ICEEMDAN- processor and 8G memory in our experiments. LSTM-BO model in Data #2. However, the training time of Te parameters setting of the models are set as follows. the proposed model is drastically reduced. XGBoost and RF show similar predictive performance, Te parameters of SVR, RF, and XGBoost are optimized by the grid search method. According to the results of grid and both perform better than SVR. By comparing with search, the kernel coefcient, regularization parameter, and XGBoost and RF models, the GRU-BO and the LSTM-BO epsilon are three essential parameters in the SVR model, models improves prediction accuracy. By comparing with which are set as 288.83, 29.31, and 0.0017 in Data #1 and GRU-BO, EMD-GRU achieves better prediction accuracy 60.2, 775.97, and 0.0061 in Data #2, respectively. Te number due to the efect of EMD, which shows that data decom- of trees and max depth are two essential parameters in the position obtains a better outcome. RF model, which are set as 138 and 15 in Data #1 and 22, 9 in Te efect of diferent data decomposition on model Data #2, respectively. Te number of estimators, learning performance shows that, compared with EMD, VMD, rate, and max depth are three key parameters in XGBoost, MVDM, EEMD, and CEEMD, the ICEEMDAN data de- which are set as 117 and 0.3, 6 in Data #1 and 102 and 0.4, 5 composition has the most signifcant improvement in model prediction. in Data #2, respectively. Te hyperparameters of the GRU model such as the number of hidden units, the initial Although Bayesian optimization increases the training learning rate, the regularization coefcient, and the number time of ICEEMDAN-GRU-BO, the training time is within of GRU layers are tuning with Bayesian optimization. an acceptable range for the improvement of prediction According to the results of section 4.4, we set the initial performance. learning rate to 0.005, the number of hidden units to 80, the Bayesian optimization efectively and simultaneously L2 regularization coefcient to 0.001, and the number of optimizes the four hyperparameters of the model; GRU layers to one. avoiding manual tuning that only relies on the empirical 12 Journal of Advanced Transportation -5 -5 Figure 7: Box plots of prediction errors of diferent models. methods. In addition, with the development of com- and BO all contribute to the improvement of prediction puting technology, such as the technology represented by accuracy and efciency. Combining ICEEMDAN-GRU- cloud computing, higher-performance computing re- BO can obtain the best performance and least calculation sources will become cheaper and more convenient, and complexity. the training time of the model will not be a limiting Te prediction model is based on the ICEEMDAN-GRU- factor. BO for link dynamic vehicle count. It is necessary to establish diferent prediction models for diferent road section sce- narios. In the future, we will further explore whether we can 5. Conclusions and Discussion make full use of the training information of the previous In this paper, we propose a hybrid deep learning model that models and train the deep learning prediction model by combines the data decomposition method, ICEEMDAN, transfer learning to improve the modeling efciency and and GRU model with Bayesian optimization for link dy- prediction accuracy. Besides, LDVC is a crucial parameter in namic vehicle count forecasting. ICEEMDAN is used to developing efcient adaptive trafc signal controllers as process and decompose the original data into specifc IMFs. trafc-responsive control systems require reliable real-time Considering the nonlinear characteristics of IMFs of link demand information on prevailing trafc conditions to make dynamic vehicle data, GRU is used as the basic forecast sensible control decisions. Te proposed model can be used to model of each IMF. BO is adapted to auto-tune hyper- predict the basic inputs for real-time control applications in parameters of the deep learning approach. Te presented urban areas and further extended to network-wide vehicle model is compared with 12 benchmark models including counts forecasting for network-wide trafc signal control SVR, RF, XGBoost, GRU-BO, LSTM-BO, EMD-GRU, optimization in urban trafc management. VMD-GRU, MVMD-GRU, EEMD-GRU, CEEMD-GRU, ICEEMDAN-GRU, and ICEEMD-LSTM-BO. Models are Data Availability validated with the real-world data collected in Hangzhou, China. Tree evaluation indicators (MAE, RMSE, and R ), Te link vehicle count data used to support the fndings of training time, and prediction time are used to measure the this study are available from the corresponding author upon performance. Results indicated that ICEEMDAN, GRU, request. Error Error SVR SVR RF RF XGBoost XGBoost GRU-BO GRU-BO LSTM-BO LSTM-BO EMD-GRU EMD-GRU VMD-GRU VMD-GRU MVMD-GRU MVMD-GRU EEMD-GRU EEMD-GRU CEEMD-GRU CEEMD-GRU ICEEMDAN-GRU ICEEMDAN-GRU ICEEMDAN-LSTM-BO ICEEMDAN-LSTM-BO ICEEMDAN-GRU-BO ICEEMDAN-GRU-BO Journal of Advanced Transportation 13 [12] M. Van Der Voort, M. Dougherty, and S. Watson, “Com- Conflicts of Interest bining kohonen maps with arima time series models to Te authors declare that there are no conficts of interest forecast trafc fow,” Transportation Research Part C: regarding the publication of this article. Emerging Technologies, vol. 4, no. 5, pp. 307–318, 1996. [13] B. M. Williams, P. K. Durvasula, and D. E. Brown, “Urban freeway trafc fow prediction: application of seasonal Acknowledgments autoregressive integrated moving average and exponential smoothing models,” Transportation Research Record, Tis research study was fnancially supported by National vol. 1644, no. 1, pp. 132–141, 1998. Natural Science Foundation of China under Grant nos. [14] K. Kumar, M. Parida, and V. K. Katiyar, “Short term trafc 52131202, 71901193, and 52072340; National Key Research fow prediction in heterogeneous condition using artifcial and Development Program of China under Grant no. neural network,” Transport, vol. 30, no. 4, pp. 397–405, 2013. 2019YFB1600303; and China Postdoctoral Science Foun- [15] H. Zheng, F. Lin, X. Feng, and Y. Chen, “A hybrid deep dation under Grant no. 2020M671724. learning model with attention-based conv-LSTM networks for short-term trafc fow prediction,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 11, pp. 6910– References 6920, 2021. [16] J. Wang and Q. Shi, “Short-term trafc speed forecasting [1] C. Diakaki, M. Papageorgiou, and K. Aboudolas, “A multi- hybrid model based on Chaos–Wavelet Analysis-Support variable regulator approach to trafc-responsive network- Vector Machine theory,” Transportation Research Part C: wide signal control,” Control Engineering Practice, vol. 10, no. 2, pp. 183–195, 2002. Emerging Technologies, vol. 27, pp. 219–232, 2013. [17] L. Zhou, S. Zhang, J. Yu, and X. Chen, “Spatial–temporal deep [2] K. M. A. E. Aboudolas, M. Papageorgiou, A. Kouvelas, and E. Kosmatopoulos, “A rolling-horizon quadratic-program- tensor neural networks for large-scale urban network speed ming approach to the signal control problem in large-scale prediction,” IEEE Transactions on Intelligent Transportation congested urban road networks,” Transportation Research Systems, vol. 21, no. 9, pp. 3718–3729, 2020. Part C: Emerging Technologies, vol. 18, no. 5, pp. 680–694, [18] Y. Chen, C. Tao, Q. Bai, F. Liu, X. Qi, and R. Zhuo, “Short- term speed prediction for expressway considering adaptive [3] S. Chen and D. J. Sun, “An improved adaptive signal control selection of spatiotemporal dimensions and similar trafc method for isolated signalized intersection based on dynamic features,” Journal of Transportation Engineering, Part A: programming,” IEEE Intelligent Transportation Systems Systems, vol. 146, no. 10, Article ID 04020114, 2020. Magazine, vol. 8, no. 4, pp. 4–14, 2016. [19] J. W. C. Van Lint, S. P. Hooqendoorn, and H. J. Van Zuvlen, [4] G. Vigos, M. Papageorgiou, and Y. Wang, “Real-time esti- “Freeway travel time prediction with state-space neural net- mation of vehicle-count within signalized links,” Trans- works modeling state-space dynamics with recurrent neural portation Research Part C: Emerging Technologies, vol. 16, networks,” Transportation Research Record, vol. 1811, no. 1, pp. 18–35, 2008. pp. 30–39, 2002. [5] K. Kwong, R. Kavaler, R. Rajagopal, and P. Varaiya, “Real- [20] C.-H. Wu, J. M. Ho, and D. T. Lee, “Travel-time prediction time measurement of link vehicle count and travel time in a with support vector regression,” IEEE Transactions on In- road network,” IEEE Transactions on Intelligent Trans- telligent Transportation Systems, vol. 5, no. 4, pp. 276–281, portation Systems, vol. 11, no. 4, pp. 814–825, 2010. [6] G. Vigos and M. Papageorgiou, “A simplifed estimation [21] J. van Lint, S. Hoogendoorn, and H. van Zuylen, “Accurate scheme for the number of vehicles in signalized links,” IEEE freeway travel time prediction with state-space neural net- Transactions on Intelligent Transportation Systems, vol. 11, works under missing data,” Transportation Research Part C: no. 2, pp. 312–321, 2010. Emerging Technologies, vol. 13, no. 5-6, pp. 347–369, 2005. [7] M. Papageorgiou and G. Vigos, “Relating time-occupancy [22] S. V. Kumar and L. Vanajakshi, “Short-term trafc fow measurements to space-occupancy and link vehicle-count,” prediction using seasonal ARIMA model with limited input Transportation Research Part C: Emerging Technologies, data,” European Transport Research Review, vol. 7, no. 3, p. 21, vol. 16, no. 1, pp. 1–17, 2008. [8] M. Rostami Shahrbabaki, A. A. Safavi, M. Papageorgiou, and [23] G. A. Davis and N. L. Nihan, “Nonparametric regression and I. Papamichail, “A data fusion approach for real-time trafc short-term freeway trafc forecasting,” Journal of Trans- state estimation in urban signalized links,” Transportation portation Engineering, vol. 117, no. 2, pp. 178–188, 1991. Research Part C: Emerging Technologies, vol. 92, pp. 525–548, [24] B. L. Smith and M. J. Demetsky, “Short-term trafc fow prediction models-a comparison of neural network and [9] S. Lin, B. De Schutter, Y. Xi, and H. Hellendoorn, “Efcient nonparametric regression approaches,” in Proceedings of the network-wide model-based predictive control for urban IEEE international conference on systems, man and trafc networks,” Transportation Research Part C: Emerging cybernetics, San Antonio, TX, USA, October 1994. Technologies, vol. 24, pp. 122–140, 2012. [25] H. Drucker, C. J. C. Burges, L. Kaufman, S. Alex, and [10] X. Zhan, R. Li, and S. V. Ukkusuri, “Lane-based real-time V. Vapnik, “Support vector regression machines,” Advances queue length estimation using license plate recognition data,” in Neural Information Processing Systems, vol. 9, pp. 155–161, Transportation Research Part C: Emerging Technologies, vol. 57, pp. 85–102, 2015. [26] P. Cai, Y. Wang, G. Lu, P. Chen, C. Ding, and J. Sun, “A [11] C. He, D. Wang, M. Chen, G. Qian, and Z. Cai, “Link dynamic vehicle count estimation based on travel time distribution spatiotemporal correlative k-nearest neighbor model for short-term trafc multistep forecasting,” Transportation Re- using license plate recognition data,” Transportmetrica: Transportation Science, pp. 1–22, 2021. search Part C: Emerging Technologies, vol. 62, pp. 21–34, 2016. 14 Journal of Advanced Transportation [27] X. Feng, X. Ling, H. Zheng, Z. Chen, and Y. Xu, “Adaptive [42] H. F. Yang and Y. P. P. Chen, “Hybrid deep learning and multi-kernel SVM with spatial–temporal correlation for empirical mode decomposition model for time series appli- short-term trafc fow prediction,” IEEE Transactions on cations,” Expert Systems with Applications, vol. 120, Intelligent Transportation Systems, vol. 20, no. 6, pp. 2001– pp. 128–138, 2019. 2013, 2019. [43] X. Jiang, L. Zhang, and X. Michael Chen, “Short-term fore- [28] Y. Hua, Z. Zhao, R. Li, X. Chen, Z. Liu, and H. Zhang, “Deep casting of high-speed rail demand: a hybrid approach com- learning with long short-term memory for time series pre- bining ensemble empirical mode decomposition and gray diction,” IEEE Communications Magazine, vol. 57, no. 6, support vector machine with real-world applications in pp. 114–119, 2019. China,” Transportation Research Part C: Emerging Technol- [29] Y. S. Jeong, Y. J. Byon, M. M. Castro-Neto, and S. M. Easa, ogies, vol. 44, pp. 110–127, 2014. “Supervised weighting-online learning algorithm for short- [44] J.-R. Yeh, J. S. Shieh, and N. E. Huang, “Complementary term trafc fow prediction,” IEEE Transactions on Intelligent ensemble empirical mode decomposition: a novel noise en- Transportation Systems, vol. 14, no. 4, pp. 1700–1707, 2013. hanced data analysis method,” Advances in Adaptive Data [30] W.-C. Hong, Y. Dong, F. Zheng, and S. Y. Wei, “Hybrid Analysis, vol. 02, no. 02, pp. 135–156, 2010. evolutionary algorithms in a SVR trafc fow forecasting [45] M. E. Torres, M. A. Colominas, G. Schlotthauer, and model,” Applied Mathematics and Computation, vol. 217, P. Flandrin, “A complete ensemble empirical mode decom- no. 15, pp. 6733–6747, 2011. position with adaptive noise,” in Proceedings of the 2011 IEEE [31] G. Leshem and Y. Ritov, “Trafc fow prediction using ada- international conference on acoustics, speech and signal pro- boost algorithm with random forests as a weak learner,” cessing (ICASSP), Prague, Czech Republic, May 2011. International Journal of Mathematical and Computational [46] M. A. Colominas, G. Schlotthauer, and M. E. Torres, “Im- Sciences, vol. 1, no. 1, pp. 1–6, 2007. proved complete ensemble EMD: a suitable tool for bio- [32] H. Chen, S. Grant-Muller, L. Mussone, and F. Montgomery, medical signal processing,” Biomedical Signal Processing and “A study of hybrid neural network approaches and the efects Control, vol. 14, pp. 19–29, 2014. of missing data on trafc forecasting,” Neural Computing & [47] F. Hutter, J. Lucke, ¨ and L. Schmidt-Tieme, “Beyond manual Applications, vol. 10, no. 3, pp. 277–286, 2001. tuning of hyperparameters,” KI-Kunstliche ¨ Intelligenz, vol. 29, [33] Yu Wei and Mu-C. Chen, “Forecasting the short-term metro no. 4, pp. 329–337, 2015. passenger fow with empirical mode decomposition and [48] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and neural networks,” Transportation Research Part C: Emerging N. de Freitas, “Taking the human out of the loop: a review of Technologies, vol. 21, no. 1, pp. 148–162, 2012. bayesian optimization,” Proceedings of the IEEE, vol. 104, [34] J. J. Ruiz-Aguilar, I. J. Turias, and M. J. Jimenez-Come, ´ no. 1, pp. 148–175, 2016. “Hybrid approaches based on SARIMA and artifcial neural [49] J. Bergstra, R. Bardenet, Y. Bengio, and B. K´egl, “Algorithms networks for inspection time series forecasting,” Trans- for hyper-parameter optimization,” in Proceedings of the 25th portation Research Part E: Logistics and Transportation Re- annual conference on neural information processing systems view, vol. 67, pp. 1–13, 2014. (NIPS 2011), Granada, Spain, January 2011. [35] X. Ma, Z. Tao, Y. Wang, H. Yu, and Y. Wang, “Long short- [50] J. Snoek, H. Larochelle, and R. P. Adams, “Practical bayesian term memory neural network for trafc speed prediction optimization of machine learning algorithms,” Advances in using remote microwave sensor data,” Transportation Re- Neural Information Processing Systems, vol. 25, 2012. search Part C: Emerging Technologies, vol. 54, pp. 187–197, [51] H. Cheng, X. Ding, W. Zhou, and R. Ding, “A hybrid elec- 2015. tricity price forecasting model with Bayesian optimization for [36] Z. Zhao, W. H. Chen, X. M. Wu, P. C. Y. Chen, and J. M. Liu, German energy exchange,” International Journal of Electrical “LSTM network: a deep learning approach for short-term Power & Energy Systems, vol. 110, pp. 653–666, 2019. trafc forecast,” IET Intelligent Transport Systems, vol. 11, [52] F. He, J. Zhou, Z.-K. Feng, G. Liu, and Y. Yang, “A hybrid short-term load forecasting model based on variational mode no. 2, pp. 68–75, 2017. [37] D. Yang, K. Chen, M. Yang, and X. Zhao, “Urban rail transit decomposition and long short-term memory networks con- passenger fow forecast based on LSTM with enhanced long- sidering relevant factors with Bayesian optimization algo- term features,” IET Intelligent Transport Systems, vol. 13, rithm,” Applied Energy, vol. 237, pp. 103–116, 2019. [53] H. Yi and K. H. N. Bui, “An automated hyperparameter no. 10, pp. 1475–1482, 2019. [38] K. Cho, B. Van Merrienboer, ¨ C. Gulcehre et al., “Learning search-based deep learning model for highway trafc pre- phrase representations using RNN encoder-decoder for sta- diction,” IEEE Transactions on Intelligent Transportation tistical machine translation,” 2014, https://arxiv.org/abs/1406. Systems, vol. 22, no. 9, pp. 5486–5495, 2021. [54] N. E. Huang, Z. Shen, S. R. Long et al., “Te empirical mode [39] K. Zhang, L. Zheng, Z. Liu, and N. Jia, “A deep learning based decomposition and the Hilbert spectrum for nonlinear and multitask model for network-wide trafc speed prediction,” non-stationary time series analysis,” Proceedings of the Royal Neurocomputing, vol. 396, pp. 438–450, 2020. Society of London. Series A: Mathematical, Physical and En- [40] S. Zhang, L. Zhou, X. M. Chen, L. Zhang, L. Li, and M. Li, gineering Sciences, vol. 454, no. 1971, pp. 903–995, 1998. “Network-wide trafc speed forecasting: 3D convolutional [55] N. E. Huang, M. L. C. Wu, S. R. Long et al., “A confdence neural network with ensemble empirical mode decomposi- limit for the empirical mode decomposition and hilbert tion,” Computer-Aided Civil and Infrastructure Engineering, spectral analysis,” Proceedings of the Royal Society of London. vol. 35, no. 10, pp. 1132–1147, 2020. Series A: Mathematical, Physical and Engineering Sciences, [41] L. Li, X. Qu, J. Zhang, H. Li, and B. Ran, “Travel time pre- vol. 459, pp. 2317–2345, 2003. diction for highway network based on the ensemble empirical [56] Z. Wu and N. E. Huang, “Ensemble empirical mode de- mode decomposition and random vector functional link composition: a noise-assisted data analysis method,” Ad- network,” Applied Soft Computing, vol. 73, pp. 921–932, 2018. vances in Adaptive Data Analysis, vol. 1, no. 1, pp. 1–41, 2009. Journal of Advanced Transportation 15 [57] Y. X. Wu, Q.-B. Wu, and J.-Q. Zhu, “Improved EEMD-based crude oil price forecasting using LSTM networks,” Physica A: Statistical Mechanics and Its Applications, vol. 516, pp. 114– 124, 2019. [58] Y. Shrivastava and B. Singh, “A comparative study of EMD and EEMD approaches for identifying chatter frequency in CNC turning,” European Journal of Mechanics - A: Solids, vol. 73, pp. 381–393, 2019. [59] J. Chung, C. Gulcehre, K. H. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” 2014, https://arxiv.org/abs/1412.3555. [60] M. Pelikan, D. E. Goldberg, and E. Cantu-Paz, ´ “BOA: the Bayesian optimization algorithm,” in Proceedings of the ge- netic and evolutionary computation conference GECCO-99, Orlando, Florida, USA, July 1999. [61] A. Bhaskar, T. Tsubota, L. M. Kieu, and E. Chung, “Urban trafc state estimation: fusing point and zone based data,” Transportation Research Part C: Emerging Technologies, vol. 48, pp. 120–142, 2014. [62] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001. [63] T. Chen and C. Guestrin, “Xgboost: a scalable tree boosting system,” in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, San Francisco, CA, USA, August 2016. [64] K. Dragomiretskiy and D. Zosso, “Variational mode de- composition,” IEEE Transactions on Signal Processing, vol. 62, no. 3, pp. 531–544, 2014. [65] N. U. Rehman and H. Aftab, “Multivariate variational mode decomposition,” IEEE Transactions on Signal Processing, vol. 67, no. 23, pp. 6039–6052, 2019.

Journal

Journal of Advanced TransportationHindawi Publishing Corporation

Published: Feb 7, 2023

There are no references for this article.