A Hybrid Deep Learning Model for Link Dynamic Vehicle Count Forecasting with Bayesian Optimization
He, Chunguang;Wang, Dianhai;Yu, Yi;Cai, Zhengyi
2023-02-07 00:00:00
Hindawi
Journal of Advanced Transportation
Volume 2023, Article ID 5070504, 15 pages
https://doi.org/10.1155/2023/5070504

Research Article

Chunguang He (1,2), Dianhai Wang (1), Yi Yu (1,3), and Zhengyi Cai (1)

(1) College of Civil Engineering and Architecture, Zhejiang University, Hangzhou, China
(2) School of Transportation and Logistics Engineering, Xinjiang Agricultural University, Urumqi, China
(3) Shanghai AI Laboratory, Shanghai 200232, China

Correspondence should be addressed to Zhengyi Cai; caizhengyi@zju.edu.cn

Received 30 March 2022; Revised 7 September 2022; Accepted 24 November 2022; Published 7 February 2023

Academic Editor: Young-Jae Lee

Copyright © 2023 Chunguang He et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The link dynamic vehicle count is a spatial variable that measures the traffic state of road sections, which reflects the actual traffic demand. This paper presents a hybrid deep learning method that combines the gated recurrent unit (GRU) neural network model with automatic hyperparameter tuning based on Bayesian optimization (BO) and the improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) model. There are four steps in this hybrid approach. First, the ICEEMDAN is employed to decompose the link dynamic vehicle count time series data into several intrinsic components. Second, the components are predicted by the GRU model. At the same time, the Bayesian optimization method is utilized to automatically optimize the hyperparameters of the GRU model. Finally, the predicted subcomponents are reconstructed to obtain the final prediction results. The proposed hybrid deep learning method is tested on two roads of Hangzhou, China.
Results show that, compared with the 12 benchmark models, the proposed hybrid deep learning model achieves the best performance in link dynamic vehicle count forecasting.

1. Introduction

With the development of the social economy and urbanization, travellers' traffic demands have increased rapidly. Traffic problems such as traffic congestion, environmental pollution, and economic losses have brought challenges to urban transportation management. Intelligent transportation technology promises to deal with these problems, and accurate and effective forecasting of traffic demands is a key step. Traditionally, indicators such as traffic flow are used to represent traffic demands. In seriously congested areas, however, those indicators can hardly reflect actual traffic demands [1]. Traffic signal control systems, such as SCOOT (Split Cycle Offset Optimization Technique) and SCATS (Sydney Coordinated Adaptive Traffic System), failed to work properly due to inaccurate traffic demand estimation in the congested period [2, 3].

Compared with the indicator of traffic flow, the link dynamic vehicle count (LDVC) refers to the number of vehicles on a specific road. It describes the space occupancy rate of roads [4–6] and can reflect traffic demand more precisely [7, 8]. Accurate and real-time prediction of the link dynamic vehicle count can also provide a reliable basis for the network-wide traffic signal control strategy and optimization [1, 2, 9]. In recent years, thanks to continuous investments in intelligent transportation systems, various sensors have been deployed and a large amount of real-time traffic data can be collected. The link dynamic vehicle count data can be effectively calculated under the new data environment with mature technology [10, 11]. However, critical issues and challenges remain unaddressed in forecasting the link dynamic vehicle count, in the following aspects: (a) The LDVC is often disturbed by stochastic factors. For instance, the LDVC experiences a sudden increase and decrease when traffic flow becomes congested or traffic incidents occur, such as accidents and temporary traffic control measures. (b) Although selecting the "best" model among a set of baselines is significant, a better alternative is to consider the strength and robustness of the prediction results. Decomposing traffic data into subsequences can help capture both the common tendency and some changes in traffic flow to improve the prediction accuracy. (c) Most machine learning-based methods can mine the nonlinear profile of the link vehicle count, but overfitting issues may occur. Overfitting means that the model extracts noise in the training data as a feature of the data itself, which in turn degrades performance on the test dataset.

Machine learning approaches are widely used in traffic forecasting. Researchers have made progress in prediction algorithms, model fusion, and the temporal and spatial characteristics of traffic data. However, there are still some crucial issues and challenges. (a) In the traffic research field, most forecasting models focus on traffic flow [12–15], traffic speed [16–18], or travel time [19–21]. Few forecasting models focus on the prediction of the LDVC, let alone consider the data characteristics of the LDVC in the prediction model. (b) The LDVC has nonlinear and stochastic characteristics, such as random changes due to traffic congestion and weather factors. At the same time, due to signal light control and weekday commuting, the LDVC shows long- and short-term periodic changes in signal cycle, day, and week patterns. (c) The excellent performance of deep learning methods in traffic forecasting depends on efficient and appropriate hyperparameter tuning. Deep learning prediction models have many hyperparameters to tune, which is time-consuming and laborious, and suitable values are still hard to obtain.

In response to these challenges, this paper proposes a hybrid ICEEMDAN-GRU-BO forecasting model for the link dynamic vehicle count. The hybrid model fuses the GRU with hyperparameter tuning based on Bayesian optimization (BO) and the improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN). First, the ICEEMDAN is utilized to decompose the link dynamic vehicle count data into subcomponents. The decomposed components thus have reduced stochastic characteristics and become more regular and suitable for prediction. Second, the GRU models are applied to predict those components by considering the feature of long- and short-term periods. Simultaneously, the BO is used to automatically optimize the hyperparameters of the GRU models to deal with the challenge of hyperparameter tuning. Finally, the predicted subcomponents are reconstructed to obtain the final prediction results.

The contributions of this paper can be summarized as follows: (1) For the first time, a hybrid ensemble decomposition deep learning prediction framework is proposed to predict the link dynamic vehicle count to improve the prediction accuracy and reduce the prediction time. (2) Aiming at tackling the challenges mentioned above, we present a novel approach by integrating the GRU with ICEEMDAN. The data decomposition method ICEEMDAN is used to reveal the nonlinearity and stochastic characteristic of the link dynamic vehicle count. We propose a short-term prediction method based on the GRU model to effectively capture the long- and short-term period features. (3) To avoid overfitting issues and address the considerable time and labor consumed in hyperparameter tuning, the hyperparameters in our model are automatically and efficiently tuned using BO. (4) The test results show that the proposed hybrid deep learning framework achieved the best performance, in terms of both improvement in prediction performance and reduction in training time, compared with a variety of benchmark models. Considering that the LDVC is one of the best inputs for real-time control applications in urban areas, the proposed model could provide accurate real-time LDVC forecast data for real-time traffic control.

The rest of the paper is organized as follows: Section 2 reviews the relevant literature about traffic prediction models, hybrid models, and hyperparameter optimization. Section 3 proposes the framework of the prediction model, including the logical relationship between data decomposition, model training, parameter optimization, and prediction component reconstruction. The fundamental methods are explained, including the sequence data decomposition method ICEEMDAN, the basic model GRU, and the hyperparameter tuning algorithm BO. Section 4 validates the proposed model using actual link dynamic vehicle count data collected in Hangzhou, China. We further study the influence of hyperparameter tuning and data decomposition on model performance. Besides, we compare the prediction accuracy and computing time of our model with 12 benchmark models. Section 5 concludes the paper.

2. Literature Review

In the past few decades, there has been a lot of research on traffic system forecasting. This paper focuses on the traffic prediction model, the hybrid model, and hyperparameter optimization. The literature review of the three research lines is summarized as follows.

This section first reviews the literature on traffic information prediction models, such as parametric models [12, 13, 22] and nonparametric models [20, 23–28].

A parametric model mainly considers some unsteady time series data to establish prediction models with limited parameters. Autoregressive integrated moving average (ARIMA) or seasonal autoregressive integrated moving average (SARIMA) is commonly used in time series data forecasting in transportation. For example, Williams et al. developed a SARIMA model to identify seasonal patterns to capture periodic changes in traffic states [13]. Van Der Voort et al. used a self-organizing neural network map as the initial classifier, associated with individual ARIMA models, to predict the half-hour traffic flow on French highways [12]. Kumar et al. selected a three-lane arterial road in Chennai, India, with only three days of traffic flow data, to establish a SARIMA model for traffic flow forecasting [14].

The nonparametric models mainly include machine learning and deep learning approaches. Machine learning has been widely used to predict traffic information. For example, researchers used the k-nearest neighbor (KNN) model to predict short-term traffic flow [23, 26, 29]. Support vector regression (SVR) was utilized in traffic flow prediction [27, 29, 30] and travel time forecasting [20]. Random forest regression (RF) was applied for traffic flow prediction [31]. Early neural network models such as the multilayer perceptron (MLP) [24, 32], back propagation neural network (BPN) [33], and artificial neural networks (ANN) were widely employed in the prediction of traffic systems. For example, Kumar et al. used ANN for the short-term prediction of traffic volume [14]. Ruiz Aguilar et al. proposed a hybrid prediction method based on the combination of the ARIMA and ANN models to predict the number of goods inspected at European border checkpoints [34].

However, the traditional artificial neural network cannot capture time-series data features because it does not consider time dependence. To overcome this shortcoming, researchers have explored a large number of novel neural network models. Deep learning models are the fastest growing algorithms in recent years. In terms of sequence data modeling, the RNN (Recurrent Neural Network) is one of the representatives. Van Lint et al. proposed a nonlinear state space method using RNN to predict short-term highway travel time [19].

The RNN variant LSTM (Long Short-Term Memory) solves the shortcoming that RNN cannot store long-term memory information. LSTM has been applied successfully in traffic system prediction. For example, Ma et al. used remote microwave sensor data to establish LSTM models to predict traffic speed [35]. Zhao et al. proposed an LSTM prediction model for short-term traffic flow prediction [36]. Yang et al. applied LSTM to predict urban rail transit passenger flow [37].

As an improved algorithm of LSTM, GRU [38] was first proposed by Cho et al. in 2014. In most cases, the prediction performance of GRU is similar to that of LSTM, but the training time is reduced. Zhang et al. predicted network-wide traffic speed with a deep learning model, and the results show that GRU obtains even better performance than LSTM [39].

A hybrid model that combines data decomposition with a machine learning or deep learning approach can efficiently improve the prediction performance. The following part reviews the literature on hybrid models that combine data decomposition methods with machine learning and deep learning methods. Both machine learning and deep learning models require stable inputs, and data decomposition methods can effectively improve the quality of model input data and make the decomposed data more regular [40].

Choosing a reliable data sequence decomposition method is critical for the stable and effective input required in the forecasting model. Traditional data decomposition methods, such as the wavelet transform (WT), have been successfully applied in transportation. For example, Wang and Shi established a short-term traffic speed prediction model based on chaotic wavelet transform and support vector machine [16]. However, the traditional wavelet transform and Fourier transform techniques have disadvantages. For example, it is difficult to choose the mother wavelet, while empirical mode decomposition (EMD) and ensemble empirical mode decomposition (EEMD) are more effective. These decomposition methods can decompose the data into intrinsic mode components. Many researchers have greatly improved the accuracy of the prediction model based on the decomposition of EMD and EEMD [41]. For example, Wei and Chen combined the EMD and back propagation neural network (BPN) model to predict short-term passenger flow in the subway system [33]. Yang and Chen combined EMD and stacked autoencoder (SAE) for passenger flow prediction in urban rail transit [42]. Jiang et al. combined the EEMD and grey support vector machine model to develop a hybrid method for short-term high-speed rail passenger flow forecasting [43]. The particle swarm optimization algorithm was used to optimize the grey support vector machine, and the results show that the model performs well in terms of prediction accuracy. Zhang et al. proposed a hybrid deep learning prediction model that combined a 3D convolutional neural network (3D CNN) and EEMD to predict the network-wide speed of Beijing, and the results showed that the EEMD method effectively improves the input data, and 3D CNN can consider the temporal and spatial characteristics of the road network [40]. Given these advantages in data decomposition, other improved algorithms based on EMD and EEMD have been proposed, such as complementary EEMD (CEEMD) [44], complete EEMD with adaptive noise (CEEMDAN) [45], and ICEEMDAN [46].

The excellent performance of the neural network prediction model is inseparable from parameter optimization. The following review focuses on the relevant parameter optimization methods. A suitable parameter setting has an enormous impact on the performance of the neural network prediction model [43]. For parameter optimization in machine learning and deep learning models, manual tuning relies on experience and is vulnerable to bias, and the tuning process is very time-consuming [47, 48]. Commonly used automatic parameter tuning algorithms [49] include grid search, random search, and Bayesian optimization. Grid and random search have shortcomings: each new search may be disconnected from the previous search information and cannot make full use of prior knowledge. Bayesian optimization utilizes the prior distribution information of parameters [50]. It can automatically and effectively search for hyperparameters with fewer iterations. Bayesian optimization has become the most practical tool for parameter optimization in predictive systems and has recently been applied successfully to deep learning hyperparameter tuning, for example in references [51–53].

In response to these challenges, we propose a hybrid deep learning model for link dynamic vehicle count forecasting. The data decomposition method ICEEMDAN is adopted to decompose the irregular traffic demand data into simpler IMF components. The GRU model is used to predict the IMF components considering the long- and short-term periodic characteristics of traffic demand, and Bayesian optimization is utilized for automatically tuning multiple hyperparameters of the deep learning models.

3. Link Dynamic Vehicle Count Forecasting Model

This paper proposes a hybrid deep learning model that combines ICEEMDAN and GRU with Bayesian optimization for link dynamic vehicle count forecasting, called ICEEMDAN-GRU-BO. Figure 1 shows the framework, and the main steps are as follows:

(1) Data processing, including data cleaning, normalization, and completion. For example, we fill in missing data with the average value of the previous three time steps.
(2) Data decomposition. ICEEMDAN is adopted to decompose the link dynamic vehicle count into several intrinsic mode functions (IMFs) and a residual. These mode components are simpler and more regular, which can improve the accuracy of the deep learning model.
(3) Subcomponent prediction and hyperparameter optimization. GRU is used as the basic prediction model to predict subcomponents of different frequencies. This framework employs a Bayesian optimization algorithm to optimize the hyperparameters of each GRU model. These hyperparameters include the initial learning rate, number of hidden units, L2 regularization coefficient, and number of GRU layers.
(4) Mode reconstruction and results evaluation. The final prediction result is obtained by summing up the predicted subcomponents and is evaluated on the test dataset.

The following describes the model details.

3.1. Data Normalization. Data normalization is adopted in data processing to reduce data redundancy and improve data usability. After normalization, the original data are converted into pure dimensionless values. The training data are transformed into standardized data with zero mean and unit variance to fit better and prevent training divergence. The standardization formula is as follows:

$$X' = \frac{X - \bar{X}}{s(X)}, \tag{1}$$

where $X'$ is the normalized data, $X$ is the original data, $\bar{X}$ is the mean of the original data, and $s(X)$ is the standard deviation of the original data.

In the prediction stage, the predicted data are denormalized using the same mean and standard deviation:

$$\hat{X} = s(X)\,\hat{X}' + \bar{X}, \tag{2}$$

where $\hat{X}'$ is the predicted value after normalization and $\hat{X}$ is the final predicted value.

3.2. Data Decomposition Method. The EMD [54] is an adaptive method for the analysis of nonstationary and nonlinear signals. EMD can decompose the original signal into the sum of amplitude and frequency modulation functions, called "Intrinsic Mode Functions" (IMFs), and a final monotonic trend. However, EMD has the problem of "mode mixing," which refers to very similar oscillations appearing in different IMFs. Mode mixing reduces the EMD's ability to recognize different amplitudes in the actual data of the IMF components and affects the prediction accuracy of the hybrid model [43]. To overcome the problem of mode mixing, researchers have proposed a data decomposition method that adds Gaussian white noise, called "Ensemble Empirical Mode Decomposition" (EEMD) [55–58]. EEMD is a noise-assisted data analysis method that aims to overcome the shortcomings of the EMD method. The steps of EEMD are as follows:

Step (1): Before EMD decomposition, Gaussian white noise is added to the original sequence data each time, and the constructed sequence after addition is as follows:

$$x^{i}(t) = x(t) + w^{i}(t), \tag{3}$$

where $x^{i}(t)$ is the constructed sequence data, $x(t)$ is the original sequence data, and $w^{i}(t) \sim N(0, \sigma^{2})$ is the added white noise sequence data.

Step (2): EMD is adopted to decompose the constructed sequence $x^{i}(t)$ into $n$ IMFs:

$$x^{i}(t) = \sum_{j=1}^{n} c_{j}^{i}(t) + r_{n}^{i}(t), \tag{4}$$

where $c_{j}^{i}(t)$ is the $j$th IMF of the $i$th decomposition and $r_{n}^{i}(t)$ is the residual of the $i$th decomposition.

Step (3): Repeat steps (1) and (2) $M$ times, adding different white noise each time, to obtain $M$ groups of corresponding IMFs.

Step (4): Calculate the average value of the corresponding IMFs of the $M$ groups as the final IMFs:

$$c_{j}(t) = \frac{1}{M} \sum_{i=1}^{M} c_{j}^{i}(t), \tag{5}$$

where $c_{j}^{i}(t)$ is the $j$th IMF of the $i$th decomposition.

When the EEMD decomposition is completed, the original sequence data can be expressed as $n$ IMFs and a residual:

$$x(t) = \sum_{j=1}^{n} c_{j}(t) + r_{n}(t), \tag{6}$$

where $c_{j}(t)$, $t = 1, 2, \ldots, T$, is the $j$th IMF component at time $t$, $r_{n}(t)$ is the final residual, and $n$ is the number of IMFs.

The main problems of the EEMD are the high computing time and the residue of added noise present in the IMFs. In the EEMD, every $x^{i}(t)$ is decomposed independently from the other realizations, so the reconstructed signal contains residual noise, and different realizations of signal plus noise may produce a different number of modes. To overcome this limitation, the CEEMDAN algorithm was first proposed by Torres et al. in 2011 [45]. The main idea of the CEEMDAN is to add white noise at each phase of decomposition and calculate a unique residue to obtain each mode.
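The ensemble averaging in EEMD Steps (1) to (4) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the names `eemd` and `toy_decompose`, and the parameters `n_trials`, `noise_std`, and `seed`, are hypothetical. In particular, `toy_decompose` is a simple moving-average splitter standing in for a real EMD sifting routine, chosen only so that the example is self-contained and always returns the same number of modes.

```python
import numpy as np


def toy_decompose(signal, n_modes=2, window=5):
    """Stand-in for EMD (NOT real sifting): peel off moving-average details.

    Returns (modes, residual) such that modes.sum(axis=0) + residual == signal.
    """
    modes = []
    current = np.asarray(signal, dtype=float)
    kernel = np.ones(window) / window
    for _ in range(n_modes):
        smooth = np.convolve(current, kernel, mode="same")
        modes.append(current - smooth)  # fast-varying part plays the role of an IMF
        current = smooth                # slow part is decomposed further
    return np.array(modes), current


def eemd(x, decompose, n_trials=50, noise_std=0.1, seed=0):
    """Average the components of `n_trials` noisy copies of x (Eqs. (3) to (5))."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    mode_sum, resid_sum = 0.0, 0.0
    for _ in range(n_trials):
        noisy = x + rng.normal(0.0, noise_std, size=x.shape)  # Eq. (3)
        modes, resid = decompose(noisy)                       # Eq. (4)
        mode_sum = mode_sum + modes
        resid_sum = resid_sum + resid
    return mode_sum / n_trials, resid_sum / n_trials          # Eq. (5)


# Synthetic signal: an oscillation plus a slow trend (hypothetical data).
t = np.linspace(0.0, 1.0, 200)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * t
imfs, residual = eemd(x, toy_decompose, n_trials=100, noise_std=0.1)

# Eq. (6) holds only up to the averaged noise left in the components.
recon_error = np.max(np.abs(imfs.sum(axis=0) + residual - x))
```

In this sketch the reconstruction error is small but not zero, because the averaged white noise does not cancel exactly over a finite number of trials. This is the residual-noise drawback of EEMD that motivates CEEMDAN and ICEEMDAN.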
[Figure 1: Framework of the link vehicle dynamic count forecasting model. The flowchart shows normalization of the link vehicle count data, the split into train and test data, ICEEMDAN decomposition into IMF1, IMF2, ..., IMFn and a residual R, one GRU model per component tuned by Bayesian optimization (model initialization, Gaussian process regressor, RMSE on the validation set), mode reconstruction, denormalization, and the forecast and evaluation results.]

The resulting CEEMDAN decomposition is complete, with a numerically negligible error. However, there are still problems with some residual noise and "spurious" modes. The ICEEMDAN technique was developed by Colominas et al. [46] to address these problems.

Given a composite signal $x(t)$, where $t$ is the sampling sequence of the signal, let $E_{k}(\cdot)$ be the $k$th IMF obtained by EMD and define $M(\cdot)$ as the operator that calculates the local mean of a signal. The ICEEMDAN algorithm is then described as follows:

Step 1: Calculate the local means of $I$ realizations using the EMD algorithm:

$$x^{i} = x + \beta_{0} E_{1}(w^{i}), \quad i = 1, \ldots, I, \tag{7}$$

where $\beta_{0} = \varepsilon_{0}\,\mathrm{std}(x)/\mathrm{std}(E_{1}(w^{i}))$ and $\varepsilon_{0}$ is the reciprocal of the desired signal-to-noise ratio between the first added noise and the analyzed signal.

Step 2: Calculate the first residue $R_{1}$:

$$R_{1} = \langle M(x^{i}) \rangle, \tag{8}$$

where $\langle \cdot \rangle$ is the action of averaging throughout the realizations.

Step 3: Compute the first mode at the first stage ($k = 1$) as $d_{1} = x - R_{1}$.

Step 4: Estimate the second residue as the average of the local means of the realizations $R_{1} + \beta_{1} E_{2}(w^{i})$ and define the second mode as follows:

$$d_{2} = R_{1} - R_{2} = R_{1} - \langle M(R_{1} + \beta_{1} E_{2}(w^{i})) \rangle. \tag{9}$$

Step 5: For $k = 3, \ldots, K$, calculate the $k$th residue:

$$R_{k} = \langle M(R_{k-1} + \beta_{k-1} E_{k}(w^{i})) \rangle, \qquad \beta_{k} = \varepsilon_{0}\,\mathrm{std}(R_{k}), \quad k \geq 1. \tag{10}$$

Step 6: Compute the $k$th mode:

$$d_{k} = R_{k-1} - R_{k}. \tag{11}$$

Step 7: Go back to step 4 for the next $k$.

Compared with EEMD and CEEMD, the ICEEMDAN can not only reduce the noise in the modes but also decrease the residual spurious mode problems caused by signal overlap, providing an accurate reconstruction of the original signal.

3.3. GRU Model. As a special RNN structure, LSTM solves the problems of vanishing and exploding gradients by changing the cell structure and adding storage cells to determine whether it is necessary to remember information. GRU [38] improved LSTM by reducing the number of gates to decrease the training time. As shown in Figure 2, the GRU unit transfers the input vector $x_{t}$ to the output vector $h_{t}$ by iterating over time $t$. GRUs consist of two gates: the reset gate and the update gate. The main process in a GRU unit can be described as follows:

[Figure 2: The structure of GRU. The diagram shows the reset gate and the update gate acting on the previous state h_{t-1} to produce h_t, with sigmoid (σ) and tanh activations.]

r = σ