Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

High‐performance traffic speed forecasting based on spatiotemporal clustering of road segments

High‐performance traffic speed forecasting based on spatiotemporal clustering of road segments INTRODUCTIONShort‐term traffic prediction is becoming increasingly essential for providing high‐quality transportation services [1]. Reliable future traffic state information is conducive for formulating effective management (e.g. signal timing optimisation [2] and ramp control [3]) and assisting travellers’ route planning and departure time scheduling [4]. However, it remains a challenging task owing to the requirements of both prediction precision and timely calculation. With the rapid development of artificial intelligence technology, new prediction models have gradually emerged with deeper structures. These models offer significant improvements in accuracy performance but impose greater computation burden. Therefore, it is crucial to maintain the quality of these methods while reducing computational load significantly. Hence, we herein introduce a novel strategy that applies a deep‐learning framework in classified groups of road segments, where classification is conducted to reduce the scale of learning.The main contributions of this research can be summarised as follows:A spatiotemporal clustering method of road segments is proposed, which considers both traffic speed patterns and spatial topology information.Based on the clustering results, we developed a novel prediction scheme. The sequence‐to‐sequence learning (Seq2Seq) algorithm is separately deployed cluster by cluster in parallel.We conducted numerical tests on a large‐scale real‐world dataset provided by AMAP, a leading mobile navigation application in China. The results indicate that our approach yields better performances than other state‐of‐the‐art baselines in terms of both prediction accuracy and computational efficiency.The remainder of this study is organised as follows. Section 2 presents recent relevant studies. Section 3 describes our proposed method. Section 4 presents the experimental dataset and analysis result. Finally, Section 5 concludes the study and outlines future research.LITERATURE REVIEWIn this section, we review traffic prediction models and traffic prediction with clustering technology, which are related to the current study.Traffic prediction modelsIn the past decade, substantial efforts have been expended to model the temporal and spatial correlations of traffic state evolution under the topic of forecasting, the summary of which is presented as follows.Statistical methods such as auto‐regressive integrated moving average, Kalman filtering, and their variants [5–7] can predict future values based on previous observations by successive time‐series analysis. However, statistical models typically rely on the stationary assumption, and they cannot accurately capture the time‐varying and non‐linear characteristics of traffic flow. The massive accumulation of traffic data provided opportunities for the realisation of data‐driven approaches. The fuzzy theory [8], artificial neural network (ANN) [9], support vector machines [10], and K‐nearest neighbour model (KNN) [11] have also been exploited for short‐term traffic prediction. Furthermore, deep‐learning methods have reported better performances. Recurrent neural networks (RNNs) [12], stacked auto‐encoders [13], and deep belief networks [14] are powerful tools for learning temporal dependencies from data sequence.To construct a robust prediction framework, the temporal features must be extracted and the spatial information of road networks must be incorporated. Conventional vector autoregressive models consider the effects of upstream and downstream links [15]. Cross‐correlation coefficient [16] and convolutional neural networks (CNNs) [17, 18] have been utilised to analyse spatial relevance. Subsequently, graph convolution networks (GCNs) have been proposed to address non‐Euclidean structured data, which have been investigated in the transportation domain. Temporal convolution networks [19], long short‐term memory networks (LSTMs) [20], and Seq2Seq [21] were fused with GCNs to depict spatiotemporal dependencies in traffic prediction.To summarise, remarkable achievements of various models have been reported in previous studies. Unlike the aforementioned studies, we propose an efficient and general prediction paradigm based on the clustering of road segments.Traffic prediction with clustering technologyIn [22], a non‐parametric clustering‐based technique that provides accurate traffic forecasting was established through the exploitation of traffic data dynamics, whereas in [23], an urban traffic flow prediction system using a multifactor pattern recognition model was established, which combined a Gaussian mixture model clustered with an ANN. Tang et al. constructed a fuzzy neural network to forecast travel speed, which was trained by the K‐means method to partition inputs into different clusters and a weighted recursive least‐squares estimator to optimise the parameters [24]. Subsequently, Wang et al. developed a traffic threshold identification module based on an improved K‐means clustering algorithm and implemented a traffic flow prediction framework based on adaptive speed thresholds [25]. Shen et al. presented a novel traffic speed prediction method based on temporal clustering analysis and a deformable convolution neural network that could discriminate traffic environments [26]. The aforementioned clustering targets were the observed data samples, where the spatial information of road networks was not incorporated.Song et al. measured the similarities between road segments based on adjacency, connectivity, and congestion; subsequently, they generated topological maps by road segment clustering [27]. However, the way to proceed with traffic prediction was not involved. The spatiotemporal performance trends at the network level and for individual links were mined through K‐means clustering [28], which could create new prediction error measures rather than enhance the prediction performance. In [29], a clustering procedure was used to identify time‐variant clusters of road segments with high spatial correlations; subsequently, they were fed into an ANN‐based traffic forecasting model, which is the most correlated model with ours. Nevertheless, the simple multilayer perceptron (MLP) structure will impair the capacity in capturing the temporal dependencies, especially for multistep prediction. We further propose obtaining the optimal number of clusters through the pre‐experiment on a validation set along with providing a comprehensive comparison of prediction performance between link‐, cluster‐, and network‐level inputs.METHODOLOGYIn this section, we first interpret the notations of the variables used herein; subsequently, we illustrate the devised method in detail.PreliminariesWe model the network as a directed graph G(N,L), where the node‐set N comprises road segments (also called links), and L is the set of directed edges representing the connections among the road segments. D the distance matrix of the road segments and the variable Dij denotes the length of the shortest path from the road segment xi to xj considering the driving direction.We denote vti as the speed of the road segment xi (∀xi∈N) at the t th time slot, where the time slot is divided with a fixed duration (e.g. 5 min), and the ‘speed’ is defined as the average speed of floating cars travelling through the road segment during this time interval. Vt∈RN(|N|=N the total number of road segments) is the speed vector of the underlying road network at the t th time slot, where the i th element is (Vt)i=vti.Given the historical observations {Vt−s∣s=0,1,⋯,m} (m is the look‐back time window) and network topology G, the traffic speed forecasting problem is to seek the trustworthy approximation V̂t+n of the future value Vt+n, where n is the required prediction step.Road segment clusteringThe traffic conditions of the entire road network vary significantly over both time and space [30], which is challenging for traffic prediction. However, focusing on the local part, the traffic states are relatively stable and closely dependent [31]. Hence, our major objective is to partition the given road network into clusters, with each containing a group of considerably correlated road segments.Based on the above notations, we define Pij to characterise the similarity in traffic patterns between road segments xi and xj:1Pij=∑uvui−vuj2τ,u∈τ,where τ refers to a historical observed dataset that covers a sufficient period (e.g. one week), and |τ| denotes its cardinality.However, only considering the observed data into account may yield ambiguous results for the sake of coincidences in numerical values (e.g., two irrelevant road segments may possess similar speeds). As the traffic flow at downstream and upstream roads affect each other through transfer and feedback effects [32], it is necessary to incorporate the topology information into the clustering procedure. Combined with the spatial distance, the spatiotemporal similarity between road segments i and j can be quantified as2Sij=αP∼ij+βD∼ij,where P∼ij∈[0,1] and D∼ij∈[0,1] are the normalised results of Pij and Dij, respectively; α and β are coefficients to adjust the effects of corresponding components.Once the similarity matrix of road segments is derived, our next goal is to partition the road segments into a set of disjoint clusters such that the total dependencies between road segments within the same cluster are maximised. Let Fk be the k th cluster with q road segments and M be the total number of clusters:3Fk=argminx1,⋯,xi,⋯xN∑i∑jSijqxi,xj∈Ns.t.∪Mk=1Fk=NFk∩Fw=∅,∀k,w∈1,⋯,MThis can be solved via agglomerative clustering, which generates clusters in a bottom‐up way to produce a binary tree with each data point as either a leaf node or root node [33].The algorithm is initialised by treating each road segment as a separate cluster. The second step is to calculate a distance matrix embodying the distance between any two clusters. In this study, we used the average‐linkage criterion [34], which determines the average similarity between all road segments of the two clusters as the distance for clustering. It is expressed as4Rij=∑u∑vSuvFi×Fj,xu∈Fi,xv∈FjIn the third step, the two clusters with the shortest distance are identified and merged into a new cluster. Subsequently, steps 2 and 3 are repeated until the number of clusters reaches the given number.Owing to the clustering processing, the road segments in the underlying network are classified into several groups. In the following sections, we denote {xk1,xk2,…,xkq} as the road segments in the cluster Fk, and Vt(k)={vtk1,vtk2,⋯,vtkq} as the corresponding speed observations at the t th time slot.Traffic prediction methodAs road segments within the same cluster are strongly spatial correlated, the primary objective of the traffic prediction model is to capture the temporal evolution of the traffic state. We leverage the Seq2Seq architecture [35] to model the temporal dependencies, which unites two RNN modules to overcome the disadvantage of fixed output timestamp of a single RNN. As shown in Figure 1, the encoder embeds the input sequence {Vt−m(k),⋯,Vt−1(k),Vt(k)}, and its final hidden state is fed into the decoder, which learns to predict the future traffic speed {V̂t+1(k),V̂t+2(k),⋯,V̂t+n(k)}. It is noteworthy that the data stream of our Seq2Seq model is cluster‐level Vt(k), which is vastly different from link‐level vti and network‐level Vt in previous studies [36]. The benefits of this manner will be emphasised in Section 4.1FIGUREStructure of the sequence‐to‐sequence network (a) training stage, (b) testing stageDuring training, historical ground truth series are fed to the decoder (see Figure 1(a)), whereas ground truth observations are replaced by predictions generated by the model itself during testing (see Figure 1(b)). The model performance may degrade due to the discrepancy of input distributions between the training and testing stage. Hence, scheduled sampling is integrated [37] into the model, where the encoder is fed with either the ground truth observation with probability ∈i or the prediction by the model with probability 1‐∈i at the ith iteration. In this study, we set ∈i as5∈i=1,i<Λ0,i≥Λ,where Λ is the predefined threshold.Throughout the model operation, the hidden state ht−1, which is passed to next timestamp t, together with input Vt(k) generates a new hidden state ht at each iteration. Both the encoder and decoder adopt the LSTM structure [38], and the calculation formulas are shown in Equations (6)–(11):6ft=σWf·ht−1||Vtk+bf7it=σWi·ht−1||Vtk+bi8c∼t=tanhWc·ht−1||Vtk+bc9ct=ft⊙ct−1+it⊙c∼t10ot=σWo·ht−1||Vtk+bo11ht=ot⊙tanhct,where ft,it,ot,andct represent the forget gate, input gate, output gate, and memory cell vector, respectively; [·||·] concatenates two tensors along the same dimensions; σ(y)=11+e−y and tanh(y)=ey−e−yey+e−y are two widely used non‐linear activation functions that map the input into (0,1) and (−1,1), respectively; ⊙ refers to the Hadamard product that conducts the element‐wise multiplication operation; Wf, Wi, Wc, and Wo are weight matrices that conduct a linear transformation while bf,bi,bc, and bo are the corresponding bias vectors.Equation (12) denotes the fully connected layer from the hidden state to the output. The dimensions of the weighted parameter matrixWv and intercept parameter bv are consistent with the output:12V̂t+jk=Wvht+j−1+bv,1≤j≤nTo jointly reduce the errors in the multiple‐step prediction, the loss function is defined as the mean absolute error between (V̂t+1(k),V̂t+2(k),⋯,V̂t+n(k)) and (Vt+1(k),Vt+2(k),…,Vt+n(k)), which is given by13loss=1n∑j=1n∥V̂t+jk−Vt+jk∥1where ∥·∥1 returns the L1 norm of the vector.All the parameters are updated by minimising the loss function through the stochastic mini‐batch gradient descent (BGD) algorithm in the training stage. The training steps of the proposed model is elaborated in Algorithm 1.EXPERIMENTSDataset descriptionReal‐world traffic dataset was collected from anonymous navigation users of the AMAP (https://www.amap.com/). We selected the inner fourth ring road in Beijing as the study site, which encircled the downtown area. As shown in Figure 2, the 65‐km fourth ring road is partitioned into 162 road segments of length 400 m. Subsequently, we calculated the 5‐min average speed for each road segment by utilising the instantaneous state of the trajectory points. The missing values were completed by the nearest non‐missing values using MATLAB [39]. The average traffic speeds of morning and evening peak hours on weekdays of October 2016 are shown in Figures 3(a) and (b), respectively. The entire dataset ranging from October 1, 2016, to November 27, 2016, was separated into three independent subsets, the details of which are provided in Table 1.2FIGURESketch of road segments in geographic information system map3FIGUREAverage traffic speed of the fourth ring road on weekdays in October 2016 (a) morning peak hours, (b) evening peak hours1TABLEInformation of experimental datasetsItemTraining setValidation setTesting setDateOct 1–Nov 13Nov 14–Nov20Nov 21–Nov 27Prediction time range06:00–22:00 (192 data points per day)Dataset size44 × 192 × 162 ≈ 1.4 M7 × 192 × 162 ≈ 220 K7 × 192 × 162 ≈ 220 KExperimental settingIn this subsection, we compare our prediction scheme with several prevailing methods:ALGORITHM 1: Training the modelInput Traffic speed observations of kth cluster in the training set: {V1(k),⋯,VT(k)}Input timestamp: m; Prediction timestamp: nThreshold of train steps for scheduled sampling: ΛOutput The model with learnt parameters1  Procedure train Seq2Seq model2   Initialise two null sets: U←∅, V←∅3   for all available time slots t (1≤t≤T) do4Xencoder←[Vt−m(k),…,Vt(k)],Xdecoder←[Vt+1(k),…,Vt+n−1(k)], Ylabel←[Vt+1(k),…,Vt+n(k)]5    U←(Xencoder,Xdecoder,Ylabel), V←(Xencoder,Ylabel)6   end for7   Initialise the original hidden status of the encoder, all the weight and bias parameters8   for all training steps i (1≤i≤I) do9    if i<Λ10     Randomly extract a batch of samples Ub from U11     Minimise Equation (13) to update the parameters by the BGD algorithm within Ub12    else13     Randomly extract a batch of samples Vb from V14     Minimise Equation (13) to update the parameters by the BGD algorithm within Vb15   end for16  end procedure    Historical average (HA): The HA model performs predictions based on the mean values of the same timestamp in the training set, which serves as a basic criterion.MLP: An MLP [40] with one hidden layer is established, and we set the number of hidden neurons to be twice the feature dimension.KNN: The KNN [41] calculates the weighted summation of the corresponding future values belonging to similar samples. The hyperparameter K is selected from 5 to 30 through the performances on the validation set.LSTM: In LSTM [38], the inputs are reshaped into a matrix with one axis as the time step and another axis as the number of road segments, which can address temporal correlations well.Graph attention network (GAT): The GAT [42] leverages masked self‐attentional layers to specify different weights to different neighbourhoods, which considers spatial information.Seq2Seq: It is facilitated with the same structure as that of our prediction model except for the cluster‐level input.Traffic graph convolutional (TGC)‐LSTM: The TGC‐LSTM [20] unifies the traffic graph convolution operation and LSTM structure to attain spatiotemporal correlations in the transportation system.It is noteworthy that KNN, LSTM, GAT, and Seq2Seq are applicable for both link‐wise modelling and network‐wide inputs. The performances of these models under the two scenarios were tested in the following experiments.Speed records in the past 60 min (i.e. m=11) were employed as inputs for all models (excluding HA) to forecast the traffic states in the next 10, 20, and 30 min. The prediction accuracy of all road segments was assessed via three typical metrics, that is, mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean squared error (RMSE), expressed as follows:14MAE=∑i=1N∑t∈Hv̂ti−vtiN×H15MAPE=∑i=1N∑t∈Hv̂ti−vtivtiN×H16RMSE=∑i=1N∑t∈Hv̂ti−vti2HN,where vti and v̂ti are the ground truth and prediction values at the ith road segment at the tth time slot, respectively; H denotes the testing set.To ensure fairness in both accuracy and efficiency comparisons, the same training hyperparameters were retained in the neural‐network‐based methods (except TGC‐LSTM). In detail, the batch size was 512, the learning rate 10−2, and the training steps 2500 with the Adam optimiser. We accommodate the parameters of TGC‐LSTM to the underlying dataset on the basis of suggestions in [20]. The LSTM, encoder, and decoder of Seq2Seq comprised of one hidden layer and eight hidden units for the link‐level input and 160 hidden units for the network‐level input. The threshold of scheduled sampling for the Seq2Seq model was Λ=1700; 1‐hop and 2‐hop neighbours were considered separately in the GAT. The above settings are determined through pre‐experiment on validation set which achieves a balance between prediction accuracy and computation efficiency.For our prediction paradigm, we set α=β=0.5 and used the historical observations from October 10 to 16 for clustering (i.e. we calculated Pij using Equation 1). We exploit the data from one week for clustering because traffic patterns in both weekdays and weekends are appropriately considered. The optimal number of clusters was selected as 12 via the pre‐experiment on the validation set of the corresponding prediction performances from (4, 8, 12, 16, 20), (the details will be provided in Section 4.4). The model settings remained the same as those mentioned above except that 16 hidden units were used.Our experimental platform was a server with two central processing units (CPUs) (Intel(R) Xeon(R) CPU E5‐2673 v3 @2.40 GHz, 24 cores), 256‐GB RAM (Random Access Memory), and four Graphics Processing Units (GPUs) (NVIDIA Quadro P5000, 16 GB memory). We used python 3.6.3 with scikit‐learn [43] and tensorflow [44] to achieve contrasting models. All the algorithms were coded in parallel.Prediction performanceTables 2–4 present the comparisons of the proposed model and benchmark algorithms for 10‐, 20‐, and 30‐min ahead forecasting on the testing datasets. The following conclusions were obtained from the results:Our prediction strategy, that is, implementing the Seq2Seq model cluster by cluster achieved the best prediction results for all three evaluation metrics under various steps and exhibited superior computational efficiency. Although the specialised spatiotemporal deep‐learning framework, TGC‐LSTM, presents better performances than other baselines, it is also inferior to our approach.Figure 4 compares the execution time and MAE of the KNN, LSTM, GAT, and Seq2Seq fed with link‐ and network‐level inputs under a 20‐min time period. As expected, link‐wise modelling requires a lengthy calculation time while affording better precision, whereas the network‐level input performs worse as the heterogeneous traffic states of the entire network cause the underfitting problem.Figure 5 shows the execution time and MAE of the Seq2Seq model with link‐, network‐, and cluster‐level inputs during a 10–30 min interval. It is clear that the clustering pre‐process improved the accuracy and efficiency significantly. Compared with the link‐level input, the cluster‐level input incorporates spatial features and reduces the required number of training rounds from the number of links to the number of clusters (e.g. for the road network with 160 links, the link‐level input needs 160 training rounds link by link. When we divide them into 20 clusters, only 20 training rounds are required). Unlike the network‐level input, the cluster algorithm partitions the network into highly correlated groups, which enables the spatiotemporal dependencies to be learned more easily. It is noteworthy that the running time of the Seq2Seq model increases rapidly with the input dimension. The cluster‐level input decreases the input dimension at each iteration, and the parallel computing for each cluster contributes to reduced time consumption. The results serve as a powerful support to our motivation of this study.2TABLEPerformance comparison under 10‐min prediction intervalModelMAPEMAERMSETime(s)HA31.89%9.1112.96‐MLP11.83%4.427.35128KNN(link/network)13.35%/21.63%4.71/6.697.52/10.55109/102LSTM(link/network)11.40%/12.71%4.26/4.587.17/6.92332/182GAT1‐hop (link/network)11.39%/11.99%4.27/4.446.94/7.31162/47GAT2‐hop (link/network)11.29%/11.83%4.22/4.406.78/7.20176/47Seq2Seq(link/network)11.39%/12.98%4.26/4.577.19/6.91358/189TGC‐LSTM11.64%4.226.67483Seq2Seq(cluster)10.62%3.996.3356Abbreviations: MAPE, mean absolute percentage error; MAE, mean absolute error; RMSE, root mean squared error; HA, historical average; KNN, K‐nearest neighbour; LSTM, long short‐term memory network; GAT, graph attention; Seq2Seq, sequence‐to‐sequence; TGC, traffic graph convolution.3TABLEPerformance comparison under 20‐min prediction intervalModelMAPEMAERMSETime (s)HA31.89%9.1112.96–MLP16.85%5.839.92128KNN(link/network)19.03%/23.29%6.25/7.169.80/11.18112/92LSTM(link/network)16.61%/17.78%5.70/5.989.80/9.05334//179GAT1‐hop(link/network)16.54%//17.12%5.69/5.929.56/9.96162/46GAT2‐hop(link/network)16.44%/16.82%5.63/5.879.37/9.74178/47Seq2Seq(link/network)16.42%/17.58%5.67/5.809.83/8.96401/221TGC‐LSTM16.28%5.498.89505Seq2Seq(cluster)15.01%5.238.68684TABLEPerformance comparison under 30‐min prediction intervalModelMAPEMAERMSETime (s)HA31.89%9.1112.96‐MLP21.11%6.9411.57129KNN(link/network)23.42%/24.81%7.43/7.5911.25/11.74111/98LSTM(link/network)20.95%/21.61%6.79/7.1711.44/10.76336/185GAT1‐hop(link/network)20.88%/21.90%6.79/7.1811.27/11.81162/46GAT2‐hop(link/network)20.67%/21.35%6.72/7.1111.08/11.53178/45Seq2Seq(link/network)20.70%/21.00%6.77/6.6311.54/10.47436/253TGC‐LSTM19.53%6.4210.46491Seq2Seq(cluster)18.97%6.2110.36784FIGUREComparison of link‐ and network‐level inputs5FIGUREComparison of sequence‐to‐sequence model with link‐, network‐, and cluster‐level inputsClustering resultFigure 6 shows the change of MAE with the number of clusters in (4, 8, 12, 16, 20) on the validation and testing set under 20‐min prediction step, respectively. The variation in errors exhibits the same trend between validation and testing, indicating that it is appropriate to select the optimal number of clusters through pre‐experiment on the validation set. Meanwhile, the results are insensitive to the number of clusters, which verifies the robustness of our prediction scheme. Figure 7 presents the partition results of road segments with 12 clusters, for example, the road segments 6‐13 shown in Figure 2 are divided into cluster 2 (shown in orange) in Figure 7(a).6FIGUREMean absolute error of different cluster numbers on validation and testing under 20‐min interval7FIGUREPartition results of 162 road segments in 12 clusters (a) spatiotemporal clustering, (b) clustering without spatial informationComparing Figures 7(a) and (b), the distribution of clusters without spatial information is disorderly (e.g. the road segments in the north and south fourth ring road are both classified into cluster 7, which unlikely owns the physical interactions), which is clearly not the desirable partition for traffic prediction.Discussion of cluster‐level inputTo further validate the effect of our prediction strategy, an additional experiment was conducted to extract road segments randomly to form the same number of groups as that of our clustering process. As shown in Table 5, feeding the model with random sets of road segments does not improve the prediction accuracy, which proves the necessity of our designed clustering algorithm. The results of this experiment show that the improvements in the cluster‐level input are owing to the highly relevant spatial information within each cluster. A simple modification of the input dimension through feeding the random link sets cannot boost the model performance.5TABLEContrast of prediction performance for clustering and random selectionIntervalModelMAPEMAERMSE10‐minSeq2Seq(cluster)10.62%3.996.33Seq2Seq(random)11.82%4.327.0420‐minSeq2Seq(cluster)15.01%5.238.68Seq2Seq(random)16.55%5.659.3530‐minSeq2Seq(cluster)18.97%6.2110.36Seq2Seq(random)20.40%6.5910.75Analysis of spatiotemporal similarity measurementFigure 8 depicts the performances of the proposed prediction scheme with different coefficients α,β in Equation (2). The results explicate that the clustering with a relative trade‐off between pattern similarity (P∼ij) and spatial distance (D∼ij) is better for the prediction accuracy. The partition of road segments without the consideration of pattern similarity (α=0) or spatial distance (β=0) will compromise the prediction performance. Moreover, Table 6 assesses the effects of clustering with our spatiotemporal similarity measurement Sij and Pearson product‐moment correlation coefficient [45, 46] on prediction performances. The results prove that the devised spatiotemporal similarity measurement is more beneficial for promoting traffic prediction accuracy.8FIGURESensitivity analysis of spatiotemporal similarity measurement6TABLECluster‐level traffic prediction accuracy with different similarity measurementsIntervalSimilarity measurementMAPEMAERMSE10‐minSij10.62%3.996.33PPMCC10.99%4.076.4020‐minSij15.01%5.238.68PPMCC15.23%5.338.7330‐minSij18.97%6.2110.36PPMCC19.28%6.3310.44Abbreviation: PPMCC, Pearson product‐moment correlation coefficient.CONCLUSIONA novel high‐performance traffic prediction paradigm was developed in this study. In the first step, road segments in the underlying road network were partitioned into several groups through a spatiotemporal clustering algorithm. Subsequently, strongly correlated and mutually influencing road segments were introduced into the same cluster. Next, the Seq2Seq model was employed cluster by cluster to perform traffic prediction. Using a real‐world traffic dataset, our approach demonstrated significant improvements over other benchmarks in terms of both prediction accuracy and computational efficiency. Furthermore, we analysed the differences between link‐ and network‐level inputs, the influences of the cluster number, the effects of randomly selected road segment groups, and the designed spatiotemporal similarity measurement. In future studies, we may extend our prediction scheme for application to large‐scale urban road networks, where the influences of traffic lights may be considered into the clustering process.ACKNOWLEDGMENTSThe research is supported by grants from the National Key R&D Program of China (2018YFB1601600). The research is supported in part by Tsinghua‐Toyota Joint Research Institution.REFERENCESLin, Y., Wang, P., Ma, M.: Intelligent Transportation System (Its): Concept, Challenge and Opportunity. In: 2017 IEEE 3rd International Conference on Big Data Security on Cloud (Bigdatasecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS), Beijing, China (2017)Ahmed, A., et al.: Real‐time dynamic traffic control based on traffic‐state estimation. Transp. Res. Rec. 2673(5), 584–595 (2019)Goatin, P., Göttlich, S., Kolb, O.: Speed limit and ramp meter control for traffic flow networks. Eng. Optim. 48(7), 1121–1144 (2016)Yuan, J., et al.: T‐drive: Enhancing driving directions with taxi drivers’ intelligence. IEEE Trans. Knowl. Data Eng. 25(1), 220–232 (2013)Wang, H., et al.: A novel work zone short‐term vehicle‐type specific traffic speed prediction model through the hybrid EMD–ARIMA framework. Transportmetrica B Transp. Dyn. 4(3), 159–186 (2016)Wang, Y., Papageorgiou, M.: Real‐time freeway traffic state estimation based on extended kalman filter: a general approach. Transp. Res. Part B Methodol. 39(2), 141–167 (2005)Wang, Y., Papageorgiou, M., Messmer, A.: Real‐time freeway traffic state estimation based on extended Kalman filter: Adaptive capabilities and real data testing. Transp. Res. Part A Policy Pract. 42(10), 1340–1358 (2008)Guo, J., et al.: Short‐term traffic flow prediction using fuzzy information granulation approach under different time intervals. IET Intel. Transport Syst. 12(2), 143–150 (2017)Chan, K.Y., et al.: Neural‐network‐based models for short‐term traffic flow forecasting using a hybrid exponential smoothing and Levenberg–Marquardt algorithm. IEEE Trans. Intell. Transp. Syst. 13(2), 644–654 (2011)Wang, J., Shi, Q.: Short‐term traffic speed forecasting hybrid model based on chaos–wavelet analysis‐support vector machine theory. Transp. Res. Part C Emerging Technol. 27, 219–232 (2013)Sun, B., et al.: Short‐term traffic forecasting using self‐adjusting K‐nearest neighbours. IET Intel. Transport Syst. 12(1), 41–48 (2017)Zhao, Z., et al.: LSTM network: A deep learning approach for short‐term traffic forecast. IET Intel. Transport Syst. 11(2), 68–75 (2017)Lv, Y., et al.: Traffic flow prediction with big data: a deep learning approach. IEEE Trans. Intell. Transp. Syst. 16(2), 865–873 (2014)Huang, W., et al.: Deep architecture for traffic flow prediction: deep belief networks with multitask learning. IEEE Trans. Intell. Transp. Syst. 15(5), 2191–2201 (2014)Chandra, S.R., Al‐Deek, H.: Predictions of freeway traffic speeds and volumes using vector autoregressive models. J. Intell. Transp. Syst. 13(2), 53–72 (2009)Pan, T., et al.: Short‐term traffic state prediction based on temporal–spatial correlation. IEEE Trans. Intell. Transp. Syst. 14(3), 1242–1254 (2013)Ma, X., et al.: Learning traffic as images: a deep convolutional neural network for large‐scale transportation network speed prediction. Sensors 17(4), 818 (2017)Ke, J., et al.: Short‐term forecasting of passenger demand under on‐demand ride services: a spatio‐temporal deep learning approach. Transp. Res. Part C Emerging Technol. 85, 591–608 (2017)Yu, B., Yin, H., Zhu, Z.: Spatio‐temporal graph convolutional networks: A deep learning framework for traffic forecasting. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm (2018)</bib>Cui, Z., et al.: Traffic graph convolutional recurrent neural network: a deep learning framework for network‐scale traffic learning and forecasting. IEEE Trans. Intell. Transp. Syst. 21(11), 4883–4894 (2020).Zhang, Z., et al.: Multistep speed prediction on traffic networks: a deep learning approach considering spatio‐temporal dependencies. Transp. Res. Part C Emerging Technol. 105, 297–322 (2019)Kehagias, D., Salamanis, A., Tzovaras, D.: Speed pattern recognition technique for short‐term traffic forecasting based on traffic dynamics. IET Intel. Transport Syst. 9(6), 646–653 (2015)Kim, Y.‐J., Hong, J.‐S.: Urban traffic flow prediction system using a multifactor pattern recognition model. IEEE Trans. Intell. Transp. Syst. 16(5), 2744–2755 (2015)Tang, J., et al.: An improved fuzzy neural network for traffic speed prediction considering periodic characteristic. IEEE Trans. Intell. Transp. Syst. 18(9), 2340–2350 (2017)Xiangxue, W., Lunhui, X.: Wavelet‐based short‐term forecasting with improved threshold recognition for urban expressway traffic conditions. IET Intel. Transport Syst. 12(6), 463–473 (2018)Shen, G., et al.: Research on traffic speed prediction by temporal clustering analysis and convolutional neural network with deformable kernels (May, 2018). IEEE Access 6, 51756–51765 (2018)Song, Q., et al.: An urban topological map generation method for traffic flow prediction based on road segment clustering with floating vehicle trajectory dataset. In: 19th COTA International Conference of Transportation Professionals (CICTP 2019), Nanjing , China (2019)Asif, M.T., et al.: Spatiotemporal patterns in large‐scale traffic speed prediction. IEEE Trans. Intell. Transp. Syst. 15(2), 794–804 (2013)Zhang, B., et al.: Traffic clustering and online traffic prediction in vehicle networks: a social influence perspective. In: 2012 Proceedings IEEE INFOCOM, Orlando, Florida (2012)Stathopoulos, A., Karlaftis, M.: Temporal and spatial variations of real‐time traffic data in urban areas. Transp. Res. Rec. 1768(1), 135–140 (2001)Zou, H., et al.: A spatial analysis approach for describing spatial pattern of urban traffic state. In: 13th International IEEE Conference on Intelligent Transportation Systems, Funchal, Portugal (2010)Dong, C.‐J., et al.: Spatial and temporal characteristics for congested traffic on urban expressway. J. Beijing Univ. Technol. 38(8), 1242–1246 (2012)Xia, J., Huang, W., Guo, J.: A clustering approach to online freeway traffic state identification using its data. KSCE J. Civ. Eng. 16(3), 426–432 (2012)Lukasová, A.: Hierarchical agglomerative clustering procedure. Pattern Recognit. 11(5‐6), 365–381 (1979)Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems, Montreal, Canada (2014)Liao, B., et al.: Deep sequence learning with auxiliary information for traffic prediction. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK (2018)Bengio, S., et al.: Scheduled sampling for sequence prediction with recurrent neural networks. In: Advances in Neural Information Processing Systems, Montreal, Canada (2015)Hochreiter, S., Schmidhuber, J.: Long short‐term memory. Neural Comput. 9(8), 1735–1780 (1997)Folch‐Fortuny, A., Arteaga, F., Ferrer, A.: Missing data imputation toolbox for matlab. Chemom. Intell. Lab. Syst. 154, 93–100 (2016)Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back‐propagating errors. Nature 323(6088), 533–536 (1986)Dudani, S.A.: The distance‐weighted K‐nearest‐neighbor rule. IEEE Trans. Syst. Man Cybern. SMC‐6(4), 325–327 (1976)Zhang, K., et al.: Graph attention temporal convolutional network for traffic speed forecasting on road networks. Transportmetrica B Transp. Dyn. 1–19 (2020).Pedregosa, F., et al.:Scikit‐learn: machine learning in python. J. Mach. Learn. Res. 12, 2825–2830 (2011)Abadi, M., et al.: Tensorflow: a system for large‐scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’16), Savannah, Georgia (2016)Cheng, T., Haworth, J., Wang, J.: Spatio‐temporal autocorrelation of road network data. J. Geog. Syst. 14(4), 389–413 (2012)Diamantopoulos, T., et al.: Investigating the effect of global metrics in travel time forecasting. In: 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013), Hague, The Netherlands (2013) http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png IET Intelligent Transport Systems Wiley

High‐performance traffic speed forecasting based on spatiotemporal clustering of road segments

Loading next page...
 
/lp/wiley/high-performance-traffic-speed-forecasting-based-on-spatiotemporal-mbS0CJRaXB

References (47)

Publisher
Wiley
Copyright
© 2021 The Institution of Engineering and Technology
eISSN
1751-9578
DOI
10.1049/itr2.12016
Publisher site
See Article on Publisher Site

Abstract

INTRODUCTIONShort‐term traffic prediction is becoming increasingly essential for providing high‐quality transportation services [1]. Reliable future traffic state information is conducive for formulating effective management (e.g. signal timing optimisation [2] and ramp control [3]) and assisting travellers’ route planning and departure time scheduling [4]. However, it remains a challenging task owing to the requirements of both prediction precision and timely calculation. With the rapid development of artificial intelligence technology, new prediction models have gradually emerged with deeper structures. These models offer significant improvements in accuracy performance but impose greater computation burden. Therefore, it is crucial to maintain the quality of these methods while reducing computational load significantly. Hence, we herein introduce a novel strategy that applies a deep‐learning framework in classified groups of road segments, where classification is conducted to reduce the scale of learning.The main contributions of this research can be summarised as follows:A spatiotemporal clustering method of road segments is proposed, which considers both traffic speed patterns and spatial topology information.Based on the clustering results, we developed a novel prediction scheme. The sequence‐to‐sequence learning (Seq2Seq) algorithm is separately deployed cluster by cluster in parallel.We conducted numerical tests on a large‐scale real‐world dataset provided by AMAP, a leading mobile navigation application in China. The results indicate that our approach yields better performances than other state‐of‐the‐art baselines in terms of both prediction accuracy and computational efficiency.The remainder of this study is organised as follows. Section 2 presents recent relevant studies. Section 3 describes our proposed method. Section 4 presents the experimental dataset and analysis result. Finally, Section 5 concludes the study and outlines future research.LITERATURE REVIEWIn this section, we review traffic prediction models and traffic prediction with clustering technology, which are related to the current study.Traffic prediction modelsIn the past decade, substantial efforts have been expended to model the temporal and spatial correlations of traffic state evolution under the topic of forecasting, the summary of which is presented as follows.Statistical methods such as auto‐regressive integrated moving average, Kalman filtering, and their variants [5–7] can predict future values based on previous observations by successive time‐series analysis. However, statistical models typically rely on the stationary assumption, and they cannot accurately capture the time‐varying and non‐linear characteristics of traffic flow. The massive accumulation of traffic data provided opportunities for the realisation of data‐driven approaches. The fuzzy theory [8], artificial neural network (ANN) [9], support vector machines [10], and K‐nearest neighbour model (KNN) [11] have also been exploited for short‐term traffic prediction. Furthermore, deep‐learning methods have reported better performances. Recurrent neural networks (RNNs) [12], stacked auto‐encoders [13], and deep belief networks [14] are powerful tools for learning temporal dependencies from data sequence.To construct a robust prediction framework, the temporal features must be extracted and the spatial information of road networks must be incorporated. Conventional vector autoregressive models consider the effects of upstream and downstream links [15]. Cross‐correlation coefficient [16] and convolutional neural networks (CNNs) [17, 18] have been utilised to analyse spatial relevance. Subsequently, graph convolution networks (GCNs) have been proposed to address non‐Euclidean structured data, which have been investigated in the transportation domain. Temporal convolution networks [19], long short‐term memory networks (LSTMs) [20], and Seq2Seq [21] were fused with GCNs to depict spatiotemporal dependencies in traffic prediction.To summarise, remarkable achievements of various models have been reported in previous studies. Unlike the aforementioned studies, we propose an efficient and general prediction paradigm based on the clustering of road segments.Traffic prediction with clustering technologyIn [22], a non‐parametric clustering‐based technique that provides accurate traffic forecasting was established through the exploitation of traffic data dynamics, whereas in [23], an urban traffic flow prediction system using a multifactor pattern recognition model was established, which combined a Gaussian mixture model clustered with an ANN. Tang et al. constructed a fuzzy neural network to forecast travel speed, which was trained by the K‐means method to partition inputs into different clusters and a weighted recursive least‐squares estimator to optimise the parameters [24]. Subsequently, Wang et al. developed a traffic threshold identification module based on an improved K‐means clustering algorithm and implemented a traffic flow prediction framework based on adaptive speed thresholds [25]. Shen et al. presented a novel traffic speed prediction method based on temporal clustering analysis and a deformable convolution neural network that could discriminate traffic environments [26]. The aforementioned clustering targets were the observed data samples, where the spatial information of road networks was not incorporated.Song et al. measured the similarities between road segments based on adjacency, connectivity, and congestion; subsequently, they generated topological maps by road segment clustering [27]. However, the way to proceed with traffic prediction was not involved. The spatiotemporal performance trends at the network level and for individual links were mined through K‐means clustering [28], which could create new prediction error measures rather than enhance the prediction performance. In [29], a clustering procedure was used to identify time‐variant clusters of road segments with high spatial correlations; subsequently, they were fed into an ANN‐based traffic forecasting model, which is the most correlated model with ours. Nevertheless, the simple multilayer perceptron (MLP) structure will impair the capacity in capturing the temporal dependencies, especially for multistep prediction. We further propose obtaining the optimal number of clusters through the pre‐experiment on a validation set along with providing a comprehensive comparison of prediction performance between link‐, cluster‐, and network‐level inputs.METHODOLOGYIn this section, we first interpret the notations of the variables used herein; subsequently, we illustrate the devised method in detail.PreliminariesWe model the network as a directed graph G(N,L), where the node‐set N comprises road segments (also called links), and L is the set of directed edges representing the connections among the road segments. D the distance matrix of the road segments and the variable Dij denotes the length of the shortest path from the road segment xi to xj considering the driving direction.We denote vti as the speed of the road segment xi (∀xi∈N) at the t th time slot, where the time slot is divided with a fixed duration (e.g. 5 min), and the ‘speed’ is defined as the average speed of floating cars travelling through the road segment during this time interval. Vt∈RN(|N|=N the total number of road segments) is the speed vector of the underlying road network at the t th time slot, where the i th element is (Vt)i=vti.Given the historical observations {Vt−s∣s=0,1,⋯,m} (m is the look‐back time window) and network topology G, the traffic speed forecasting problem is to seek the trustworthy approximation V̂t+n of the future value Vt+n, where n is the required prediction step.Road segment clusteringThe traffic conditions of the entire road network vary significantly over both time and space [30], which is challenging for traffic prediction. However, focusing on the local part, the traffic states are relatively stable and closely dependent [31]. Hence, our major objective is to partition the given road network into clusters, with each containing a group of considerably correlated road segments.Based on the above notations, we define Pij to characterise the similarity in traffic patterns between road segments xi and xj:1Pij=∑uvui−vuj2τ,u∈τ,where τ refers to a historical observed dataset that covers a sufficient period (e.g. one week), and |τ| denotes its cardinality.However, only considering the observed data into account may yield ambiguous results for the sake of coincidences in numerical values (e.g., two irrelevant road segments may possess similar speeds). As the traffic flow at downstream and upstream roads affect each other through transfer and feedback effects [32], it is necessary to incorporate the topology information into the clustering procedure. Combined with the spatial distance, the spatiotemporal similarity between road segments i and j can be quantified as2Sij=αP∼ij+βD∼ij,where P∼ij∈[0,1] and D∼ij∈[0,1] are the normalised results of Pij and Dij, respectively; α and β are coefficients to adjust the effects of corresponding components.Once the similarity matrix of road segments is derived, our next goal is to partition the road segments into a set of disjoint clusters such that the total dependencies between road segments within the same cluster are maximised. Let Fk be the k th cluster with q road segments and M be the total number of clusters:3Fk=argminx1,⋯,xi,⋯xN∑i∑jSijqxi,xj∈Ns.t.∪Mk=1Fk=NFk∩Fw=∅,∀k,w∈1,⋯,MThis can be solved via agglomerative clustering, which generates clusters in a bottom‐up way to produce a binary tree with each data point as either a leaf node or root node [33].The algorithm is initialised by treating each road segment as a separate cluster. The second step is to calculate a distance matrix embodying the distance between any two clusters. In this study, we used the average‐linkage criterion [34], which determines the average similarity between all road segments of the two clusters as the distance for clustering. It is expressed as4Rij=∑u∑vSuvFi×Fj,xu∈Fi,xv∈FjIn the third step, the two clusters with the shortest distance are identified and merged into a new cluster. Subsequently, steps 2 and 3 are repeated until the number of clusters reaches the given number.Owing to the clustering processing, the road segments in the underlying network are classified into several groups. In the following sections, we denote {xk1,xk2,…,xkq} as the road segments in the cluster Fk, and Vt(k)={vtk1,vtk2,⋯,vtkq} as the corresponding speed observations at the t th time slot.Traffic prediction methodAs road segments within the same cluster are strongly spatial correlated, the primary objective of the traffic prediction model is to capture the temporal evolution of the traffic state. We leverage the Seq2Seq architecture [35] to model the temporal dependencies, which unites two RNN modules to overcome the disadvantage of fixed output timestamp of a single RNN. As shown in Figure 1, the encoder embeds the input sequence {Vt−m(k),⋯,Vt−1(k),Vt(k)}, and its final hidden state is fed into the decoder, which learns to predict the future traffic speed {V̂t+1(k),V̂t+2(k),⋯,V̂t+n(k)}. It is noteworthy that the data stream of our Seq2Seq model is cluster‐level Vt(k), which is vastly different from link‐level vti and network‐level Vt in previous studies [36]. The benefits of this manner will be emphasised in Section 4.1FIGUREStructure of the sequence‐to‐sequence network (a) training stage, (b) testing stageDuring training, historical ground truth series are fed to the decoder (see Figure 1(a)), whereas ground truth observations are replaced by predictions generated by the model itself during testing (see Figure 1(b)). The model performance may degrade due to the discrepancy of input distributions between the training and testing stage. Hence, scheduled sampling is integrated [37] into the model, where the encoder is fed with either the ground truth observation with probability ∈i or the prediction by the model with probability 1‐∈i at the ith iteration. In this study, we set ∈i as5∈i=1,i<Λ0,i≥Λ,where Λ is the predefined threshold.Throughout the model operation, the hidden state ht−1, which is passed to next timestamp t, together with input Vt(k) generates a new hidden state ht at each iteration. Both the encoder and decoder adopt the LSTM structure [38], and the calculation formulas are shown in Equations (6)–(11):6ft=σWf·ht−1||Vtk+bf7it=σWi·ht−1||Vtk+bi8c∼t=tanhWc·ht−1||Vtk+bc9ct=ft⊙ct−1+it⊙c∼t10ot=σWo·ht−1||Vtk+bo11ht=ot⊙tanhct,where ft,it,ot,andct represent the forget gate, input gate, output gate, and memory cell vector, respectively; [·||·] concatenates two tensors along the same dimensions; σ(y)=11+e−y and tanh(y)=ey−e−yey+e−y are two widely used non‐linear activation functions that map the input into (0,1) and (−1,1), respectively; ⊙ refers to the Hadamard product that conducts the element‐wise multiplication operation; Wf, Wi, Wc, and Wo are weight matrices that conduct a linear transformation while bf,bi,bc, and bo are the corresponding bias vectors.Equation (12) denotes the fully connected layer from the hidden state to the output. The dimensions of the weighted parameter matrixWv and intercept parameter bv are consistent with the output:12V̂t+jk=Wvht+j−1+bv,1≤j≤nTo jointly reduce the errors in the multiple‐step prediction, the loss function is defined as the mean absolute error between (V̂t+1(k),V̂t+2(k),⋯,V̂t+n(k)) and (Vt+1(k),Vt+2(k),…,Vt+n(k)), which is given by13loss=1n∑j=1n∥V̂t+jk−Vt+jk∥1where ∥·∥1 returns the L1 norm of the vector.All the parameters are updated by minimising the loss function through the stochastic mini‐batch gradient descent (BGD) algorithm in the training stage. The training steps of the proposed model is elaborated in Algorithm 1.EXPERIMENTSDataset descriptionReal‐world traffic dataset was collected from anonymous navigation users of the AMAP (https://www.amap.com/). We selected the inner fourth ring road in Beijing as the study site, which encircled the downtown area. As shown in Figure 2, the 65‐km fourth ring road is partitioned into 162 road segments of length 400 m. Subsequently, we calculated the 5‐min average speed for each road segment by utilising the instantaneous state of the trajectory points. The missing values were completed by the nearest non‐missing values using MATLAB [39]. The average traffic speeds of morning and evening peak hours on weekdays of October 2016 are shown in Figures 3(a) and (b), respectively. The entire dataset ranging from October 1, 2016, to November 27, 2016, was separated into three independent subsets, the details of which are provided in Table 1.2FIGURESketch of road segments in geographic information system map3FIGUREAverage traffic speed of the fourth ring road on weekdays in October 2016 (a) morning peak hours, (b) evening peak hours1TABLEInformation of experimental datasetsItemTraining setValidation setTesting setDateOct 1–Nov 13Nov 14–Nov20Nov 21–Nov 27Prediction time range06:00–22:00 (192 data points per day)Dataset size44 × 192 × 162 ≈ 1.4 M7 × 192 × 162 ≈ 220 K7 × 192 × 162 ≈ 220 KExperimental settingIn this subsection, we compare our prediction scheme with several prevailing methods:ALGORITHM 1: Training the modelInput Traffic speed observations of kth cluster in the training set: {V1(k),⋯,VT(k)}Input timestamp: m; Prediction timestamp: nThreshold of train steps for scheduled sampling: ΛOutput The model with learnt parameters1  Procedure train Seq2Seq model2   Initialise two null sets: U←∅, V←∅3   for all available time slots t (1≤t≤T) do4Xencoder←[Vt−m(k),…,Vt(k)],Xdecoder←[Vt+1(k),…,Vt+n−1(k)], Ylabel←[Vt+1(k),…,Vt+n(k)]5    U←(Xencoder,Xdecoder,Ylabel), V←(Xencoder,Ylabel)6   end for7   Initialise the original hidden status of the encoder, all the weight and bias parameters8   for all training steps i (1≤i≤I) do9    if i<Λ10     Randomly extract a batch of samples Ub from U11     Minimise Equation (13) to update the parameters by the BGD algorithm within Ub12    else13     Randomly extract a batch of samples Vb from V14     Minimise Equation (13) to update the parameters by the BGD algorithm within Vb15   end for16  end procedure    Historical average (HA): The HA model performs predictions based on the mean values of the same timestamp in the training set, which serves as a basic criterion.MLP: An MLP [40] with one hidden layer is established, and we set the number of hidden neurons to be twice the feature dimension.KNN: The KNN [41] calculates the weighted summation of the corresponding future values belonging to similar samples. The hyperparameter K is selected from 5 to 30 through the performances on the validation set.LSTM: In LSTM [38], the inputs are reshaped into a matrix with one axis as the time step and another axis as the number of road segments, which can address temporal correlations well.Graph attention network (GAT): The GAT [42] leverages masked self‐attentional layers to specify different weights to different neighbourhoods, which considers spatial information.Seq2Seq: It is facilitated with the same structure as that of our prediction model except for the cluster‐level input.Traffic graph convolutional (TGC)‐LSTM: The TGC‐LSTM [20] unifies the traffic graph convolution operation and LSTM structure to attain spatiotemporal correlations in the transportation system.It is noteworthy that KNN, LSTM, GAT, and Seq2Seq are applicable for both link‐wise modelling and network‐wide inputs. The performances of these models under the two scenarios were tested in the following experiments.Speed records in the past 60 min (i.e. m=11) were employed as inputs for all models (excluding HA) to forecast the traffic states in the next 10, 20, and 30 min. The prediction accuracy of all road segments was assessed via three typical metrics, that is, mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean squared error (RMSE), expressed as follows:14MAE=∑i=1N∑t∈Hv̂ti−vtiN×H15MAPE=∑i=1N∑t∈Hv̂ti−vtivtiN×H16RMSE=∑i=1N∑t∈Hv̂ti−vti2HN,where vti and v̂ti are the ground truth and prediction values at the ith road segment at the tth time slot, respectively; H denotes the testing set.To ensure fairness in both accuracy and efficiency comparisons, the same training hyperparameters were retained in the neural‐network‐based methods (except TGC‐LSTM). In detail, the batch size was 512, the learning rate 10−2, and the training steps 2500 with the Adam optimiser. We accommodate the parameters of TGC‐LSTM to the underlying dataset on the basis of suggestions in [20]. The LSTM, encoder, and decoder of Seq2Seq comprised of one hidden layer and eight hidden units for the link‐level input and 160 hidden units for the network‐level input. The threshold of scheduled sampling for the Seq2Seq model was Λ=1700; 1‐hop and 2‐hop neighbours were considered separately in the GAT. The above settings are determined through pre‐experiment on validation set which achieves a balance between prediction accuracy and computation efficiency.For our prediction paradigm, we set α=β=0.5 and used the historical observations from October 10 to 16 for clustering (i.e. we calculated Pij using Equation 1). We exploit the data from one week for clustering because traffic patterns in both weekdays and weekends are appropriately considered. The optimal number of clusters was selected as 12 via the pre‐experiment on the validation set of the corresponding prediction performances from (4, 8, 12, 16, 20), (the details will be provided in Section 4.4). The model settings remained the same as those mentioned above except that 16 hidden units were used.Our experimental platform was a server with two central processing units (CPUs) (Intel(R) Xeon(R) CPU E5‐2673 v3 @2.40 GHz, 24 cores), 256‐GB RAM (Random Access Memory), and four Graphics Processing Units (GPUs) (NVIDIA Quadro P5000, 16 GB memory). We used python 3.6.3 with scikit‐learn [43] and tensorflow [44] to achieve contrasting models. All the algorithms were coded in parallel.Prediction performanceTables 2–4 present the comparisons of the proposed model and benchmark algorithms for 10‐, 20‐, and 30‐min ahead forecasting on the testing datasets. The following conclusions were obtained from the results:Our prediction strategy, that is, implementing the Seq2Seq model cluster by cluster achieved the best prediction results for all three evaluation metrics under various steps and exhibited superior computational efficiency. Although the specialised spatiotemporal deep‐learning framework, TGC‐LSTM, presents better performances than other baselines, it is also inferior to our approach.Figure 4 compares the execution time and MAE of the KNN, LSTM, GAT, and Seq2Seq fed with link‐ and network‐level inputs under a 20‐min time period. As expected, link‐wise modelling requires a lengthy calculation time while affording better precision, whereas the network‐level input performs worse as the heterogeneous traffic states of the entire network cause the underfitting problem.Figure 5 shows the execution time and MAE of the Seq2Seq model with link‐, network‐, and cluster‐level inputs during a 10–30 min interval. It is clear that the clustering pre‐process improved the accuracy and efficiency significantly. Compared with the link‐level input, the cluster‐level input incorporates spatial features and reduces the required number of training rounds from the number of links to the number of clusters (e.g. for the road network with 160 links, the link‐level input needs 160 training rounds link by link. When we divide them into 20 clusters, only 20 training rounds are required). Unlike the network‐level input, the cluster algorithm partitions the network into highly correlated groups, which enables the spatiotemporal dependencies to be learned more easily. It is noteworthy that the running time of the Seq2Seq model increases rapidly with the input dimension. The cluster‐level input decreases the input dimension at each iteration, and the parallel computing for each cluster contributes to reduced time consumption. The results serve as a powerful support to our motivation of this study.2TABLEPerformance comparison under 10‐min prediction intervalModelMAPEMAERMSETime(s)HA31.89%9.1112.96‐MLP11.83%4.427.35128KNN(link/network)13.35%/21.63%4.71/6.697.52/10.55109/102LSTM(link/network)11.40%/12.71%4.26/4.587.17/6.92332/182GAT1‐hop (link/network)11.39%/11.99%4.27/4.446.94/7.31162/47GAT2‐hop (link/network)11.29%/11.83%4.22/4.406.78/7.20176/47Seq2Seq(link/network)11.39%/12.98%4.26/4.577.19/6.91358/189TGC‐LSTM11.64%4.226.67483Seq2Seq(cluster)10.62%3.996.3356Abbreviations: MAPE, mean absolute percentage error; MAE, mean absolute error; RMSE, root mean squared error; HA, historical average; KNN, K‐nearest neighbour; LSTM, long short‐term memory network; GAT, graph attention; Seq2Seq, sequence‐to‐sequence; TGC, traffic graph convolution.3TABLEPerformance comparison under 20‐min prediction intervalModelMAPEMAERMSETime (s)HA31.89%9.1112.96–MLP16.85%5.839.92128KNN(link/network)19.03%/23.29%6.25/7.169.80/11.18112/92LSTM(link/network)16.61%/17.78%5.70/5.989.80/9.05334//179GAT1‐hop(link/network)16.54%//17.12%5.69/5.929.56/9.96162/46GAT2‐hop(link/network)16.44%/16.82%5.63/5.879.37/9.74178/47Seq2Seq(link/network)16.42%/17.58%5.67/5.809.83/8.96401/221TGC‐LSTM16.28%5.498.89505Seq2Seq(cluster)15.01%5.238.68684TABLEPerformance comparison under 30‐min prediction intervalModelMAPEMAERMSETime (s)HA31.89%9.1112.96‐MLP21.11%6.9411.57129KNN(link/network)23.42%/24.81%7.43/7.5911.25/11.74111/98LSTM(link/network)20.95%/21.61%6.79/7.1711.44/10.76336/185GAT1‐hop(link/network)20.88%/21.90%6.79/7.1811.27/11.81162/46GAT2‐hop(link/network)20.67%/21.35%6.72/7.1111.08/11.53178/45Seq2Seq(link/network)20.70%/21.00%6.77/6.6311.54/10.47436/253TGC‐LSTM19.53%6.4210.46491Seq2Seq(cluster)18.97%6.2110.36784FIGUREComparison of link‐ and network‐level inputs5FIGUREComparison of sequence‐to‐sequence model with link‐, network‐, and cluster‐level inputsClustering resultFigure 6 shows the change of MAE with the number of clusters in (4, 8, 12, 16, 20) on the validation and testing set under 20‐min prediction step, respectively. The variation in errors exhibits the same trend between validation and testing, indicating that it is appropriate to select the optimal number of clusters through pre‐experiment on the validation set. Meanwhile, the results are insensitive to the number of clusters, which verifies the robustness of our prediction scheme. Figure 7 presents the partition results of road segments with 12 clusters, for example, the road segments 6‐13 shown in Figure 2 are divided into cluster 2 (shown in orange) in Figure 7(a).6FIGUREMean absolute error of different cluster numbers on validation and testing under 20‐min interval7FIGUREPartition results of 162 road segments in 12 clusters (a) spatiotemporal clustering, (b) clustering without spatial informationComparing Figures 7(a) and (b), the distribution of clusters without spatial information is disorderly (e.g. the road segments in the north and south fourth ring road are both classified into cluster 7, which unlikely owns the physical interactions), which is clearly not the desirable partition for traffic prediction.Discussion of cluster‐level inputTo further validate the effect of our prediction strategy, an additional experiment was conducted to extract road segments randomly to form the same number of groups as that of our clustering process. As shown in Table 5, feeding the model with random sets of road segments does not improve the prediction accuracy, which proves the necessity of our designed clustering algorithm. The results of this experiment show that the improvements in the cluster‐level input are owing to the highly relevant spatial information within each cluster. A simple modification of the input dimension through feeding the random link sets cannot boost the model performance.5TABLEContrast of prediction performance for clustering and random selectionIntervalModelMAPEMAERMSE10‐minSeq2Seq(cluster)10.62%3.996.33Seq2Seq(random)11.82%4.327.0420‐minSeq2Seq(cluster)15.01%5.238.68Seq2Seq(random)16.55%5.659.3530‐minSeq2Seq(cluster)18.97%6.2110.36Seq2Seq(random)20.40%6.5910.75Analysis of spatiotemporal similarity measurementFigure 8 depicts the performances of the proposed prediction scheme with different coefficients α,β in Equation (2). The results explicate that the clustering with a relative trade‐off between pattern similarity (P∼ij) and spatial distance (D∼ij) is better for the prediction accuracy. The partition of road segments without the consideration of pattern similarity (α=0) or spatial distance (β=0) will compromise the prediction performance. Moreover, Table 6 assesses the effects of clustering with our spatiotemporal similarity measurement Sij and Pearson product‐moment correlation coefficient [45, 46] on prediction performances. The results prove that the devised spatiotemporal similarity measurement is more beneficial for promoting traffic prediction accuracy.8FIGURESensitivity analysis of spatiotemporal similarity measurement6TABLECluster‐level traffic prediction accuracy with different similarity measurementsIntervalSimilarity measurementMAPEMAERMSE10‐minSij10.62%3.996.33PPMCC10.99%4.076.4020‐minSij15.01%5.238.68PPMCC15.23%5.338.7330‐minSij18.97%6.2110.36PPMCC19.28%6.3310.44Abbreviation: PPMCC, Pearson product‐moment correlation coefficient.CONCLUSIONA novel high‐performance traffic prediction paradigm was developed in this study. In the first step, road segments in the underlying road network were partitioned into several groups through a spatiotemporal clustering algorithm. Subsequently, strongly correlated and mutually influencing road segments were introduced into the same cluster. Next, the Seq2Seq model was employed cluster by cluster to perform traffic prediction. Using a real‐world traffic dataset, our approach demonstrated significant improvements over other benchmarks in terms of both prediction accuracy and computational efficiency. Furthermore, we analysed the differences between link‐ and network‐level inputs, the influences of the cluster number, the effects of randomly selected road segment groups, and the designed spatiotemporal similarity measurement. In future studies, we may extend our prediction scheme for application to large‐scale urban road networks, where the influences of traffic lights may be considered into the clustering process.ACKNOWLEDGMENTSThe research is supported by grants from the National Key R&D Program of China (2018YFB1601600). The research is supported in part by Tsinghua‐Toyota Joint Research Institution.REFERENCESLin, Y., Wang, P., Ma, M.: Intelligent Transportation System (Its): Concept, Challenge and Opportunity. In: 2017 IEEE 3rd International Conference on Big Data Security on Cloud (Bigdatasecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS), Beijing, China (2017)Ahmed, A., et al.: Real‐time dynamic traffic control based on traffic‐state estimation. Transp. Res. Rec. 2673(5), 584–595 (2019)Goatin, P., Göttlich, S., Kolb, O.: Speed limit and ramp meter control for traffic flow networks. Eng. Optim. 48(7), 1121–1144 (2016)Yuan, J., et al.: T‐drive: Enhancing driving directions with taxi drivers’ intelligence. IEEE Trans. Knowl. Data Eng. 25(1), 220–232 (2013)Wang, H., et al.: A novel work zone short‐term vehicle‐type specific traffic speed prediction model through the hybrid EMD–ARIMA framework. Transportmetrica B Transp. Dyn. 4(3), 159–186 (2016)Wang, Y., Papageorgiou, M.: Real‐time freeway traffic state estimation based on extended kalman filter: a general approach. Transp. Res. Part B Methodol. 39(2), 141–167 (2005)Wang, Y., Papageorgiou, M., Messmer, A.: Real‐time freeway traffic state estimation based on extended Kalman filter: Adaptive capabilities and real data testing. Transp. Res. Part A Policy Pract. 42(10), 1340–1358 (2008)Guo, J., et al.: Short‐term traffic flow prediction using fuzzy information granulation approach under different time intervals. IET Intel. Transport Syst. 12(2), 143–150 (2017)Chan, K.Y., et al.: Neural‐network‐based models for short‐term traffic flow forecasting using a hybrid exponential smoothing and Levenberg–Marquardt algorithm. IEEE Trans. Intell. Transp. Syst. 13(2), 644–654 (2011)Wang, J., Shi, Q.: Short‐term traffic speed forecasting hybrid model based on chaos–wavelet analysis‐support vector machine theory. Transp. Res. Part C Emerging Technol. 27, 219–232 (2013)Sun, B., et al.: Short‐term traffic forecasting using self‐adjusting K‐nearest neighbours. IET Intel. Transport Syst. 12(1), 41–48 (2017)Zhao, Z., et al.: LSTM network: A deep learning approach for short‐term traffic forecast. IET Intel. Transport Syst. 11(2), 68–75 (2017)Lv, Y., et al.: Traffic flow prediction with big data: a deep learning approach. IEEE Trans. Intell. Transp. Syst. 16(2), 865–873 (2014)Huang, W., et al.: Deep architecture for traffic flow prediction: deep belief networks with multitask learning. IEEE Trans. Intell. Transp. Syst. 15(5), 2191–2201 (2014)Chandra, S.R., Al‐Deek, H.: Predictions of freeway traffic speeds and volumes using vector autoregressive models. J. Intell. Transp. Syst. 13(2), 53–72 (2009)Pan, T., et al.: Short‐term traffic state prediction based on temporal–spatial correlation. IEEE Trans. Intell. Transp. Syst. 14(3), 1242–1254 (2013)Ma, X., et al.: Learning traffic as images: a deep convolutional neural network for large‐scale transportation network speed prediction. Sensors 17(4), 818 (2017)Ke, J., et al.: Short‐term forecasting of passenger demand under on‐demand ride services: a spatio‐temporal deep learning approach. Transp. Res. Part C Emerging Technol. 85, 591–608 (2017)Yu, B., Yin, H., Zhu, Z.: Spatio‐temporal graph convolutional networks: A deep learning framework for traffic forecasting. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm (2018)</bib>Cui, Z., et al.: Traffic graph convolutional recurrent neural network: a deep learning framework for network‐scale traffic learning and forecasting. IEEE Trans. Intell. Transp. Syst. 21(11), 4883–4894 (2020).Zhang, Z., et al.: Multistep speed prediction on traffic networks: a deep learning approach considering spatio‐temporal dependencies. Transp. Res. Part C Emerging Technol. 105, 297–322 (2019)Kehagias, D., Salamanis, A., Tzovaras, D.: Speed pattern recognition technique for short‐term traffic forecasting based on traffic dynamics. IET Intel. Transport Syst. 9(6), 646–653 (2015)Kim, Y.‐J., Hong, J.‐S.: Urban traffic flow prediction system using a multifactor pattern recognition model. IEEE Trans. Intell. Transp. Syst. 16(5), 2744–2755 (2015)Tang, J., et al.: An improved fuzzy neural network for traffic speed prediction considering periodic characteristic. IEEE Trans. Intell. Transp. Syst. 18(9), 2340–2350 (2017)Xiangxue, W., Lunhui, X.: Wavelet‐based short‐term forecasting with improved threshold recognition for urban expressway traffic conditions. IET Intel. Transport Syst. 12(6), 463–473 (2018)Shen, G., et al.: Research on traffic speed prediction by temporal clustering analysis and convolutional neural network with deformable kernels (May, 2018). IEEE Access 6, 51756–51765 (2018)Song, Q., et al.: An urban topological map generation method for traffic flow prediction based on road segment clustering with floating vehicle trajectory dataset. In: 19th COTA International Conference of Transportation Professionals (CICTP 2019), Nanjing , China (2019)Asif, M.T., et al.: Spatiotemporal patterns in large‐scale traffic speed prediction. IEEE Trans. Intell. Transp. Syst. 15(2), 794–804 (2013)Zhang, B., et al.: Traffic clustering and online traffic prediction in vehicle networks: a social influence perspective. In: 2012 Proceedings IEEE INFOCOM, Orlando, Florida (2012)Stathopoulos, A., Karlaftis, M.: Temporal and spatial variations of real‐time traffic data in urban areas. Transp. Res. Rec. 1768(1), 135–140 (2001)Zou, H., et al.: A spatial analysis approach for describing spatial pattern of urban traffic state. In: 13th International IEEE Conference on Intelligent Transportation Systems, Funchal, Portugal (2010)Dong, C.‐J., et al.: Spatial and temporal characteristics for congested traffic on urban expressway. J. Beijing Univ. Technol. 38(8), 1242–1246 (2012)Xia, J., Huang, W., Guo, J.: A clustering approach to online freeway traffic state identification using its data. KSCE J. Civ. Eng. 16(3), 426–432 (2012)Lukasová, A.: Hierarchical agglomerative clustering procedure. Pattern Recognit. 11(5‐6), 365–381 (1979)Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems, Montreal, Canada (2014)Liao, B., et al.: Deep sequence learning with auxiliary information for traffic prediction. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK (2018)Bengio, S., et al.: Scheduled sampling for sequence prediction with recurrent neural networks. In: Advances in Neural Information Processing Systems, Montreal, Canada (2015)Hochreiter, S., Schmidhuber, J.: Long short‐term memory. Neural Comput. 9(8), 1735–1780 (1997)Folch‐Fortuny, A., Arteaga, F., Ferrer, A.: Missing data imputation toolbox for matlab. Chemom. Intell. Lab. Syst. 154, 93–100 (2016)Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back‐propagating errors. Nature 323(6088), 533–536 (1986)Dudani, S.A.: The distance‐weighted K‐nearest‐neighbor rule. IEEE Trans. Syst. Man Cybern. SMC‐6(4), 325–327 (1976)Zhang, K., et al.: Graph attention temporal convolutional network for traffic speed forecasting on road networks. Transportmetrica B Transp. Dyn. 1–19 (2020).Pedregosa, F., et al.:Scikit‐learn: machine learning in python. J. Mach. Learn. Res. 12, 2825–2830 (2011)Abadi, M., et al.: Tensorflow: a system for large‐scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’16), Savannah, Georgia (2016)Cheng, T., Haworth, J., Wang, J.: Spatio‐temporal autocorrelation of road network data. J. Geog. Syst. 14(4), 389–413 (2012)Diamantopoulos, T., et al.: Investigating the effect of global metrics in travel time forecasting. In: 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013), Hague, The Netherlands (2013)

Journal

IET Intelligent Transport SystemsWiley

Published: Feb 1, 2021

Keywords: Data handling techniques; Traffic engineering computing; Neural nets

There are no references for this article.