A Dynamic Spatio-Temporal Deep Learning Model for Lane-Level Traffic Prediction
Li, Bao; Yang, Quan; Chen, Jianjiang; Yu, Dongjin; Wang, Dongjing; Wan, Feng
2023-03-08 00:00:00
Hindawi, Journal of Advanced Transportation, Volume 2023, Article ID 3208535, 14 pages
https://doi.org/10.1155/2023/3208535

Research Article

Bao Li (1), Quan Yang (1), Jianjiang Chen (2), Dongjin Yu (2), Dongjing Wang (2), and Feng Wan
(1) Zhejiang Testing & Inspection Institute for Mechanical and Electrical Products Quality Co., Ltd., Hangzhou 310018, China
(2) School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China

Correspondence should be addressed to Dongjing Wang; dongjing.wang@hdu.edu.cn
Received 9 November 2022; Revised 1 February 2023; Accepted 24 February 2023; Published 8 March 2023
Academic Editor: Jing Zhao
Copyright © 2023 Bao Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Traffic prediction aims to predict the future traffic state by mining features from historical traffic information, and it is a crucial component of the intelligent transportation system. However, most existing traffic prediction methods focus on road segment prediction while ignoring fine-grained lane-level traffic prediction. From observations, we found that different lanes on the same road segment have similar but not identical patterns of variation. Lane-level traffic prediction can provide more accurate prediction results for humans or autonomous driving systems to make appropriate and efficient decisions. In traffic prediction, the mining of spatial features is an important step, and graph-based methods are effective for it. However, most existing graph-based methods construct a static adjacent matrix, which makes it difficult for them to respond to spatio-temporal changes in a timely manner. In this paper, we propose a deep learning model for lane-level traffic prediction.
Specifically, we take advantage of the graph convolutional network (GCN) with a data-driven adjacent matrix for spatial feature modeling and treat different lanes of the same road segment as different nodes. The data-driven adjacent matrix consists of the fundamental distance-based adjacent matrix and the dynamic lane correlation matrix. The temporal features are extracted with the gated recurrent unit (GRU). Then, we adaptively fuse spatial and temporal features with the gating mechanism to get the final spatio-temporal features for lane-level traffic prediction. Extensive experiments on a real-world dataset validate the effectiveness of our model.

1. Introduction

Intelligent transportation systems (ITS) include driving behaviour understanding [1], path finding [2], map matching [3], and traffic prediction [4]. Traffic prediction refers to predicting the future state of traffic by analyzing and mining historical traffic information [5]. As the foundation and an important part of ITS, accurate traffic prediction can help formulate real-time control strategies, which is of great importance for the scientific planning of traffic management and for people's safe and efficient travel [6].

Early efforts in this field used statistical learning methods for traffic prediction, such as the autoregressive integrated moving average model [7], which converts nonstationary sequences into stationary sequences by differencing before prediction. Traffic state information has significant nonlinear and uncertain characteristics, and machine learning methods such as K-nearest neighbor (KNN) [8] and support vector regression (SVR) [9] are also used for traffic prediction. However, they place higher requirements on features, which often calls for complex feature processing. In recent years, deep learning methods have become the mainstream approach to traffic prediction due to their automatic feature modeling and effective data mining capabilities. For example, recurrent neural network (RNN)-based methods [10] can effectively model the temporal features in traffic flows. Convolutional neural network (CNN)-based methods [11] regard traffic flows as images and model spatial or temporal features in Euclidean space.

The road network or road sensor network is naturally a graph and has a typically non-Euclidean structure. Recently, researchers have used graph-based methods for traffic prediction [12]. CNNs and RNNs can only be used on Euclidean data, while graph-based methods can effectively model the non-Euclidean structure of graphs for more accurate predictions. With a graph as input, graph-based methods have achieved superior performance in traffic prediction. The topology of the graph is represented by an adjacent matrix, and graph-based methods are directly affected by the adjacent matrix.

Although related work has proposed many effective algorithms in the field of traffic prediction, there are still some limitations and challenges. (1) Previous studies ignore the differences between lanes and mainly focus on road segment prediction. In reality, there is a wide demand for lane-level traffic prediction. For example, automated vehicles or human-driven vehicles can select appropriate lanes according to the prediction results at the lane level. Traffic congestion can thus be avoided or alleviated [13, 14]. Besides, lane-level traffic prediction can provide more refined and accurate traffic information and help humans or machines make more appropriate and effective decisions. Traffic states in different lanes follow different but related patterns [15–17]. As shown in Figure 1, there are two lanes in both road section 1 and road section 2, and the traffic information of different lanes in the same road section has a similar change pattern, while there are still differences in many details. In road segment-level traffic prediction, the road segment is regarded as a whole, and the prediction results are too macroscopic to provide precise information for lane-level decisions. (2) Graph-based methods rely heavily on adjacent matrices, while most methods build static adjacent matrices, ignoring that the correlation between different nodes on the graph may differ from one situation to another. For example, two nodes that are far apart may still have similar change patterns. Besides, the traffic situation of nodes may change in different time periods. It is difficult for a static adjacent matrix to respond timely and effectively to spatio-temporal changes.

Figure 1: An example of lane-level traffic flow, where the horizontal axis represents the time at which the traffic flow was recorded and the vertical axis represents the traffic flow. Panels: road section 1, lane 1; road section 1, lane 2; road section 2, lane 1; road section 2, lane 2. Time axis ticks run from 1-19 to 1-25.

To address the aforementioned challenges, we propose a deep learning model for lane-level traffic prediction, which is mainly composed of a data-driven GCN and a GRU. GCN is used to extract spatial features. To adapt GCN for lane-level traffic prediction, we treat different lanes at the same location as different nodes on the graph. The adjacent matrix of the graph is calculated in a data-driven manner and consists of a traditional distance adjacent matrix and a dynamic lane correlation matrix. GRU is used to extract temporal features. Then, spatio-temporal features are obtained by fusing temporal and spatial features adaptively through the gating mechanism. Finally, lane-level traffic prediction is performed based on the learned spatio-temporal features.

The main contributions of this paper can be summarized as follows:

(1) A data-driven adjacent matrix is proposed, which consists of a distance-based adjacent matrix and a dynamic lane correlation matrix. It can respond effectively to spatio-temporal changes in a timely manner.
(2) We propose a deep learning model for lane-level traffic prediction, which learns spatial features with GCN, learns temporal features with GRU, and obtains fused adaptive spatio-temporal features with the gating mechanism.
(3) Extensive experiments on a real-world dataset validate the effectiveness of the model.

The remainder of this paper is organized as follows: Section 2 introduces the related work, which includes general methods and deep learning methods. Section 3 formulates the lane-level traffic prediction task. Section 4 introduces the construction of the data-driven adjacent matrix and the architecture of our model in detail. The comprehensive experimental results on a real-world dataset are presented in Section 5. Finally, we conclude the paper and present future work in Section 6.

2. Related Work

2.1. General Methods. Traditional traffic prediction methods can be divided into parametric methods and nonparametric methods [10]. Parametric methods rely on the assumption of data stationarity and provide explicit formulations for valuable interpretations of traffic characteristics. Classical parametric methods, such as the autoregressive integrated moving average model (ARIMA) and its variants [7, 18, 19], have been proven to be effective in many scenarios. For example, some studies have found that ARIMA can model highway time series data with high precision [20]. Other parametric methods include exponential smoothing [21], multivariate time series models [22], and Kalman filtering models [23]. However, the dependency on stationarity makes it difficult for parametric methods to effectively model the uncertainty and irregular volatility of traffic data. The structure and parameters of nonparametric methods are not fixed, and their data requirements are not as strict as those of parametric methods. Nonparametric methods are better able to deal with complex data such as noisy data and missing data [24]. Typical nonparametric methods include support vector regression [8], K-nearest neighbor [9], the Bayesian network [25], extreme gradient boosting [26], and artificial neural networks (ANN) [24, 27, 28]. Among them, ANN can mine the latent information of traffic data and has nonlinear modeling ability, which makes it one of the most widely used nonparametric methods. Although nonparametric methods have had some success in the field of traffic prediction, they are limited in their ability to predict lane-level traffic. Besides, both parametric and nonparametric methods are mainly used to model temporal features and are weak in modeling spatial features.

2.2. Deep Learning Methods. With the rapid development of high-performance data storage and processing technologies, traffic prediction is moving from nonparametric methods to deep learning methods [10]. An important step in traffic prediction is to extract spatio-temporal features from traffic data. Because the recurrent neural network (RNN) and its variants, such as long short-term memory (LSTM) [29] and the gated recurrent unit (GRU) [30], can effectively utilize temporal data, RNN-based methods [31] play an important role in mining temporal traffic features. Ma et al. [32] first applied LSTM to the prediction of highway traffic speed and flow. Zhao et al. [10] utilized GRU, which has fewer parameters than LSTM, for traffic prediction. Gu et al. [20] built a fusion system to capture temporal features. RNN-based methods [33, 34] have shown promising results in the traffic prediction field, but they are not good at mining spatial features in traffic flow. In terms of spatial traffic features, traffic flows in nearby locations are often strongly correlated [35]. Owing to its power in handling image data, CNN has been used in traffic prediction by treating the traffic flow data as an image. Ke et al. [36] constructed a multichannel CNN model for multilane traffic speed prediction. Liu et al. [37] developed an attention-based CNN structure for traffic speed prediction that uses traffic flow, speed, and occupancy. However, CNN and RNN can only be applied to Euclidean data; they cannot model the topological structure of the road network or the road sensor network. Neither CNN-based methods nor RNN-based methods are perfect for spatio-temporal feature extraction.

The road network or the road sensor network is naturally a graph. Recently, researchers have applied graph neural networks (GNN), especially graph convolutional networks (GCN) [38], to traffic prediction, and they have achieved superior performance compared to previous approaches. Owing to their ability to model non-Euclidean graph structures, GNNs are well suited to traffic prediction problems. Li et al. [39] treated the traffic flow as a diffusion process and proposed DCRNN, which uses bidirectional random walks on the graph and GRU to capture spatial and temporal features, respectively. Zhao et al. [10] proposed T-GCN, which stacks GCN and GRU for traffic prediction. Yu et al. [40] proposed STGCN to extract spatio-temporal features with complete convolutional structures. Guo et al. [41] established an HGCN model that performs convolution on both micro and macro traffic graphs. Zhu et al. [42] employed GCN on multiple graphs to analyze correlations from multiple perspectives. Guo et al. [43] proposed a dynamic GCN for traffic prediction on the basis of Laplace matrix estimation. Cao et al. [44] combined self-attention with GCN for traffic flow prediction. Although there is a lot of excellent work on traffic prediction, most of it is not suitable for lane-level traffic prediction. Besides, most existing works treat the road or sensor network as a static graph. We propose a deep learning model for lane-level traffic prediction with a dynamic adjacent matrix driven by data. As for lane-level works, Gu et al. [20] combined LSTM and GRU for lane-level traffic speed prediction. Ke et al. [36] introduced a two-stream multichannel CNN model. Ma et al. [45] proposed a convolutional LSTM network for multilane short-term traffic forecasting. Lu et al. [46] described a mixed deep learning model for lane-level traffic speed forecasting. Wang et al. [47] presented a heterogeneous graph convolution model for lane-level traffic flow prediction. Existing lane-level traffic prediction methods mostly use RNN or CNN to model spatial features, which has certain limitations.

3. Problem Formulation

In this work, we aim to predict the traffic state of lanes over a period of time on the basis of the historical traffic state information recorded by road sensors. Traffic state is a general concept that includes traffic speed, traffic flow, and other numerical information related to the road. Specifically, we predict lane-level traffic flow in the experiment section.

Definition: Lane Network G. To describe the non-Euclidean structure of the lane network, we define it as a graph $G = (V, E, A)$. On graph $G$, $V = \{v_1, v_2, \ldots, v_N\}$ is the set of nodes, where $v_i$ represents the $i$-th lane and $N$ is the number of lanes. Note that we treat different lanes on the same road section as different nodes. $E$ is the set of edges. The edge between lane $i$ and lane $j$ exists only if their distance is less than a certain threshold and there exists a traffic flow from $v_i$ to $v_j$. To better represent the real situation, we consider the traffic flow between different lanes in the same section of the road to be interconnected. $A \in \mathbb{R}^{N \times N}$ is the adjacent matrix.

Let $X_t = [x_t^1, x_t^2, \ldots, x_t^N] \in \mathbb{R}^N$ represent the traffic flow of $N$ lanes at each time stamp $t$. Suppose the traffic flow data is the graph signal of $G$. Given time $t$ and lane network $G$, the lane-level traffic flow prediction problem in our work can be defined as

$$\left[X_{t+1}, \ldots, X_{t+T}\right] = f\left(\left[X_{t-p+1}, \ldots, X_t\right], G\right), \tag{1}$$

where $f$ represents the learned mapping function, $p$ is the input sequence length, and $T$ is the predicted sequence length.

The key symbols used in this paper are summarized in Table 1.

Table 1: Summary of notations.
Symbol | Description
$G$ | Lane network
$V$, $E$ | Node set and edge set of $G$
$X_t$ | Traffic information of all lanes at timestamp $t$
$x_t^i$ | Traffic information of lane $i$ at timestamp $t$
$A$ | Data-driven adjacent matrix
$A_d$ | Distance-based adjacent matrix
$A_c$ | Traffic information similarity matrix
$p$, $T$ | Sequence lengths for input and prediction
$\alpha$ | A constant that controls the contribution of $A_c$
$D$ | Degree matrix
$u$, $r$ | Update gate and reset gate in GRU
$c$, $h$ | Cell state and hidden state in GRU
$W$ | Learnable parameter matrices
$g$ | Feature fusion gate
$H^s$ | Learned spatial features
$H^t$ | Learned temporal features
$H$ | Fused spatio-temporal features

4. The Proposed Approach

In view of the lack of work on lane-level traffic prediction, this paper proposes a lane-level traffic prediction model. The architecture of our model is illustrated in Figure 2. Specifically, we first establish a data-driven adjacent matrix that can respond to spatio-temporal changes based on the geographic location and historical traffic information of the sensors. The data-driven adjacent matrix is fed into the graph convolutional network (GCN) to capture spatial features, and we model the temporal features with a gated recurrent unit (GRU) model. Then, we adaptively fuse the spatial and temporal features with the gating mechanism to get comprehensive spatio-temporal features. Finally, we make multistep lane-level traffic predictions based on the spatio-temporal features.

Figure 2: The architecture of our proposed model. Components shown: data collection from lane sensors; traffic information $X_{[t-h+1:t]}$; correlation matrix $A_c$; distance-based adjacent matrix $A_d$ built from sensor geolocation; combined matrix $A_d + \alpha A_c$; GCN (spatial feature); GRU (temporal feature); linear transformation; prediction results $X_{t+1}, \ldots, X_{t+T}$.

4.1. Data-Driven Adjacent Matrix. The graph depicts the topological relationships between nodes through the adjacent matrix, and the construction of the adjacent matrix directly affects the expressive power of the graph [48]. However, most GCN-based traffic prediction works only construct a static adjacent matrix with fixed weights, without considering that the relationship between different nodes may change in various situations. In particular, it is difficult for a static adjacent matrix to respond to spatio-temporal changes in a timely manner, which makes it hard for the model to achieve accurate prediction. In our work, we propose a data-driven dynamic adjacent matrix, which is composed of the basic distance-based adjacent matrix $A_d$ and the dynamic node correlation matrix $A_c$.

Graphs can be directed or undirected. For undirected graphs such as social networks, the adjacent matrix is symmetric. In the road sensor network, the traffic flows on roads have directions due to the restriction of traffic rules. Graph $G$ is therefore a directed graph, and its adjacent matrix is asymmetric.

For the basic distance-based adjacent matrix $A_d$, as most works did [49], we calculate one element $(A_d)_{ij}$ of $A_d$ with

$$(A_d)_{ij} = \begin{cases} \exp\left(-\dfrac{d_{ij}}{\sigma^2}\right), & \text{if } d_{ij} \le \varepsilon, \\ 0, & \text{if } d_{ij} > \varepsilon, \end{cases} \tag{2}$$

where $(A_d)_{ij}$ represents the degree of influence of lane $v_i$ on lane $v_j$, $d_{ij}$ is the distance between $v_i$ and $v_j$, and $\sigma$ is the standard deviation of $d$. The distance between different lanes on the same road segment is 0. $(A_d)_{ij}$ has a positive value only if $d_{ij}$ does not exceed the threshold $\varepsilon$ and there exists a traffic flow from $v_i$ to $v_j$.

To compensate for the defects caused by the static characteristics of the distance-based adjacent matrix, we further introduce the dynamic correlation matrix $A_c \in \mathbb{R}^{N \times N}$. $A_c$ is filled with the Pearson correlation coefficients calculated from the observed input data of the lanes. To be specific, $A_c$ at time $t^*$ is calculated over a window of $h$ time steps ending at $t^*$ with

$$(A_c)_{ij} = \frac{\sum_{t=t^*-h+1}^{t^*} \left(x_t^i - \bar{x}^i\right)\left(x_t^j - \bar{x}^j\right)}{\sqrt{\sum_{t=t^*-h+1}^{t^*} \left(x_t^i - \bar{x}^i\right)^2 \sum_{t=t^*-h+1}^{t^*} \left(x_t^j - \bar{x}^j\right)^2}}, \tag{3}$$

where $i$ and $j$ are the indices of lane $v_i$ and lane $v_j$, $x_t^i$ is the value of traffic flow on $v_i$ observed at time $t$, and $\bar{x}^i$ and $\bar{x}^j$ are the means of the traffic flow on $v_i$ and $v_j$, respectively. The closer the absolute value of $(A_c)_{ij}$ is to 1, the higher the correlation between $v_i$ and $v_j$.

Combining the basic distance-based adjacent matrix $A_d$ and the dynamic correlation matrix $A_c$, we propose the data-driven adjacent matrix $A$,

$$A = A_d + \alpha A_c, \tag{4}$$

where $\alpha$ is a constant that controls how much $A_c$ contributes to $A$. On the one hand, $A_d$ provides geographic relationships that are fundamental and important for spatial feature extraction; on the other hand, $A_c$ can make timely adjustments to the adjacent matrix with reference to changes in historical information.

4.2. Spatial Feature Modeling. Spatial features play an important role in traffic prediction because traffic flow sequences at different locations are connected to some extent. Before graph-based methods were employed, research studies usually extracted the spatial features with multivariate time series models or CNNs [50]. However, limited by their structure, multivariate time series models mostly cannot model the nonlinear relationships between different sequences. Although CNN-based methods can alleviate this problem, the architecture of CNN is bound to Euclidean space, which is not enough for modeling the topological structure of the lane network. Recently, graph-based methods have attracted wide attention for their ability in modeling non-Euclidean structure. Specifically, we extract the spatial features with GCN. The GCN model builds a filter in the spatial domain, and the spatial features between different nodes on a graph are extracted with the filter. As illustrated in Figure 3, the central node models the topo-

to extract the temporal features of the traffic data. There are two gates in GRU, which are

$$u_t = \sigma\left(W_{ux} X_t + W_{uh} h_{t-1} + b_u\right), \tag{6}$$

r_t = σ