REFERENCES

Raghunandan Keshavan, A. Montanari, Sewoong Oh (2009). Matrix completion from a few entries. 2009 IEEE International Symposium on Information Theory.
Guangcan Liu, Zhouchen Lin, Yong Yu (2010). Robust Subspace Segmentation by Low-Rank Representation. International Conference on Machine Learning (ICML).
Xu Luo, Jun Yang (2019). A Survey on Pollution Monitoring Using Sensor Networks in Environment Protection. J. Sensors, 2019.
Jian-Feng Cai, E. Candès, Zuowei Shen (2008). A Singular Value Thresholding Algorithm for Matrix Completion. SIAM J. Optim., 20.
R. Tibshirani, M. Saunders, Saharon Rosset, Ji Zhu, K. Knight (2005). Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67.
I. Markovsky (2018). Low Rank Approximation: Algorithms, Implementation, Applications.
Wei Dai, O. Milenkovic, E. Kerman (2010). Subspace Evolution and Transfer (SET) for Low-Rank Matrix Completion. IEEE Transactions on Signal Processing, 59.
Qiang Zhang, Yi Liu, Rick Blum, J. Han, D. Tao (2018). Sparse representation based multi-sensor image fusion for multi-focus and multi-modality images: A review. Inf. Fusion, 40.
Michael Tipping, Charles Bishop (1999). Probabilistic Principal Component Analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61.
Daxin Tian, Jianshan Zhou, Zhengguo Sheng, Min Chen, Q. Ni, Victor Leung (2017). Self-Organized Relay Selection for Cooperative Transmission in Vehicular Ad-Hoc Networks. IEEE Transactions on Vehicular Technology, 66.
Xinxin Feng, Xianyao Ling, Haifeng Zheng, Zhonghui Chen, Yiwen Xu (2019). Adaptive Multi-Kernel SVM With Spatial–Temporal Correlation for Short-Term Traffic Flow Prediction. IEEE Transactions on Intelligent Transportation Systems, 20.
Yuankai Wu, Huachun Tan, Lingqiao Qin, B. Ran, Zhuxi Jiang (2018). A hybrid deep learning based traffic flow prediction method and its understanding. Transportation Research Part C: Emerging Technologies, 90.
Xiaobo Chen, Zhongjie Wei, Xiang Liu, Yingfeng Cai, Zuoyong Li, Feng Zhao (2017). Spatiotemporal variable and parameter selection using sparse hybrid genetic algorithm for traffic flow forecasting. International Journal of Distributed Sensor Networks, 13.
Jian Wu, Matthias Poloczek, A. Wilson, P. Frazier (2017). Bayesian Optimization with Gradients. Advances in Neural Information Processing Systems (NIPS).
S. Carrese, E. Cipriani, L. Mannini, Marialisa Nigro (2017). Dynamic demand estimation and prediction for traffic urban networks adopting new data sources. Transportation Research Part C: Emerging Technologies, 81.
(2017). Robust temporal low-rank representation for traffic data recovery via fused lasso.
Gustavo Batista, M. Monard (2003). An analysis of four missing data treatment methods for supervised learning. Applied Artificial Intelligence, 17.
Li-Yu Liu, Yuqing Xu, Yang Zhou (2019). A class of exact penalty functions and penalty algorithms for nonsmooth constrained optimization problems. Journal of Global Optimization, 76.
Xinyu Chen, Zhaocheng He, Lijun Sun (2019). A Bayesian tensor decomposition approach for spatiotemporal traffic data imputation. Transportation Research Part C: Emerging Technologies.
Xiang-long Luo, Xue Meng, Wenjuan Gan, Yonghong Chen (2019). Traffic Data Imputation Algorithm Based on Improved Low-Rank Matrix Decomposition. J. Sensors, 2019.
S. Tak, Soomin Woo, H. Yeo (2016). Data-Driven Imputation Method for Traffic Data in Sectional Units of Road Links. IEEE Transactions on Intelligent Transportation Systems, 17.
Fu Xiao, Lei Chen, Hai Zhu, Richang Hong, Ruchuan Wang (2019). Anomaly-Tolerant Network Traffic Estimation via Noise-Immune Temporal Matrix Completion Model. IEEE Journal on Selected Areas in Communications, 37.
Qiang Shang, Zhaosheng Yang, Song Gao, Derong Tan (2018). An Imputation Method for Missing Traffic Data Based on FCM Optimized by PSO-SVR. Journal of Advanced Transportation, 2018.
Risheng Liu, Zhouchen Lin, Zhixun Su (2013). Linearized alternating direction method with parallel splitting and adaptive penalty for separable convex programs in machine learning. Machine Learning, 99.
E. Candès, B. Recht (2008). Exact Matrix Completion via Convex Optimization. Foundations of Computational Mathematics, 9.
R. Lorenz, I. Violante, R. Monti, G. Montana, A. Hampshire, R. Leech (2017). Dissociating frontoparietal brain networks with neuroadaptive Bayesian optimization. Nature Communications, 9.
Yoon-Young Choi, Heeseung Shon, Young-Ji Byon, Dong-Kyu Kim, Seungmo Kang (2019). Enhanced Application of Principal Component Analysis in Machine Learning for Imputation of Missing Traffic Data. Applied Sciences.
Xiaobo Chen, Yingfeng Cai, Qingchao Liu, Lei Chen (2018). Nonconvex lp-Norm Regularized Sparse Self-Representation for Traffic Sensor Data Recovery. IEEE Access, 6.
Kun Xie, Lele Wang, Xin Wang, Gaogang Xie, Jigang Wen (2018). Low Cost and High Accuracy Data Gathering in WSNs with Matrix Completion. IEEE Transactions on Mobile Computing, 17.
Weiming Zhou, Haifeng Zheng, Xinxin Feng, Dong Lin (2019). A Multi-source Based Coupled Tensors Completion Algorithm for Incomplete Traffic Data Imputation. 2019 11th International Conference on Wireless Communications and Signal Processing (WCSP).
Daming Li, Lianbing Deng, Zhiming Cai, Bill Franks, Xiang Yao (2018). Intelligent Transportation System in Macao Based on Deep Self-Coding Learning. IEEE Transactions on Industrial Informatics, 14.
INTRODUCTION

With car ownership increasing year by year, urban roads have become more congested and traffic efficiency has decreased. In this context, Intelligent Transportation Systems (ITS) play a major role and have received growing research attention. [1, 2] Complete and accurate historical traffic data is the foundation of ITS, but in practice missing or corrupted values in the data collected by traffic sensors are almost inevitable. For example, in Melbourne, Australia, about 8% of the traffic flow sensors have data loss rates above 56%, while in Beijing, China, about 10% of traffic data is incorrect. [3] There are many causes for this problem, mainly including storage device failure, transmission failure, data abnormality, and so forth. [4] However, most ITS applications must take complete and precise traffic data as input, such as short-term traffic flow prediction, [5, 6] so recovering missing data with high accuracy has become a great challenge.

To address the missing data problem, scholars have proposed solutions from different perspectives based on the characteristics of traffic data, the most representative being Mean Imputation (MI), [7] K-nearest Neighbour Regression (KNNR), [8] Probabilistic Principal Component Analysis (PPCA), [9] the Singular Value Thresholding algorithm (SVT), [10] and so on. These methods have achieved good results, and some are even landmarks in the field. However, some of them cannot make full use of the spatial-temporal correlation of traffic flow data, and some do not consider the impact of noise on missing value estimation. Given the low-rank property of traffic flow data, the current advanced Low-rank Representation theory (LRR) can be used for modelling. In addition, low-rank representation has the advantage of robustness to noise.
Further, an in-depth analysis of real traffic flow data shows that traffic flow has a temporal stability property: the traffic flow value changes very little, or not at all, between two neighbouring moments, so the fused lasso [11] can be adopted to increase the recovery precision. Inspired by the above discussion, in this article we propose a novel noise-robust traffic flow estimation model based on LRR. The contributions of this article are summarized as follows:

- By analysing a large amount of real traffic flow data, the hypothesis that the traffic flow matrix has low rank is verified, and further analysis shows that the traffic flow data has a temporal stability property.
- Based on LRR theory, and taking full advantage of the low-rank and temporal properties of the traffic flow matrix, a noise-robust completion model for missing traffic data with a temporal regularisation constraint term is proposed, named RTLRR (Robust Temporal Low-rank Representation).
- An optimized algorithm for solving the proposed model is designed based on the ADMM algorithm. This algorithm has the advantage of fast convergence.
- Based on the pre-processed raw dataset, the superiority of the proposed model over other methods is verified through comprehensive simulation experiments.

The organisational structure of the article is as follows: Section 2 introduces related work. Section 3 presents our verification of the prior hypotheses on traffic flow data. Section 4 proposes a novel traffic flow missing data recovery model and gives an optimisation algorithm. Section 5 compares the estimation results of the proposed model with other methods on outlier-free and raw datasets by simulation.
Finally, the research work of this article is summarized in Section 6.

RELATED WORK

So far, the methods for recovering missing data can be roughly divided into the following three categories.

Methods based on statistical learning

Statistical learning methods include simple interpolation methods, regression methods, and probabilistic model methods. The typical representative of naive interpolation is MI, which replaces the missing values with the average of the observed samples; Tak et al. used such a method. [7] The advantage of MI is its simplicity, but because all missing values of the same sample are replaced by the same mean, the original distribution of the sample is seriously distorted, and the performance is often poor.

The goal of regression is to establish a mapping model between missing values and estimated values; using such a mapping, missing values can easily be estimated. Regression can be divided into linear and non-linear regression. The representative linear regression model is Least Squares Regression (LSR). Non-linear regression mainly includes KNNR, Support Vector Regression (SVR), [12] and so on. The KNN algorithm finds the k samples closest to a missing sample based on its complete variables, and has been proved to achieve good results in traffic flow data recovery. For example, Batista et al. proposed a data-driven improved KNN-based regression algorithm, called KNNR, to estimate missing traffic values, which achieved good results.
[8] However, KNN recovers missing data in isolation, so it does not capture the global characteristics of the data; in addition, KNN is sensitive to noise. Shang et al. [12] proposed a traffic data estimation model based on the Fuzzy Clustering Method (FCM) and SVR, and designed a solution method based on Particle Swarm Optimisation (PSO). This model not only uses the temporal characteristics of traffic flow data but also takes into account the correlation between different sensors, so it achieves good results. However, it does not consider the noise present in the flow data.

The probabilistic model method assumes that the data follow a certain probability distribution, such as a uniform, Gaussian, or chi-square distribution, and then estimates the missing values according to the probability model. PPCA is the representative probabilistic model method and was used for modelling in [9]. The model assumes that traffic data x is generated by a hidden variable z through the mapping x = Wz + μ, where the marginal distribution p(z) and the conditional probability p(x|z) both obey Gaussian distributions. In such a linear Gaussian model, p(x) and p(z|x) also obey Gaussian distributions. The optimisation targets of the model are W and μ. Based on the prior assumptions, the model parameters can be optimized using Maximum Likelihood Estimation (MLE) or Expectation Maximisation (EM). Recently, Bayesian Optimisation (BO) has also been widely used to tune parameters. [13, 14] The disadvantage of PPCA is that it relies too heavily on prior assumptions, while in practice it is difficult to determine which distribution the true data obey. For example, traffic flow data often do not obey a normal distribution because of the two daily peak periods.

Methods based on matrix completion

Matrix completion theory is derived from Compressed Sensing theory (CS).
[15] Candès et al. [16] proved that an incomplete matrix can be recovered with high accuracy provided that the matrix to be restored is low-rank. For a traffic flow matrix this assumption is reasonable, because the flows at different locations in the same road network are often related, which is called spatial correlation. Spatial correlation provides the basis for our LRR modelling. The traffic flow values at different times on the same road are also related, which is called temporal correlation. In general, the temporal correlation manifests as temporal stability, that is, the values at adjacent moments do not change much, and the spatial-temporal correlation is reflected in the low-rank property of the matrix. This will be demonstrated experimentally in Section 3. Commonly used matrix completion algorithms include SVT, OptSpace, [17] Subspace Evolution and Transfer (SET), [18] and so on. In [19], Luo et al. proposed an improved low-rank matrix decomposition (ILRMD), which fully utilizes the spatiotemporal correlation among traffic data but ignores the impact of the stochastic fluctuation of traffic data on the estimation results. In recent years, research interest has gradually shifted from matrix completion in two-dimensional space to tensor completion in three-dimensional space. [20] proposed a flow estimation model combining matrix completion and tensor completion, based on an optimisation model of CP decomposition; the experimental results show that the method has fairly high estimation accuracy and performs well even on holidays with complex traffic conditions. However, this method cannot capture the local properties of samples and is sensitive to noise.

Methods based on sparse representation

In recent years, Sparse Representation theory (SR) [21] has been applied to the restoration of missing traffic flow data. Chen et al.
[22] proposed a missing traffic value interpolation model based on lp-norm regularisation, called SSRp. In SR theory, each sample can be represented by a linear combination of the other samples, and the goal is to find such a sparse coefficient matrix. Unlike the l1-norm constraint used in traditional sparse representation, [22] adopted an lp-norm constraint, which achieves greater sparsity. The simulation results show that the model has high estimation accuracy. However, sparse representations have the inherent disadvantages of not making good use of global properties and of poor noise resistance. What is more, SSRp does not take into account the temporal stability of traffic flow.

Table 1 summarises the related works mentioned in this section.

TABLE 1 Summarisation of mentioned works

Methods based on | Representative works
Statistical learning | Simple interpolation methods (MI [7]); regression methods (KNNR [8], SVR [12]); probabilistic model methods (PPCA [9], BO-PPCA [13])
Matrix completion | SVT [10], OptSpace [17], SET [18], ILRMD [19]
Sparse representation | SSRp [22]

RESEARCH ON REAL TRAFFIC FLOW DATA

The real traffic flow dataset used in this study is derived from 20 sensor nodes in the road network formed by the intersection of the I205 Freeway and the SR14 Highway in Portland, Oregon, USA (http://portal.its.pdx.edu/). The map of the road network and the locations of the 20 detection points are shown in Figure 1. There are 10 detection points on the SR14 road and 10 on the I205 road. The detection points on the south side of the SR14 highway detect the traffic flow from west to east, and those on the north side the opposite; the I205 detection points on the west side detect the traffic flow from north to south, and those on the east side the opposite. The collection period was from 1 July 2018 to 30 September 2018. Excluding holidays and weekends, 63 days of traffic flow data were obtained.
The sampling resolution of the traffic flow is 5 min, so there are 288 sampling points per day, and the total data amount is 288 × 63 × 20 = 362,880.

FIGURE 1 The selected sensors of Portland

To visually show the characteristics of the traffic flow data, Figure 2 shows four typical visualisations of traffic data collected by four different sensors on the same day. The horizontal axis is time and the vertical axis is vehicle flow. The traffic data collected by the 20 sensors show some common characteristics. First, the data collected by most sensors have two peak periods, in the morning and evening, which appear at essentially the same times each day, that is, traffic flow is periodic. Secondly, the change in the traffic values is relatively small at most times, but sudden changes (surges and drops) occur at some moments.

FIGURE 2 Illustration of traffic flow sampled from four sensors on the same day

Low-rank property

As mentioned earlier, the spatio-temporal correlation is manifested in the low rank of the matrix, and the low-rank property of the matrix is the basis for applying LRR. In this section, we experimentally verify that traffic flow data does have the low-rank property, using Singular Value Decomposition (SVD) to evaluate whether the traffic matrix is low-rank. Any traffic matrix X ∈ R^(T×N), where T is the number of time slots and N is the number of traffic flow detectors at different locations, can be decomposed into the following form:

(1) X = U Σ V^T

where U ∈ R^(T×T) and V ∈ R^(N×N) are unitary matrices, and Σ ∈ R^(T×N) is a diagonal matrix whose diagonal elements are the singular values. The singular values of X (in descending order) are diag(Σ) = {σ1, σ2, …, σr, 0, …, 0}, where r is the rank of the matrix; obviously the rank can be represented by the number of non-zero singular values. This definition is actually the exact rank of the matrix.
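The SVD of Equation (1) and the rank-from-singular-values check can be sketched as follows; the small synthetic matrix is an illustrative stand-in for a T×N traffic matrix, not the Portland data:

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, r = 30, 8, 3                                             # small illustrative sizes
X = rng.standard_normal((T, r)) @ rng.standard_normal((r, N))  # exact rank-r matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)  # Eq. (1): X = U Sigma V^T
print(np.allclose(X, U @ np.diag(s) @ Vt))        # the decomposition reconstructs X
print(int((s > 1e-10 * s[0]).sum()))              # count of non-negligible singular values = rank
```

In floating point, "non-zero" singular values are judged relative to the largest one, which already hints at why the exact rank is fragile in practice.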
It is worth noting that in the real world there is almost no absolutely low-rank matrix. In addition, calculating the exact rank of a matrix is an ill-posed problem, because small changes in individual elements may lead to large changes in the rank. [23] According to PCA theory, [24] if the matrix X is low-rank, then the first k (k ≪ T) singular values account for the majority of all singular values. [25]

To show the low rank of the traffic matrix on different working days in detail, Figure 3 shows the proportion of the first k singular values of the traffic matrices X collected over 5 working days. The top 50 singular values account for over 80% of the total variance captured by all the singular values. In addition, the low rank of Tuesday, Wednesday, and Friday is more pronounced, which indicates that the traffic flow matrix has a good low-rank property.

FIGURE 3 Proportion of the first k singular values over five working days

Temporal stability property

Generally speaking, the difference between each pair of adjacent time slots monitored by the same traffic flow sensor node will not be too large, and may even be zero. We call this characteristic temporal stability. To verify it, define the gap between each pair of adjacent time slots at the same location:

(2) Δ(t, n) = |x_{t,n} − x_{t−1,n}|

where t represents time and n represents the detector location, with 2 ⩽ t ⩽ T and 1 ⩽ n ⩽ N. A smaller Δ(t, n) indicates that the data collected by the n-th sensor node at time slot t changes little compared to time slot t−1.
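The gap Δ(t, n) of Equation (2), together with a per-sensor normalization by the maximal gap, can be sketched as follows; the smooth synthetic matrix stands in for real sensor readings and its parameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
T, N = 288, 4
# Smooth synthetic "flows": small random steps, standing in for real sensor readings
X = np.abs(np.cumsum(rng.normal(0.0, 1.0, (T, N)), axis=0)) + 50.0

gap = np.abs(np.diff(X, axis=0))                  # Delta(t, n) = |x_{t,n} - x_{t-1,n}|
norm_gap = gap / gap.max(axis=0, keepdims=True)   # normalize by each sensor's maximal gap

# Fraction of normalized gaps below 0.4, the threshold discussed for Figure 4
frac_below = (norm_gap < 0.4).mean()
print(gap.shape, round(float(frac_below), 3))
```

For temporally stable data, most normalized gaps concentrate near zero, which is the behaviour the CDF in Figure 4 summarises for the real dataset.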
To better reflect the overall behaviour of Δ(t, n), we calculate the normalized gap:

(3) Δ̄(t, n) = |x_{t,n} − x_{t−1,n}| / max{|x_{2,n} − x_{1,n}|, …, |x_{T,n} − x_{T−1,n}|}

where max{|x_{2,n} − x_{1,n}|, …, |x_{T,n} − x_{T−1,n}|} is the maximal gap between any two adjacent time slots of the n-th sensor; thus a smaller Δ̄(t, n) indicates that Δ(t, n) is relatively small.

Similarly, we analysed the temporal stability of the traffic flow over five working days. Figure 4 shows the Cumulative Distribution Function (CDF) of Δ̄(t, n) from Monday to Friday. Approximately 90% of the normalized adjacent gaps are less than 0.4, and the proportion of equal adjacent values exceeds 10%, which indicates that the gap between adjacent traffic data collected by the same sensor is small; hence the traffic flow matrix is temporally stable.

FIGURE 4 Temporal stability property

THE PROPOSED METHODOLOGY

RTLRR

Let X = [x1, x2, …, xN] ∈ R^(T×N) be a complete traffic flow matrix, where xi ∈ R^T is a column vector, that is, the i-th sample, and T is the number of time slots, so T = 288. In this article, in order to fully exploit the traffic data, N is the product of the number of flow sensors and the number of sampling days; therefore the proposed model takes into account not only the spatial correlation between different sensors but also the temporal correlation of each sensor, and N = 63 × 20 = 1260. In representation theory, each sample in X can be represented by a linear combination of samples in a dictionary A = [a1, a2, …, aZ] ∈ R^(T×Z), that is:

(4) X = AW

where W = [w1, w2, …, wN] is a coefficient matrix and wi corresponds to xi. Therefore:

(5) xi = Σ_{j=1}^{Z} aj wi(j)

where wi(j) represents the j-th element of wi. However, the dictionary is often overcomplete, so formula (5) is prone to trivial solutions. A good remedy is to replace A with X.
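Once the data serve as their own dictionary, a trivial solution always exists, which is why the formulations that follow impose an extra criterion (sparsity or low rank) on W. A minimal sketch with an arbitrary illustrative matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.random((6, 4))

# With the data itself as dictionary, W = I always satisfies X = X W exactly:
# every sample "represents itself", which carries no structural information.
W_trivial = np.eye(4)
print(np.allclose(X, X @ W_trivial))
```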
This is Self-Representation theory; [21] formula (4) is then rewritten as:

(6) X = XW

Note that X replaces A, so W ∈ R^(N×N). In sparse representation theory, W is a sparse matrix. However, sparse representation cannot capture the global properties of X, which means it cannot make full use of the inherent information of the traffic data in our problem; moreover, it is sensitive to noise. Therefore, Liu et al. [26] proposed LRR theory. Unlike SR, which solves the sparse representation of each sample independently, the goal of LRR is to jointly find the representation of lowest rank. W is then a low-rank matrix, so the problem can be modelled as a rank minimisation problem:

(7) min_W rank(W)  s.t.  X = XW

However, the optimisation problem (7) is NP-hard. Fortunately, Candès et al. [16] proved that the above rank minimisation problem can be relaxed to a nuclear norm minimisation problem:

(8) min_W ‖W‖∗  s.t.  X = XW

where ‖•‖∗ is the nuclear norm, the sum of the non-zero singular values. Further, the above formula can be written in unconstrained form through a penalty function [27] as:

(9) min_{X,W} (1/2)‖X − XW‖F² + λ1‖W‖∗

where λ1 is a hyperparameter. Notice that the original X is in fact incomplete. Denote the observed X by M ∈ R^(T×N), that is, the sampling matrix; then PΩ(X) = PΩ(M), where PΩ(•) is an orthogonal projection operator defined as:

(10) [PΩ(X)]ij = Xij if (i, j) ∈ Ω, and 0 otherwise

where Ω is the index set of the observed elements of X. Thus problem (9) becomes:

(11) min_{X,W} (1/2)‖X − XW‖F² + λ1‖W‖∗  s.t.  PΩ(X) = PΩ(M)

Note that the solution target X appears inside the projection operator, which is not convenient for the subsequent solution, so the equation PΩ(X) = PΩ(M) can be transformed into M = X + E with PΩ(E) = 0, and (11) is equivalent to:

(12) min_{X,W,E} (1/2)‖X − XW‖F² + λ1‖W‖∗  s.t.  M = X + E, PΩ(E) = 0

where E is the missing error. In Section 3.2, it was experimentally shown that the traffic flow data has temporal stability.
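The projection operator PΩ of Equation (10) and the split M = X + E with PΩ(E) = 0 can be sketched as follows; the matrix and the sampling mask are illustrative, with the boolean mask playing the role of Ω:

```python
import numpy as np

def P_Omega(X, mask):
    """Orthogonal projection of Eq. (10): keep observed entries, zero elsewhere."""
    return np.where(mask, X, 0.0)

rng = np.random.default_rng(4)
X = rng.random((4, 3))
mask = rng.random((4, 3)) < 0.5      # True = observed entry (illustrative Omega)

M_obs = P_Omega(X, mask)             # what the sensors actually deliver
E = X - M_obs                        # "missing error": supported on unobserved entries only
print(np.allclose(P_Omega(E, mask), 0.0))  # P_Omega(E) = 0, as required in Eq. (12)
```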
We therefore exploit this prior information via the fused lasso:

(13) min_{X,W,E} (1/2)‖X − XW‖F² + λ1‖W‖∗ + λ2‖RX‖1  s.t.  M = X + E, PΩ(E) = 0

where R ∈ R^((T−1)×T) is a Toeplitz matrix:

(14) R =
[ 1 −1  0 ⋯  0  0 ]
[ 0  1 −1 ⋯  0  0 ]
[ ⋮  ⋮  ⋱  ⋱  ⋮  ⋮ ]
[ 0  0  0 ⋯  1 −1 ]

By minimizing the L1-norm regularisation term ‖RX‖1, the elements of the traffic flow matrix are encouraged to be temporally stable along the time dimension. Furthermore, the sampled data probably contains noise. We denote the noise matrix by C ∈ R^(T×N) and, to make the model robust, add an L2-norm regularisation over C:

(15) min_{X,W,E} (1/2)‖X − XW‖F² + λ1‖W‖∗ + λ2‖RX‖1 + (λ3/2)‖C‖F²  s.t.  M = X + C + E, PΩ(E) = 0

In addition, the traffic flow data is non-negative, so the constraint X ≥ 0 must be added. Problem (15) is then further refined as:

(16) min_{X,W,E} (1/2)‖X − XW‖F² + λ1‖W‖∗ + λ2‖RX‖1 + (λ3/2)‖C‖F²  s.t.  X ≥ 0, M = X + C + E, PΩ(E) = 0

Optimisation algorithm

For a separable convex optimisation problem, the simplest and most efficient solution is the Alternating Direction Method of Multipliers (ADMM). ADMM converts the complex original problem into several easy-to-solve sub-problems; by seeking the optimal solutions of the sub-problems, the solution of the original problem can be found. We use the ADMM variant proposed by Lin et al.
[28] to solve the optimisation problem, which does not require tuning the optimal penalty parameters.

To solve the optimisation problem (16), we first define the following two indicator functions:

(17) g(X) = 0 if X ≥ 0, +∞ otherwise

(18) f(E) = 0 if PΩ(E) = 0, +∞ otherwise

The optimisation problem can then be transformed via a penalty function into:

(19) min_{X,W,E} (1/2)‖X − XW‖F² + λ1‖W‖∗ + λ2‖RX‖1 + g(X) + f(E) + (λ3/2)‖M − X − E‖F²

where λ3 is a penalty parameter for violating the constraint term. The optimisation process can be equivalently converted to iteratively solving the following three sub-problems, where Xk, Wk, and Ek denote the respective k-th iteration results:

(20)
Xk = argmin_X (1/2)‖X − XWk−1‖F² + λ2‖RX‖1 + g(X) + (λ3/2)‖M − X − Ek−1‖F²
Wk = argmin_W λ1‖W‖∗ + (1/2)‖Xk − XkW‖F²
Ek = argmin_E f(E) + (λ3/2)‖E − (M − Xk)‖F²

Sub-problems 2 and 3 have closed-form solutions, while sub-problem 1 is difficult to solve directly due to the presence of ‖RX‖1 and g(X).

For sub-problem 1: to solve it efficiently, we introduce variables D and S with D = X and S = RX, so that sub-problem 1 becomes the equivalent constrained optimisation problem:

(24) min_{X,D,S} (1/2)‖X − XW‖F² + λ2‖S‖1 + g(D) + (λ3/2)‖M − X − E‖F²  s.t.  S = RX, D = X

where, for simplicity, Wk−1 is abbreviated as W and Ek−1 as E. We apply ADMM to this sub-problem; the augmented Lagrange function is:

(21) L(X, D, S, Y1, Y2) = (1/2)‖X − XW‖F² + λ2‖S‖1 + g(D) + (λ3/2)‖M − X − E‖F² + ⟨Y1, S − RX⟩ + ⟨Y2, D − X⟩ + (μt/2)(‖S − RX‖F² + ‖D − X‖F²)

where Y1 and Y2 are Lagrange multipliers. Setting H1 = Y1/μt and H2 = Y2/μt, the optimisation steps are:

Step 1: Update X:

(22) Xt = argmin_X (1/2)‖X − XW‖F² + (λ3/2)‖M − X − E‖F² + (μt/2)(‖St−1 − RX + H1t−1‖F² + ‖Dt−1 − X + H2t−1‖F²)

whose objective can be written compactly as

(23) Q(X) = (1/2)‖X − XW‖F² + (λ3/2)‖M − X − E‖F² + (μ/2)(‖S − RX + H1‖F² + ‖D − X + H2‖F²)

Step 2: Update D:

(25) Dt = argmin_D g(D) + (μt/2)‖D − (Xt − H2t−1)‖F²

which gives

(26) Dt = Xt − H2t−1 projected onto D ≥ 0

Step 3: Update S:

(27) St = argmin_S λ2‖S‖1 + (μt/2)‖S − (RXt − H1t−1)‖F²

which is the proximal operator of the L1 norm.

Steps 4–5: Update H1 and H2:

(28) H1t = H1t−1 + (St − RXt)

(29) H2t = H2t−1 + (Dt − Xt)

Step 6: Update μt:

(30) μt = min(ρ μt−1, μmax)
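The building blocks of the steps above can be sketched as follows: the difference matrix R of Equation (14), the non-negative projection behind the D update, and the L1 proximal (soft-thresholding) operator behind the S update. The concrete numbers are illustrative only:

```python
import numpy as np

def difference_matrix(T):
    """First-order difference (Toeplitz) matrix R of shape (T-1, T), cf. Eq. (14)."""
    R = np.zeros((T - 1, T))
    idx = np.arange(T - 1)
    R[idx, idx] = 1.0
    R[idx, idx + 1] = -1.0
    return R

def soft_threshold(A, tau):
    """Elementwise prox of tau * ||.||_1 (the S update)."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def project_nonneg(A):
    """Projection onto the constraint X >= 0 (the D update)."""
    return np.maximum(A, 0.0)

R = difference_matrix(5)
x = np.array([3.0, 3.0, 4.0, 4.0, 10.0])
print(R @ x)          # adjacent differences; their L1 norm is the fused-lasso penalty ||Rx||_1

A = np.array([[-2.0, 0.5], [1.5, -0.2]])
print(soft_threshold(A, 1.0))   # entries shrunk toward zero by tau
print(project_nonneg(A))        # negative entries clipped to zero
```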
The closed-form solution of the X update is obtained by setting the derivative of Q(X) in Equation (23) to zero (for simplicity, the iteration superscripts are omitted); here I is an N × N identity matrix. Setting

(31) ∂Q(X)/∂X = 0

(32) A = (I − W)(I − W)^T + λ3 I + μI + μR R^T

(33) B = λ3(M − E) + μ(H1 + S)R^T + μ(D + H2)

we have

(34) X = B A^{−1}

For sub-problem 2: assume that the SVD of Xk is Xk = UΣV^T, with U = [U1 U2], Σ = diag(Σ1, Σ2), and V = [V1 V2]; then the closed-form solution of sub-problem 2 is

(35) Wk = V1(I − λ1 Σ1^{−2})V1^T

where I1 = {i : αi > λ1} and I2 = {i : αi ≤ λ1} index the partition of the singular values.

Proof. To minimize λ1‖W‖∗ + (1/2)‖Xk − XkW‖F², set the first-order differential to zero:

(36) λ1 ∂‖W‖∗ − (Xk)^T Xk (I − W) = 0

If the SVD of W is W = UW ΣW VW^T, then the sub-differential of the nuclear norm of W is

(37) ∂‖W‖∗ = {UW VW^T + Z : UW^T Z = 0, Z VW = 0, ‖Z‖2 ≤ 1}

Let UW = VW = V1 and substitute (37) into (36):

(38) λ1(V1 V1^T + Z) − (Xk)^T Xk (I − W) = 0

Since

(39) I − Wk = I − V1(I − λ1 Σ1^{−2})V1^T = λ1 V1 Σ1^{−2} V1^T + V2 V2^T

and

(40) (Xk)^T Xk = V Σ² V^T = V1 Σ1² V1^T + V2 Σ2² V2^T

substituting Equations (39) and (40) into Equation (38) gives

(41) λ1(V1 V1^T + Z) − λ1 V1 V1^T − V2 Σ2² V2^T = 0

so Z = (1/λ1) V2 Σ2² V2^T, which satisfies the conditions V1^T Z = Z V1 = 0 and ‖Z‖2 ≤ 1. □

For sub-problem 3: its closed-form solution is easily seen to be

(42) Ek = M − Xk on the unobserved entries, with PΩ(Ek) = 0

So far, the complete optimisation algorithm has been presented; the above solution process is named RTLRR-ADMM.
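The closed-form W update of Equation (35) can be sketched as follows. Reading the partition condition as σi² > λ1 is our interpretation of the paper's condition, and the rank-3 test matrix is an illustrative assumption:

```python
import numpy as np

def lrr_w_update(X, lam):
    """Closed-form minimizer of lam*||W||_* + 0.5*||X - X W||_F^2, cf. Eq. (35).
    Singular directions with sigma_i^2 > lam are kept and shrunk; the rest are dropped."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    keep = s**2 > lam                  # our reading of the paper's partition condition
    V1, s1 = Vt[keep].T, s[keep]
    return V1 @ np.diag(1.0 - lam / s1**2) @ V1.T

rng = np.random.default_rng(5)
X = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 15))  # rank-3 data matrix
lam = 0.5
W = lrr_w_update(X, lam)

def objective(W):
    return lam * np.linalg.svd(W, compute_uv=False).sum() \
        + 0.5 * np.linalg.norm(X - X @ W)**2

# Sanity check: the closed form should do no worse than trivial candidates W = 0 and W = I
print(objective(W) <= objective(np.zeros((15, 15))))
print(objective(W) <= objective(np.eye(15)))
```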
It is worth mentioning that, to speed up the optimisation, we first update X (the inner loop) and then use the obtained X to update W (the outer loop); the specific procedure is summarized in Algorithm 1.

ALGORITHM 1 RTLRR-ADMM

Input: Sampling matrix M, indexes of sampled entries Ω, Toeplitz matrix R, maximum inner-loop iteration number K1, maximum outer-loop iteration number K2, and parameters λ1, λ2, λ3, ρ, and μmax.
Output: Estimated result Xopt.
1: Initialize X0 = D0 = S0 = H10 = H20 = μ0 = 0;
2: for t = 1 to K1 do
3:   Update Xt according to (22);
4:   Update Dt according to (26);
5:   Update St according to (27);
6:   Update H1t and H2t according to (28)–(29);
7:   Update μt according to (30);
8:   t = t + 1;
9: end for
10: for k = 1 to K2 do
11:   Update Wk according to (35);
12:   Update Ek according to (42);
13:   Update Xk by repeating Steps 3–7;
14:   k = k + 1;
15: end for
16: return Xopt.

EXPERIMENT AND ANALYSIS

Data preprocessing

Due to factors such as travel willingness, weather, and traffic accidents, traffic flow tends to show significant stochastic fluctuations or even surges, [29] a phenomenon visible in Figure 2. Data with these characteristics can be regarded as outliers. We regard the raw data, which certainly contains noise and outliers, as "corrupted" data, and use it to evaluate the estimation performance of each algorithm on a real-world stochastic traffic dataset. To reduce the impact of such outliers on the traffic flow estimation performance, we preprocess the raw data by filtering. The original data and the outlier-filtered data within one day are shown in Figure 5. The filtering significantly suppresses the fluctuations and surges, that is, the outliers, while maintaining the stochastic nature of the raw data.
In this article, the raw data preprocessed by the filter is referred to as "filtered" data.

FIGURE 5 The raw data and filtered data

Experiment configuration

To comprehensively evaluate the proposed model, we compare RTLRR with several well-known baseline methods, including MI, KNN, SVR, PPCA, ILRMD, and SSRp. In addition, to verify the role of the temporal regularisation term, the model without temporal regularisation is named RLRR. The parameters of the baseline methods are set according to the related work in [30], and the optimal parameters are determined by grid search.

The proposed algorithm and the comparison algorithms were implemented in MATLAB 2016a, and the hardware environment is an Intel(R) Core(TM) i5-8300H with 16 GB RAM. To evaluate the estimation performance of each model on missing traffic data, we used RStudio (package: http://www.stat.boogaart.de/compositions/) to design three patterns of missing data: [22]

(a) Missing Completely at Random (MCAR): data missing occurs completely at random, independent of the existing data or other missing data, and hence generally appears as isolated data points.

(b) Missing at Random (MAR): data missing is related to neighbouring points, and is generally reflected as runs of consecutive missing data points.

(c) MIXED: half of the missing data is MCAR and the other half is MAR.

Figure 6 intuitively reflects the characteristics of these three missing data patterns.
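The MCAR and MAR patterns can be sketched as mask generators; this is an illustrative construction, not the RStudio package used by the authors, and the run length of 8 slots in the MAR sketch is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(6)
T, N = 288, 20

def mcar_mask(delta):
    """MCAR: each entry is missing independently with probability delta."""
    return rng.random((T, N)) < delta

def mar_mask(delta):
    """MAR (illustrative): missing entries come in runs of consecutive time slots."""
    mask = np.zeros((T, N), dtype=bool)
    target = int(delta * T * N)
    while mask.sum() < target:
        n = rng.integers(N)          # random sensor
        t = rng.integers(T - 8)      # random start of a run
        mask[t:t + 8, n] = True      # a run of 8 consecutive slots (arbitrary length)
    return mask

delta = 0.3
m1, m2 = mcar_mask(delta), mar_mask(delta)
print(round(float(m1.mean()), 3), round(float(m2.mean()), 3))  # both close to delta
```

A MIXED mask would simply combine half the missing budget from each generator.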
Each column represents the traffic data sampled by one detector in one day and each row denotes a time slot; the white squares represent observed data points and the black squares the missing ones.

FIGURE 6: Comparison of three missing patterns: (a) MCAR, (b) MAR, (c) MIXED

Evaluation criterion

The evaluation indicator used in this article is the Root Mean Squared Error (RMSE), which depicts the absolute error of the estimation and is defined as

RMSE = √( (1/Total) · Σᵢ₌₁^Total (x̂ᵢ − xᵢ)² )    (43)

where Total is the number of missing data points, xᵢ is the original datum, and x̂ᵢ is the estimated one. Obviously, a smaller RMSE indicates better estimation performance. In addition, to compare the estimation performance of each algorithm under different missing rates, we define the missing ratio as δ; the total number of missing data points is then δ × (N × T).

Comparison of estimation performances

In this article, the eight algorithms are executed 10 times on the same dataset, and the average of the results is used as the basis for evaluation. Tables 2–4 list the estimation performance of the above algorithms under the three missing patterns and different missing rates δ.
In the tables, the input sampling matrix M without outliers is noted as "filtered", and M containing noise and outliers as "corrupted".

TABLE 2: Estimation error (RMSE) under the MCAR missing pattern

δ     Data type   MI      KNNR    SVR     PPCA    ILRMD   SSRp    RLRR    RTLRR
0.1   Filtered    220.23  108.97  100.01   88.22   84.03   80.20   73.02   67.33
      Corrupted   223.09  112.36  105.20   94.19   87.20   84.34   76.18   70.16
0.2   Filtered    221.35  112.45  109.32   92.26   88.62   85.08   76.46   69.78
      Corrupted   222.31  117.64  114.56   99.86   92.35   89.42   80.17   72.53
0.3   Filtered    225.10  117.30  116.54   98.69   93.09   92.55   81.09   72.09
      Corrupted   223.49  122.47  123.46  104.09   98.62   97.30   87.07   76.02
0.4   Filtered    222.07  121.34  125.35  102.12   99.32   97.30   84.42   73.63
      Corrupted   223.28  126.11  129.79  110.27  105.09  104.61   90.21   77.28
0.5   Filtered    223.54  127.01  136.71  107.22  109.14  105.86   87.51   76.46
      Corrupted   224.12  133.70  142.90  118.02  116.11  113.27   94.03   81.32
0.6   Filtered    225.79  135.20  149.12  111.23  116.70  111.47   90.60   80.26
      Corrupted   225.37  139.73  155.80  121.94  123.82  118.39   96.62   86.05

TABLE 3: Estimation error (RMSE) under the MAR missing pattern

δ     Data type   MI      KNNR    SVR     PPCA    ILRMD   SSRp    RLRR    RTLRR
0.1   Filtered    233.16  132.47  119.15  101.21   93.93   86.34   78.22   69.67
      Corrupted   235.36  137.49  125.76  108.28   97.68   90.17   81.64   72.36
0.2   Filtered    235.24  136.63  121.26  104.32   98.41   92.31   81.64   71.48
      Corrupted   233.46  141.27  126.24  112.63  102.32   98.92   84.29   74.09
0.3   Filtered    237.09  140.26  129.87  109.60  102.12   98.01   85.28   75.83
      Corrupted   233.31  146.80  135.18  117.39  106.69  105.51   90.47   80.32
0.4   Filtered    236.15  145.43  137.37  112.13  107.38  105.30   89.12   78.60
      Corrupted   232.78  152.04  143.91  119.27  114.02  113.27   95.22   84.52
0.5   Filtered    235.15  149.02  146.35  116.87  113.09  111.55   93.73   80.00
      Corrupted   232.67  158.25  153.08  125.40  119.32  120.83   99.05   86.92
0.6   Filtered    234.36  155.24  153.12  121.29  120.76  118.94   94.62   84.57
      Corrupted   234.96  162.33  160.32  128.19  127.59  126.72  101.62   92.39

TABLE 4: Estimation error (RMSE) under the MIXED missing pattern

δ     Data type   MI      KNNR    SVR     PPCA    ILRMD   SSRp    RLRR    RTLRR
0.1   Filtered    223.12  114.53  103.36   95.75   88.23   82.25   75.35   67.89
      Corrupted   225.40  118.27  109.55  102.12   93.62   87.38   78.43   70.47
0.2   Filtered    227.76  119.19  113.27   97.37   91.57   88.29   79.21   70.32
      Corrupted   225.24  125.40  119.16  105.79   96.39   94.37   83.07   74.17
0.3   Filtered    220.45  126.38  116.54  104.62   94.28   93.51   84.62   74.29
      Corrupted   225.53  132.17  123.29  111.61  100.13   99.34   89.55   79.18
0.4   Filtered    228.22  131.87  125.35  107.12   99.32   99.05   87.10   75.30
      Corrupted   236.46  138.05  133.54  115.22  106.60  107.44   93.23   80.04
0.5   Filtered    226.69  137.79  141.26  111.90  111.24  108.80   89.05   78.31
      Corrupted   230.95  148.06  149.70  119.37  119.16  116.29   96.18   84.81
0.6   Filtered    229.64  145.20  151.32  116.46  118.37  114.08   92.33   82.73
      Corrupted   234.23  152.66  158.51  125.55  125.43  122.61   99.33   89.42

With respect to estimation performance, the tables clearly show that every algorithm degrades as the data missing rate increases, which is reasonable since a higher missing rate means more serious information loss; interestingly, MI's estimation performance does not always obey this trend. Secondly, regardless of the algorithm, outliers in the data cause a decrease in estimation performance, but the sensitivity to noise varies greatly: ILRMD and SSRp are very sensitive to noise, while RLRR and RTLRR, which are based on LRR, are robust to it. MI is also insensitive to noise, but its estimation performance is the worst. Moreover, the MCAR missing pattern is the easiest situation to handle, MAR is the toughest, and MIXED lies in between. Also, the RMSE of every model on the corrupted dataset is higher than on the filtered dataset, which indicates that outliers have a clear negative impact on estimation performance. Finally, for every data type and missing pattern, MI performs worst among the eight algorithms, with an obvious gap to the others; KNNR and SVR perform similarly, as do PPCA and ILRMD.
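The temporal regularisation that separates RTLRR from RLRR is built on the fused lasso over adjacent time slots, applied through the Toeplitz matrix R from Algorithm 1. A minimal sketch of this idea, assuming R is the first-order difference operator (the exact form used in the model may differ):

```python
import numpy as np

def first_order_diff(T):
    # A Toeplitz difference operator R such that
    # (X @ R)[:, t] = X[:, t+1] - X[:, t].
    R = np.zeros((T, T - 1))
    idx = np.arange(T - 1)
    R[idx, idx] = -1.0
    R[idx + 1, idx] = 1.0
    return R

def temporal_penalty(X, lam):
    # Fused-lasso style term lam * ||X R||_1: small when traffic at
    # adjacent time slots is similar, large for abrupt jumps/outliers.
    R = first_order_diff(X.shape[1])
    return lam * float(np.abs(X @ R).sum())
```

A smooth traffic profile incurs a low penalty while a profile with a sudden spike is penalised heavily, which is why this term both improves accuracy and suppresses outliers.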
It is particularly noteworthy that, as a state-of-the-art method, SSRp, which is based on sparse representation theory, performs close to RLRR, which is based on LRR theory, but is inferior to RTLRR; this indicates that exploiting the temporal correlation of traffic data via the fused lasso improves estimation accuracy. We conclude that the proposed RTLRR achieves the best performance.

To compare the run time of the different models, Table 5 shows the run time of the eight models under the MIXED missing pattern with missing ratio δ = 0.3. As we can see, MI is fastest among all methods owing to its simplicity, while methods requiring iterative optimisation, such as SSRp, RLRR, and RTLRR, run noticeably slower. The probabilistic model PPCA also requires more run time because of its large amount of computation. Benefiting from the ADMM optimisation algorithm, the proposed RTLRR runs faster than the similar model SSRp. Considering the estimation errors of the different methods, we may conclude that RTLRR achieves a significant improvement in estimation performance with a moderate increase in run time.

TABLE 5: Run time (s) of different models under the MIXED missing pattern

MI     KNN     SVR     PPCA    ILRMD   SSRp    RLRR    RTLRR
1.578  18.431  22.620  30.190  18.706  38.018  31.282  33.229

To visually compare the effect of the temporal regularisation term on estimation performance, Figure 7 shows the performance of RLRR and RTLRR on the filtered and corrupted datasets under the MIXED missing pattern. The figure shows that the RMSE of RTLRR is always lower than that of RLRR on both the filtered and the corrupted dataset, which verifies the significance of the temporal regularisation term.
In addition, the temporal regularisation term improves the robustness of the model to a certain extent.

FIGURE 7: Estimation error of RLRR and RTLRR

CONCLUSION

In this article, we propose a novel model, RTLRR, which makes full use of two inherent priors of the traffic matrix, its low-rank property and its temporal stability, and thus achieves excellent results on real datasets. Different from previous modelling methods, we use low-rank representation theory to model the spatial correlation of traffic data and model the temporal correlation via the fused lasso. Owing to the noise robustness of LRR, this method can effectively estimate the missing values in noisy traffic data. To solve the model, an optimisation algorithm based on ADMM is developed that solves the subproblems alternately, which greatly accelerates the optimisation. Simulation results confirm the efficiency of the proposed method. Our future work will incorporate the graph structure of road networks into the modelling to further improve estimation performance. We also plan to combine other machine learning models with the proposed RTLRR to further reduce estimation error.

ACKNOWLEDGEMENTS

This work was supported in part by the National Natural Science Foundation of China under Grant 61812190, and in part by the Natural Science Foundation of Jiangsu Province under Grant BK20161104.

REFERENCES

1. Li, D.M., et al.: Intelligent transportation system in Macao based on deep self-coding learning. IEEE Transactions on Industrial Informatics 14, 3253–3260 (2018)
2. Zhou, W., et al.: A multi-source based coupled tensors completion algorithm for incomplete traffic data imputation. In: 11th International Conference on Wireless Communications and Signal Processing (WCSP), Xi'an, China, 23–25 October 2019
3. Chen, X., et al.: Spatiotemporal variable and parameter selection using sparse hybrid genetic algorithm for traffic flow forecasting. International Journal of Distributed Sensor Networks 13(6), 1–14 (2017)
4. Tian, D., et al.: Self-organized relay selection for cooperative transmission in vehicular ad-hoc networks. IEEE Trans. Veh. Technol. 66(10), 9534–9549 (2017)
5. Wu, Y., et al.: A hybrid deep learning based traffic flow prediction method and its understanding. Transportation Research Part C: Emerging Technologies 90, 166–180 (2018)
6. Feng, X., et al.: Adaptive multi-kernel SVM with spatial-temporal correlation for short-term traffic flow prediction. IEEE Trans. Intell. Transp. Syst. 20, 2001–2013 (2018)
7. Tak, S., Woo, S., Yeo, H.: Data-driven imputation method for traffic data in sectional units of road links. IEEE Trans. Intell. Transp. Syst. 17(6), 1762–1771 (2016)
8. Batista, G.E., Monard, M.C.: An analysis of four missing data treatment methods for supervised learning. Appl. Artif. Intell. 17(5), 519–533 (2003)
9. Tipping, M.E., Bishop, C.M.: Probabilistic principal component analysis. J. Roy. Statist. Soc. B 61(3), 611–622 (1999)
10. Cai, J., Candès, E.J., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20, 1956 (2010)
11. Tibshirani, R., et al.: Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67(1), 91–108 (2005)
12. Shang, Q., Yang, Z., Gao, S., Tan, D.: An imputation method for missing traffic data based on FCM optimized by PSO-SVR. Journal of Advanced Transportation 2018, 2935248 (2018)
13. Wu, J., et al.: Bayesian optimization with gradients. In: Advances in Neural Information Processing Systems (NIPS), Long Beach, 4–9 December 2017
14. Lorenz, R., et al.: Dissociating frontoparietal brain networks with neuroadaptive Bayesian optimization. Nature Communications 9, 1227 (2018)
15. Xiao, F., et al.: Anomaly-tolerant network traffic estimation via noise-immune temporal matrix completion model. IEEE J. Sel. Areas Commun. 37(6), 1192–1204 (2019)
16. Candès, E.J., Recht, B.: Exact matrix completion via convex optimization. Found. Comput. Math. 9(6), 717–772 (2009)
17. Keshavan, R.H., Montanari, A., Oh, S.: Matrix completion from a few entries. IEEE Trans. Inf. Theory 56, 2980–2998 (2010)
18. Dai, W., Milenkovic, O., Kerman, E.: Subspace evolution and transfer (SET) for low-rank matrix completion. IEEE Trans. Signal Process. 59, 3120–3132 (2011)
19. Luo, X., et al.: Traffic data imputation algorithm based on improved low-rank matrix decomposition. Journal of Sensors 2019, 1–11 (2019)
20. Chen, X., He, Z., Sun, L.: A Bayesian tensor decomposition approach for spatiotemporal traffic data imputation. Transportation Research Part C: Emerging Technologies 98, 73–84 (2019)
21. Zhang, Q., et al.: Sparse representation based multi-sensor image fusion for multi-focus and multi-modality images: A review. Inf. Fusion 40, 57–75 (2018)
22. Chen, X., et al.: Nonconvex lp-norm regularized sparse self-representation for traffic sensor data recovery. IEEE Access 6, 24279–24290 (2018)
23. Markovsky, I.: Low Rank Approximation: Algorithms, Implementation, Applications. Springer (2011)
24. Choi, Y., Shon, H., Byon, Y., et al.: Enhanced application of principal component analysis in machine learning for imputation of missing traffic data. Applied Sciences 9, 2149 (2019)
25. Xie, K., et al.: Low cost and high accuracy data gathering in WSNs with matrix completion. IEEE Transactions on Mobile Computing 17(7), 1595–1608 (2018)
26. Liu, G., Lin, Z., Yu, Y.: Robust subspace segmentation by low-rank representation. In: International Conference on Machine Learning (ICML), Haifa, 21–24 June 2010
27. Liu, Q., Xu, Y., Zhou, Y.: A class of exact penalty functions and penalty algorithms for nonsmooth constrained optimization problems. Journal of Global Optimization 76, 745–768 (2020)
28. Lin, Z., Liu, R.S., Su, Z.X.: Linearized alternating direction method with parallel splitting and adaptive penalty for separable convex programs in machine learning. Mach. Learn. 99(2), 287–325 (2015)
29. Carrese, S., et al.: Dynamic demand estimation and prediction for traffic urban networks adopting new data. Transportation Research Part C: Emerging Technologies 81, 83–98 (2017)
30. Fan, J., Chow, T.: Matrix completion by least-square, low-rank, and sparse self-representations. Pattern Recognit. 71, 290–305 (2017)
IET Intelligent Transport Systems – Wiley
Published: Feb 1, 2021
Keywords: Optimisation techniques; Data handling techniques; Traffic engineering computing; Other topics in statistics