Access the full text.
Sign up today, get DeepDyve free for 14 days.
References for this paper are not available at this time. We will be adding them shortly, thank you for your patience.
electronics Article The Fast Detection of Abnormal ETC Data Based on an Improved DTW Algorithm 1 1 , 2 , 2 , 2 2 2 3 Feng Guo , Fumin Zou *, Sijie Luo *, Lyuchao Liao , Jinshan Wu , Xiang Yu and Cheng Zhang College of Computer and Data Science, Fuzhou University, Fuzhou 350108, China; n180310004@fzu.edu.cn Fujian Key Laboratory for Automotive Electronics and Electric Drive, Fujian University of Technology, Fuzhou 350118, China; achao@fjut.edu.cn (L.L.); 2191905037@smail.fjut.edu.cn (J.W.); 2201901004@smail.fjut.edu.cn (X.Y.) College of Information Technology and Management, Hunan University of Finance and Economics, Changsha 410205, China; 201805230240@mails.hufe.edu.cn * Correspondence: fmzou@fjut.edu.cn (F.Z.); sjluo@fjut.edu.cn (S.L.); Tel.: +86-136-3842-8527 (S.L.) Abstract: As one of the largest Internet of Things systems in the world, China’s expressway electronic toll collection (ETC) generates nearly one billion pieces of transaction data every day, recording the trafﬁc trajectories of almost all vehicles on the expressway, which has great potential application value. However, there are inevitable missed transactions and false transactions in the expressway ETC system, which leads to certain false and missing rates in ETC data. In this work, a dynamic search step SegrDTW algorithm based on an improved DTW algorithm is proposed according to the characteristics of expressway ETC data with origin–destination (OD) data constraints and coupling between the gantry path and the vehicle trajectory. Through constructing the spatial window of segment retrieval, the spatial complexity of the DTW algorithm is effectively reduced, and the efﬁciency of the abnormal ETC data detection is greatly improved. In real trafﬁc data experiments, the SegrDTW algorithm only needs 3.36 s to measure the abnormal events of a single set of OD path data for 10 days. Compared with the mainstream algorithms, the SegrDTW performs best. Therefore, the proposal provides a feasible method for the abnormal event detection of expressway ETC data in Citation: Guo, F.; Zou, F.; Luo, S.; Liao, L.; Wu, J.; Yu, X.; Zhang, C. The a province and even the whole country. Fast Detection of Abnormal ETC Data Based on an Improved DTW Keywords: data governance; electronic toll collection data; trajectory similarity; dynamic time Algorithm. Electronics 2022, 11, 1981. warping; anomaly detection; expressway https://doi.org/10.3390/ electronics11131981 Academic Editor: João Soares 1. Introduction Received: 29 May 2022 China has built the world’s largest electronic toll collection (ETC) system for ex- Accepted: 22 June 2022 pressways and deployed more than 20,000 gantry devices on 160,000 km of expressways Published: 24 June 2022 nationwide. There are more than 200 million ETC OBU devices, with average daily ETC Publisher’s Note: MDPI stays neutral transaction data of nearly 1 billion [1]. The ETC transaction data record almost all vehicles’ with regard to jurisdictional claims in trafﬁc conditions on expressways and can be used for expressway trafﬁc ﬂow predic- published maps and institutional afﬁl- tion [2,3], transit time estimation [4,5], trafﬁc demand visualization [6], etc. The data are iations. expected to provide important information services for intelligent driving on expressways. However, due to equipment failure, trafﬁc jams, wireless crosstalk and other reasons, there are inevitably missed transactions and false transactions in the expressway ETC system. Obviously, when the ETC system is used to provide decision-making information Copyright: © 2022 by the authors. for intelligent driving on expressways, the error rate and missing rate of ETC data need Licensee MDPI, Basel, Switzerland. to be strictly controlled, and the error and missing ETC data must depend on the context This article is an open access article of trajectory semantic structure to be accurately detected. As the key problem of time- distributed under the terms and series data processing, trajectory outlier detection is a popular topic at home and abroad, conditions of the Creative Commons among which the dynamic time warping (DTW) algorithm [7] has been widely studied Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ and applied as one of the mainstream methods of trajectory similarity measurement. Other 4.0/). such methods include user similarity analysis [8,9], travel mode analysis [10,11], and travel Electronics 2022, 11, 1981. https://doi.org/10.3390/electronics11131981 https://www.mdpi.com/journal/electronics Electronics 2022, 11, 1981 2 of 21 route recommendation [12]. However, due to the huge amount of ETC transaction data and the lack of systematic data feature modeling, when some provincial platforms tried to use mainstream algorithms such as DTW to measure the error rate and missing rate of province-wide expressway ETC transaction data, a series of problems such as long time and low efﬁciency was encountered. Therefore, at present, the quality of ETC transaction data for national expressways is basically unknown, which seriously affects the efﬁciency of the ETC system of expressways and restricts the realization of the application of ETC transaction data. In this work, aiming at the demand of intelligent driving expressway decision-making information based on ETC transaction data, we deeply studied the characteristics of ETC transaction data and designed a SegrDTW algorithm for segmented similarity recognition according to the characteristics of the ETC transaction data, through which we could quickly detect the problems of missed transactions and false transactions in the ETC transaction data. For the characteristics of the ETC transaction data, we used end-to- end bidirectional matching to extract dynamic step length and construct DTW search space windows, which reduced the computing time of the traditional DTW algorithm. By identifying the characteristics of the missed transactions and false transactions, the abnormal data in ETC transaction data culd be detected quickly. The algorithm proposed in this paper makes it possible to measure the abnormal events in expressway ETC transaction data and solves the key technical problems of the large-scale commercial application of expressway ETC transaction data. The main contributions of this work are as follows: By improving the DTW algorithm, we proposed a SegrDTW algorithm that effectively reduces the time complexity of DTW. We also proposed an ETC data governance method that can effectively identify the false transactions and missed transactions in ETC data. The rest of the article is arranged as follows. In the second part, the characteristics of the ETC transaction data are studied, the related deﬁnitions are given, and the ETC abnormal transaction data detection modeling is established. The third part is the design and analysis of the SegrDTW algorithm. In the fourth part, the experimental veriﬁcation and result analysis are carried out using expressway ﬁeld data from Fujian Province. In the ﬁnal part, a summary of the full text is given, and the future work is prospected. 2. Modeling Abnormal ETC Transaction Data Detection ETC transaction data include a large amount of vehicle trafﬁc information and toll information, which can be used to reconstruct the time a vehicle passed through the toll stations and gantry along the way through the ﬁelds of vehicle identity document (ID), toll station equipment ID, gantry equipment ID, transaction time, etc. It is then possible to estimate the real-time position and speed of the vehicle and the trafﬁc running situation of each section of the expressway. It can provide important decision-making information services for intelligent driving, such as reminding that there are turtle-speed vehicles in front and speeding vehicles approaching behind. According to the service demand of the intelligent driving decision-making information service, in order to systematically study the characteristics of ETC transaction data, this paper gives the following deﬁnitions. Deﬁnition 1 (ETC transaction data). According to the service demand of intelligent driving assistance decision information, expressway ETC transaction data (EData) mainly includes three ﬁelds: V I D, S I D and Time, which are vehicle ID, toll station or gantry ID and transaction time. According to the V I D ﬁeld, the vehicle model, media access control address (MAC) and other information from the vehicle-mounted onboard unit (OBU) device in EData can be further determined. Similarly, according to the S I D ﬁeld, we can further determine the longitude LN G and latitude L AT of an ETC toll station or gantry. Among them, we regard the toll station as a special gantry. Electronics 2022, 11, x FOR PEER REVIEW 3 of 22 can be further determined. Similarly, according to the field, we can further determine the longitude 𝐿𝑁𝐺 and latitude 𝐿𝐴𝑇 of an ETC toll station or gantry. Among them, we Electronics 2022, 11, 1981 3 of 21 regard the toll station as a special gantry. Definition 2 (the section). The road section between two adjacent gantries on expressways is Deﬁnition defined as se2 cti (the on 𝑄𝐷 section). : The road section between two adjacent gantries on expressways is deﬁned as section QD: (1). 𝑄𝐷 = < , > 1 2 QD = hS I D , S I D i (1) 1 2 As shown in Figure 1, at present, the coverage distance of a road section usually As shown in Figure 1, at present, the coverage distance of a road section usually ranges from several kilometers to more than 10 kilometers. According to the demands of ranges from several kilometers to more than 10 kilometers. According to the demands the expressway intelligent driving assistant decision information service, the gantry of the expressway intelligent driving assistant decision information service, the gantry position deployment can be increased or adjusted in the later stage. position deployment can be increased or adjusted in the later stage. SID 1 SID 2 SID The upstream section QD QD 1 2 QD 3 QD SID The downstream sections 3 SID SID 2 1 Figure 1. A schematic diagram of the sections. Figure 1. A schematic diagram of the sections. The expressway ETC system inevitably has missed and false transactions; for example, The expressway ETC system inevitably has missed and false transactions; for when a vehicle passes through S I D on the upstream road, there can be a missed transaction example, when a vehicle passes through on the upstream road, there can be a due to the shelter of the big truck, or there can be a false transaction caused by S I D near the missed transaction due to the shelter of the big truck, or there can be a false transaction downstream road. Obviously, we cannot detect missed transactions and false transactions caused by near the downstream road. Obviously, we cannot detect missed from a single point of data, so we need to rely on the semantic structure of the trajectory to transactions and false transactions from a single point of data, so we need to rely on the achieve detection. Therefore, taking the toll station as the starting point and the end point, semantic structure of the trajectory to achieve detection. Therefore, taking the toll station we give the deﬁnitions of transaction trajectory and OD path as follows. as the starting point and the end point, we give the definitions of transaction trajectory and OD path as follows. Deﬁnition 3 (the transaction trajectory). The EData sequence of the OD path formed by the vehicle VID from the entrance toll station S I D to the exit toll station S I D is called the transaction Definition 3 (the transaction trajectory). The 𝐸𝐷𝑎𝑡𝑎 sequence of the OD path formed by the trajectory, ETraj: vehicle VID from the entrance toll station to the exit toll station is called the 𝑜 𝑑 transaction trajectory, 𝑗𝑟𝑎𝐸𝑇 : ETraj = EData , EData , . . . , EData , EData (2) h i o 1 n (2) 𝑗𝑟𝑎𝐸𝑇 = < 𝐸𝐷𝑎𝑡𝑎 , 𝐸𝐷𝑎𝑡𝑎 , … , 𝐸𝐷𝑎𝑡𝑎 , 𝐸𝐷𝑎𝑡𝑎 > 𝑜 1 𝑛 𝑑 where EData and EData are transaction data generated by S I D and S I D , respectively, and o o d d where 𝐸 𝐷𝑎𝑡𝑎 and 𝐸 𝐷𝑎𝑡𝑎 are transaction data generated by and , respectively, and EData to EData are transaction data generated by several gantry frames between OD paths. 𝑜 𝑑 𝑜 𝑑 1 n 𝐸 𝐷𝑎𝑡𝑎 to 𝐸 𝐷𝑎𝑡𝑎 are transaction data generated by several gantry frames between OD paths. 1 𝑛 Obviously, all vehicles need to enter and exit the expressway from the toll station, and generally Obvious,ly it, can all veh be consider icles neeed d to that ent ther er an e will d exbe it th missed e expressw transactions ay from atth the e to toll ll sta station. tion, and generally, it can be considered that there will be missed transactions at the toll station. Therefore, the EData is OD constrained; that is, the head and tail nodes of the transaction trajectory Therefore, E th Te ra𝐸 j 𝐷𝑎𝑡𝑎 are all is toll OD booth constrained frames, ; thand at is, ther the he e will ad an be d tai nol nod missed es of transactions the transaction or missed trajectory transactions. 𝑗𝑟𝑎𝐸𝑇 are all toll booth frames, and there will be no missed transactions or missed transactions. Deﬁnition 4 (OD path). The main topological path formed by all the entrance toll station data from the starting station S I D to the ending station S I D is called OD path OD L J: Definition 4 (OD path). The o main topological path formed by all the entrance toll station data from the starting station to the ending station is called OD path 𝑂𝐷𝐿𝐽 : 𝑜 𝑑 OD L J = hS I D , S I D , . . . , S I D , S I D i (3) o n 1 d In Figure 2, the red stations indicate the toll stations from beginning to end in an OD path, and the green stations indicate the passing entrance stations. At the same time, we call all the OD paths on the expressway OD path sets. 𝑆𝐼𝐷 𝑆𝐼𝐷 𝑆𝐼𝐷 𝑆𝐼𝐷 𝑆𝐼𝐷 𝑆𝐼𝐷 𝑆𝐼𝐷 𝑆𝐼𝐷 𝑆𝐼𝐷 𝑆𝐼𝐷 𝑆𝐼𝐷 Electronics 2022, 11, x FOR PEER REVIEW 4 of 22 𝑂𝐷𝐿𝐽 = < , , … , , > (3). 𝑜 1 𝑛 𝑑 In Figure 2, the red stations indicate the toll stations from beginning to end in an OD Electronics 2022, 11, 1981 4 of 21 path, and the green stations indicate the passing entrance stations. At the same time, we call all the OD paths on the expressway OD path sets. SID9 SID8 SID7 SID6 S SID ID3 5 SID4 SID3 SID2 S SID ID1 1 SID0 Figure 2. Topology diagram of an OD expressway route. Figure 2. Topology diagram of an OD expressway route. According to the above definition, the problem of detecting missed and false According to the above deﬁnition, the problem of detecting missed and false transac- transaction data in 𝐸𝐷𝑎𝑡𝑎 can be transformed into a problem of comparing transaction tion data in EData can be transformed into a problem of comparing transaction trajectory trajectory and OD path; that is, all ETC data are normalized into a transaction trajectory and OD path; that is, all ETC data are normalized into a transaction trajectory set, and set, and then abnormal trajectories can be detected by comparing that set with the OD then abnormal trajectories can be detected by comparing that set with the OD path set. path set. Therefore, how to improve the efficiency of abnormal trajectory detection Therefore, how to improve the efﬁciency of abnormal trajectory detection becomes the key to beco identifying mes the the key of to E D iden ata err tify or ing and thmissing e of 𝐸𝐷𝑎𝑡𝑎 rates error quickly an and d mis effectively sing rate . s quickly and effect One ively. of the research directions of anomaly detection is realizing anomaly detection by measuring One of th trajectory e researchsimilariti directions es in of time anom series aly detec data. tion Mainstr is realiz eam ingalgorithms anomaly detec include tion dynamic by measuri time ng warping trajectory (DTW), similar edit ities distance in time on seri real es d sequence ata. Mainst (EDR), ream Hausdorf algorithm f s distance include (Hausdorf dynamic time f), etc. warping [13]. On (DT the W), basis edit of dithe stance algorithm, on real se Mao quenc et e al. (ED [14R), ] pr Hausdo oposedrf af Markov distance decision process-based detection method for abnormal spatial trajectories of road networks. (Hausdorff), etc. [13]. On the basis of the algorithm, Mao et al. [14] proposed a Markov The method classiﬁes all the collected vehicle trajectory data according to the time sequence decision process-based detection method for abnormal spatial trajectories of road and then divides the trajectory data according to different vehicle trajectories to detect networks. The method classifies all the collected vehicle trajectory data according to the abnormal values. However, this method is not suitable for road network conditions, which time sequence and then divides the trajectory data according to different vehicle change dynamically and are not suited to modeling dynamic incremental anomaly detec- trajectories to detect abnormal values. However, this method is not suitable for road tion. George et al. [15] proposed an improved iterative DTW method according to the network conditions, which change dynamically and are not suited to modeling dynamic abnormal signal trajectory of layered ore body modeling to solve the problem of the large incremental anomaly detection. George et al. [15] proposed an improved iterative DTW differences between the two signal lengths that greatly improved the accuracy of abnormal method according to the abnormal signal trajectory of layered ore body modeling to solve trajectory matching. However, this method also has some shortcomings in application, the problem of the large differences between the two signal lengths that greatly improved such as how to solve the sparseness problem when it is applied to trafﬁc trajectory data. the accuracy of abnormal trajectory matching. However, this method also has some Ghersi et al. [16] proposed an automatic method of gait cycle extraction and analysis based shortcomings in application, such as how to solve the sparseness problem when it is on a DTW algorithm. This method combined the previous strategy with the dynamic applied to traffic trajectory data. Ghersi et al. [16] proposed an automatic method of gait time warping function, supplemented the segmentation method based on gait events, and cycle extraction and analysis based on a DTW algorithm. This method combined the tested it on undamaged and damaged people in order to overcome the need to calibrate previous strategy with the dynamic time warping function, supplemented the or rely on the predeﬁned threshold and reduce the error of step detection. However, this segmentation method based on gait events, and tested it on undamaged and damaged method does not consider the detection of turning stride or non-horizontal plane. Kumar people in order to overcome the need to calibrate or rely on the predefined threshold and et al. [17] proposed a DTW-based method, trajDTW, that used Dijkstra algorithm to calcu- reduce the error of step detection. However, this method does not consider the detection late the similarities in road network constraint trajectories. This method is suitable for a of turning stride or non-horizontal plane. Kumar et al. [17] proposed a DTW-based large number of overlapping trajectories in dense road networks, supports the analysis of method, trajDTW, that used Dijkstra algorithm to calculate the similarities in road passengers’ movement patterns and the planning of better bus routes, and also provides network constraint trajectories. This method is suitable for a large number of overlapping references for real-time route predictions for taxi passengers. When Hollerbach et al. [18] trajectories in dense road networks, supports the analysis of passengers’ movement studied the single-valued arrangement of ion trajectories, they used the DTW algorithm patterns and the planning of better bus routes, and also provides references for real-time to correct the changes in the sharing differences between lossless ion separations. In this route predictions for taxi passengers. When Hollerbach et al. [18] studied the single- method, single-valued alignment (SVA) and nonlinear dynamic time warping (DTW) data valued arrangement of ion trajectories, they used the DTW algorithm to correct the processing methods are used to correct the changes between IMS separations. However, the changes in the sharing differences between lossless ion separations. In this method, single- effect of this method for separation in a wide mobility range is much worse. Ruble et al. [19] proposed an improved DTW algorithm combined with hyper-dimensional Bayesian time mapping for mapping spatio-temporal trajectory signals with inserted or deleted elements. This method calculates the common underlying signal between two sequences, and it is superior to the DTW algorithm in sequence mapping and classiﬁcation. However, when the signal-to-noise ratio of unconstrained sequences is low, the performance of this algorithm is poor. Cabanas et al. [20] proposed a DTW-based backtracking method to solve the problem 𝑆𝐼𝐷 𝑆𝐼𝐷 𝑆𝐼𝐷 𝑆𝐼𝐷 Electronics 2022, 11, 1981 5 of 21 that the audio symbols in audio trajectories cannot be synchronized in music scores. This method adds a trajectory stage with a conﬁgurable delay. The small increment of delay will improve accuracy to a certain extent, but the algorithm has a delay and cannot be used in sensitive applications. As DTW is widely used in trajectory matching, a series of improved algorithms [21–25] was proposed by Emmon Keogh at the University of California for improving the retrieval efﬁciency of the DTW algorithm or the reliability of detection. Their team optimized the performance of the DTW algorithm by many methods. They improved the speed and accuracy of DTW optimization by indexing global and local data structures, parallel computing, time series clustering, and introducing parameters as a search mech- anism for distance measurement. In addition to the mainstream DTW algorithm, other algorithms are also applied to the ﬁeld of abnormal trajectory detection. Liu et al. [26] used the Hausdorff algorithm to establish the matching degree of the amplitude characteristics of the transient components of zero-sequence current trace at both ends of the feeder area of a distribution network. Through matching analysis, the identiﬁcation ability of the fault sections in different environments was improved. This method adopts the comprehensive evaluation analysis method of wave channel model and ideal value approximation, which provides a reference for airlines to solve the problem of the comprehensive evaluation of ﬂying quality. However, the application of the wave channel model is limited, and the accuracy of the evaluation results needs to be improved. Teng et al. [27] integrates a K-means thought and Hausdorff algorithm, constructs data segments using ﬂight trajectory data, calculates the scores of multiple ﬂight parameters in the data segment of level ﬂight according to Brin channel theory, and ﬁnally determines the weights of each parameter through an “entropy weight method” to obtain index data reﬂecting the level ﬂight qual- ity of different UAV controllers. However, the robustness of the algorithm needs to be improved due to abnormal interference. In view of the fact that the current qualitative eval- uation methods are time-consuming and laborious and that fair and objective evaluation is difﬁcult, Qian et al. [28] selected the evaluation index by analyzing the data demand of pilot handling quality evaluation. The improved G1 method is used to select the weights, and the evaluation index system of pilot handling quality is established, so as to obtain an intuitive evaluation system. However, the accuracy of the algorithm evaluation is greatly affected by the selected indicators, and when the indicators are not rigorous enough, the evaluation results are not convincing. Wang et al. [29] proposed an ordered generalized Hausdorff distance for measuring the geometric quality of point cloud data and proposed a model representation of time series according to the changing trend. The method can quickly calculate the distance of model and has no multi-resolution characteristics, which can effectively identify the abnormal degree of data. However, because the choice of resolution is very important for obtaining the similarities between sequences, the model distance of this method can only represent the difference between two sequences with the same length; it cannot reﬂect the differences between sequences with different lengths. Qiu et al. [30] proposed the OBF-EDR magnetic anomaly signal similarity measurement method based on the combination of orthogonal basis function (OBF) decomposition and EDR. In this method, the discrete basis function coefﬁcients are obtained by the orthogonal basis function decomposition of magnetic anomaly signals, and the signal-to-noise ratio of the discrete basis function coefﬁcients is improved according to the characteristic that background noise is independent of the basis function. The edit distance method is used to calculate the similarities in the discrete basis function coefﬁcients and thus indirectly realize the similarity measurement of magnetic anomaly signals. Cao et al. [31] proposed an aircraft trajectory anomaly detection method based on the combination of a multidimen- sional outlier descriptor (MOD) and bidirectional long-short memory network (Bi-LSTM). Firstly, the trajectory deviation detection is transformed into the trajectory density classiﬁ- cation problem, and a multidimensional anomaly descriptor is designed for the trajectory deviation detection model so as to effectively detect the trajectory deviation and trajectory outliers. Among them, the distance between time series is evaluated using the DTW simi- larity function instead of the traditional Euclidean distance, which solves the problem that Electronics 2022, 11, 1981 6 of 21 Euclidean distance is difﬁcult to accurately calculate. However, the corresponding time of the model is long, and it is difﬁcult to guarantee the real-time detection. Liu et al. [32] proposed a new asynchronous trajectory-matching method (PTSCTM) based on piece-wise spatio-temporal constraints, which is used for AIS trajectory reconstruction and ship abnor- mal behavior discovery. Based on the spatio-temporal constraints, this method ﬁnds the optimal trajectory matching point in the candidate matching point set with time and space distance and uses it to calculate the similarities in trajectory matching. In order to reduce the time to ﬁnd matching points after segmentation, the similarity between trajectory segments only depends on the matching degree of trajectory points in trajectory segments. When the trajectory length is short, the calculation results will deviate. However, EData has the characteristics of large scale and high density, being composed of discrete points obtained by sampling with discrete and continuous mathematical charac- teristics. Obviously, mining abnormal data directly from a large number of transactions is a heavy workload and is inefﬁcient. When processing unique EData to meet the demand of massive real-time ETC abnormal data detection, the sequential algorithm of spatio- temporal deformation trajectory detection needs to be further optimized and improved so as to quickly and accurately identify abnormal data and missed/false transactions. The mainstream algorithms, DTW, Hausdorff, EDR, etc., that are used in the ﬁeld of spatio-temporal deformation detection have a time complexity of O n , so it takes a great deal of time to process big trajectory data. In order to effectively improve the efﬁciency and accuracy of the abnormal trajectory detection of EData, a restricted search space algorithm, SegrDTW, based on the EData of expressways is proposed. It takes S I D and S I D as o d starting points, carries out the end-to-end bidirectional matching to extract the dynamic step size, cuts the matching range of DTW by constructing a segmented matching feature window, and makes an accurate spatial search for the mismatched trajectory sequence, thus greatly reducing unnecessary matrix calculation and time complexity. SegrDTW not only veriﬁes the similarities between the path sequence OD L J and the vehicle trajectory sequence ETraj but also identiﬁes abnormal situations in the different parts of the trajectory, such as missed transactions or false transactions. For the degree of abnormal deviation, we made the following deﬁnitions. Deﬁnition 5 (abnormal degree). When the VID of the vehicle passes through an OD L J path, the corresponding transaction data ETraj will be generated, and the proportion of the missed or false transaction data in the whole transaction data is called the abnormal degree. The abnormal degree of OD L J can be expressed as follows: e(OD L J) = e(QD)/n (4) i=1 where e(QD) is the abnormal degree of a single section inOD L J. e(QD) 2 [0, 1] means that the value of e(QD) is between 0 and 1 when there is a missed transaction or a false transaction in a section QD. Through the above deﬁnitions, it can be concluded that EData has the following main characteristics: (1) EData is spatio-temporal sequence data. (2) As vehicles must pass through toll stations, they are constrained by OD characteristics. (3) There are many trajec- tories on the OD path, but the main OD path has the largest number of trajectories. (4) Due to the missed and false transactions between the ETC vehicle-mounted equipment and the roadside equipment, OD path and ETC gantry topology only have semantic coupling. Therefore, faced with large-scale and dense EData, it is necessary to ﬁnd a way to adapt and process the data with these characteristics and at the same time reduce the workload of algorithm processing so as to obtain the corresponding results more directly and conveniently. Electronics 2022, 11, x FOR PEER REVIEW 7 of 22 Through the above definitions, it can be concluded that 𝐸𝐷𝑎𝑡𝑎 has the following main characteristics: (1) 𝐸𝐷𝑎𝑡𝑎 is spatio-temporal sequence data. (2) As vehicles must pass through toll stations, they are constrained by OD characteristics. (3) There are many trajectories on the OD path, but the main OD path has the largest number of trajectories. (4) Due to the missed and false transactions between the ETC vehicle-mounted equipment and the roadside equipment, OD path and ETC gantry topology only have semantic coupling. Therefore, faced with large-scale and dense 𝐸𝐷𝑎𝑡𝑎 , it is necessary to find a way to adapt and process the data with these characteristics and at the same time reduce the workload of algorithm processing so as to obtain the corresponding results more directly and conveniently. Electronics 2022, 11, 1981 7 of 21 3. An Algorithm for Detecting Abnormal Data of ETC The general framework of the abnormal ETC transaction data detection algorithm is 3. An Algorithm for Detecting Abnormal Data of ETC shown in Figure 3. According to the characteristics of 𝐸𝐷𝑎𝑡𝑎 , this paper proposes an The general framework of the abnormal ETC transaction data detection algorithm improved algorithm, SegrDTW, based on DTW. The algorithm matches the similarities of is shown in Figure 3. According to the characteristics of EData, this paper proposes an vehicle trajectories passing on 𝑂𝐷𝐿𝐽 path and quickly calculates the unmatched and improved algorithm, SegrDTW, based on DTW. The algorithm matches the similarities dissimilar trajectory segments between 𝑗𝑟𝑎𝐸𝑇 and 𝑂𝐷𝐿𝐽 through the method of a of vehicle trajectories passing on OD L J path and quickly calculates the unmatched and limited constraint feature window. According to the characteristics of 𝐸𝐷𝑎𝑡𝑎 , an dissimilar trajectory segments between ETraj and OD L J through the method of a lim- abnormal trajectory identification algorithm is designed to identify the missed ited constraint feature window. According to the characteristics of EData, an abnormal transactions/false transactions in 𝐸𝐷𝑎𝑡𝑎 . The final steps are to map and label these trajectory identiﬁcation algorithm is designed to identify the missed transactions/false abnormal 𝐸𝐷𝑎𝑡𝑎 and verify the missed/false transactions to confirm the accuracy of the transactions in EData. The ﬁnal steps are to map and label these abnormal EData and results. verify the missed/false transactions to conﬁrm the accuracy of the results. ETraj set SegrD TW OD set 1.while (TRUE) 5 7 2. if( X == L_R-1 && Y == L_T-1 ) 2 4 8 9 3. if R[X] = T[Y] 3 7 8 8. if ( X == L_R-1 ) 9. if R[X] = T[Y] Abnormal Abnormal t raject ory trajectory 1 Detection algorithm 1 2 21. CONTI NUE; 22. if ([X] = T[Y] ) 1 2 7 8 31. else 6 7 32. Y++; end while 4 5 Abnormal 33. return F trajectory 2 5' Figure 3. The framework diagram of the abnormal ETC transaction data detection algorithm. Figure 3. The framework diagram of the abnormal ETC transaction data detection algorithm. 3.1. Trajectory Similarity Matching Algorithm 3.1. Trajectory Similarity Matching Algorithm First, an improved SegrDTW algorithm is proposed to detect the anomalies in ETC First, an improved SegrDTW algorithm is proposed to detect the anomalies in ETC transaction data. By matching the similarities between OD L J trajectory and ETraj vehicle transaction data. By matching the similarities between 𝑂𝐷𝐿𝐽 trajectory and 𝑗𝑟𝑎𝐸𝑇 trajectory, we can identify the abnormal trajectories in ETraj vehicle trajectory. We deﬁne vehicle trajectory, we can identify the abnormal trajectories in 𝑗𝑟𝑎𝐸𝑇 vehicle trajectory. We the similarities in the trajectories as follows. define the similarities in the trajectories as follows. Deﬁnition 6 (Trajectory similarity). There are overlapping or abnormal trajectories between OD L J trajectory and ETraj vehicle trajectory, and DTW distance can be used to quantify the similarity between vehicle trajectory and frame trajectory: Sim(OD L J, ETraj) = DTW(OD L J, ETraj) = D(n, m) (5) where D is the regular matrix composed of path sequence OD L J and the transaction trajectory ETraj , and n and m are the sizes of the matrix, i.e., the lengths of the two sequences, respectively. Using the dynamic programming algorithm, we can ﬁnd a shortest path SP from the matrix, which is deﬁned as follows: Electronics 2022, 11, 1981 8 of 21 Dist OD L J Etraj + minfD(i 1, j), D(i, j 1), D(i 1, j 1)g, i f i 6= 0 and j 6= 0 i j D(i, j) = 0, i f i = j = 0 (6) ¥, i f i = 0 or j = 0 where Dist is the Euclidean distance between two data points; i,j are denoted as the i-th data in the OD L J and the j-th data in the ETraj trajectory sequence, respectively. The pathSP must satisfy 3 major constraints: Boundaries: SP must start at P = 1, 1 and end at P = n, m . ( ) ( ) 1 max row row col col Monotonicity: P P 0 and P P 0, ensuring chronological matching and k k k k1 avoiding cross-matching. row row col col Continuity: P P 1 and P P 1, ensuring adjacent point matching and avoiding k k k k1 cross-point matching. The DTW algorithm is a classic spatio-temporal sequence alignment template-matching algorithm. It requires building a distance matrix from the two trajectories and ﬁnding the path with the smallest sum of the element values from the upper left corner to the lower right corner. The DTW algorithm requires path planning for each iteration, which may lead to problems such as a high calculation complexity and a lot of space. In the expressway ETC system, the ETC transaction sequence cannot use the traditional one-to-one match method to measure the similarities due to the existence of missing transactions and false transactions in the ETC transaction data. The DTW algorithm can lengthen or shorten the sequence axis to process the matching operations in spatiotemporal sequences of different lengths. However, the traditional DTW algorithm adopts point-by-point matching tech- nology and uses dynamic programming technology to complete the optimal path search. For example, for a vehicle trajectory ETraj with a length of M and an OD path OD L J with a length of N, the initial subsequence size of DTW is N M + 1. The computational effort to do the work of matching known sequences would be O(N M). Therefore, the optimization strategy for DTW mainly focuses on two strategies: early discard (lower bound) and limited search space. Based on the DTW algorithm and the characteristics of ETC transaction data, this work compares ETraj with OD L J to ﬁnd the abnormal state of transactions in the ETraj and then designs a restricted search space strategy. Due to the existence of missed transactions or false transactions, which are occasional and uncertain, the trajectory lengths of ETraj and OD path sequence OD L J are not necessarily equal. When there are abnormal data in the vehicle trajectory sequence ETraj, there must be a situation where the unmatched trajectory is inconsistent with the expected corresponding OD L J. To solve this problem, we propose a concept of dynamic search matching step length R, which can reduce unnecessary matching loss by limiting the range of dynamic matching step length R. In the next step, the algorithm ﬁrst identiﬁes the matching points P in ETraj trajectory points without leakage and error within the range R and takes these points P as the matching feature points of vehicle trajectory ETraj and OD path sequence OD L J. Finally, the search window W is generated according to the matched feature points, and the recognition range of missed and false transactions is limited to W, which reduces the amount of calculation and improve the efﬁciency of the algorithm. Deﬁnition 7 (the dynamic search step size). There are cases of missed transactions and false transactions in the vehicle trajectory sequence ETraj, which leads to inconsistencies between the ETraj and the OD L J. In order to solve the problem of unequal length comparison, this paper puts forward the concept of dynamic search step size, that is, the comparison starting point of two trajectories plus the absolute value of the difference between two trajectories, which is called the dynamic search step size R of transaction gantry comparison. R is deﬁned as follows: R = Len OD L J Len ETraj + 1 (7) j ( ) ( )j Electronics 2022, 11, 1981 9 of 21 where Len is a function of obtaining the length of the trajectory sequence. By adding the absolute length deviation between the OD L J and the ETraj, and the boundary length, the range value of the dynamic search step R is obtained. The dynamic search step R is used to limit the maximum search length of the mismatch trajectory between the ETraj and the OD L J. When the ETraj is longer than the OD L J, there must be some false transaction events in the ETraj. Therefore, when matching the trajectory point OD L J traversing in OD L J with the trajectory point of ETraj, we can start from the i-th trajectory point OD L J and add the length of the dynamic search step R as the search range, so as to quickly determine the matching feature points in the range. Similarly, when the length of the ETraj is shorter than the length of the path OD L J, it can be inferred that there must be some missed transaction events in the vehicle trajectory ETraj. Additionally, when we search for the trajectory point OD L J that traverses OD L J with the trajectory point of ETraj, since there are missed transaction events (missed data) in ETraj, the current i-th trajectory point OD L J needs to be compared with some subsequent data in the vehicle trajectory ETraj, so we limit the matching range of the vehicle trajectory sequence ETraj by the length of the dynamic search step R, which can speed up the search of matching feature points that match OD L J . If the matching is successful within the range of R, this point is marked as the feature point of successful matching, and if the matching is unsuccessful within the range of R, this boundary is marked as a failure matching. After completing the retrieval of OD L J , the next step is to retrieve the next trajectory point OD L J . i i+1 The steps of matching point retrieval and recognition are described below. Since the trajectory ETraj formed by EData has a starting point and an ending point, and the spatio- temporal data of the trajectory satisfy the deﬁnition of the midpoint in Euclidean space, we deﬁne the matching feature point P in the trajectory sequence ETraj and OD L J, where the Euclidean distance between ETraj and OD L J is 0, that is, Pjdist(ETraj, SOD L J) = 0 . However, the sequence lengths of ETraj and OD L J are different. In addition to the feature matching points, there are some abnormal points in the transaction record sequence that can not match OD L J. By comparing these points in European distance space, it is impossible to conﬁrm the cause of the anomaly; that is, it is impossible to know which gantry the vehicle passed through to cause the anomaly. In order to solve this problem effectively, we adopted the two-way traversal search strategy of dynamic step R tolerance range. As shown in the following Figure 4, ETraj and OD L J use the Euclidean distance calculation functions dist(ETraj, OD L J ) and dist(ETraj, OD L J ), respectively, taking OD L J as the 1 2 1 starting point and OD L J as the ending point to calculate the matching feature points. When dist(ETraj, OD L J ) 6= 0 or dist(ETraj, OD L J ) 6= 0, that is, the matching of traversing 1 2 trajectory points is unsuccessful, the algorithm tries to continue to verify the matched Electronics 2022, 11, x FOR PEER REVIEW 10 of 22 feature points within the limited range, dynamic step R, and at the same time mark and identify the intermediate abnormal data. ODLJ1 ODLJ2 ETraj 0 20 40 60 80 100 Figure 4. The segmentation of the matching feature window. Figure 4. The segmentation of the matching feature window. After identifying the feature points (i.e., the trajectory points with no abnormalities), Algorithm 1 for realizing matching feature points 𝑃 and establishing search window these matched feature points are used as the boundaries of the anchor points, and the 𝑊 of abnormal trajectory range is as follows. Algorithm 1. Algorithm for establishing anomalous trajectory range retrieval window Input: 𝑗𝑟𝑎𝐸𝑇 (transaction trajectory), 𝑂𝐷𝐿𝐽 (gantry sequence) Output: 𝑊 (window of abnormal trajectory range.) Parameters: 1: I (Initialize gantry sequence pointer) 2: J (Initialize vehicle trajectory pointer) 3: R (dynamic search step size and the value is ABS(𝑗𝑟𝑎𝐸𝑇 .length 𝑂𝐷𝐿𝐽 .length) + 1 4: FA (feature value storage array) 5: EA (auxiliary mark storage array) 6: //Mark the characteristic points matching the rack from beginning to end. 7: FOR (I = 0, J = 0; I < 𝐽𝑂𝐷𝐿 .length && J < 𝑗𝑟𝑎𝐸𝑇 .length; I++, J++) 8: IF(gantry trajectory point 𝑂𝐷𝐿𝐽 [I] is matched with vehicle trajectory point 𝑗𝑟𝑎𝐸𝑇 [J]) 9: Save the location of 𝑂𝐷𝐿𝐽 [I] in FA and the location of 𝑗𝑟𝑎𝐸𝑇 [J] in EA 10: ELSE 11: WHILE (the index value of the vehicle trajectory point is still in the range of 𝑅 ) 12: If the gantry trajectory point 𝑂𝐷𝐿𝐽 [I] matches the current vehicle trajectory point, record the position of the gantry trajectory point and the vehicle trajectory point, otherwise, try to locate the next trajectory point to continue matching. 13: END WHILE 14: END FOR 15: //Matching feature points from back to front. 16: FOR (I = 𝑂𝐷𝐿𝐽 .length −1 J = 𝑗𝑟𝑎𝐸𝑇 .length −1; I > −1 && J > −1 I−−, J−−) 17: Proceed from step 7 to step 14 from back to front. 18: END FOR 19: //Building a window of abnormal features. 20: FOR (I = 1; I < FA.length; I++) 21: WHILE (J < EA.length) 22: Get the driving trajectory sub-sequence and a gantry frame sub- sequence. 23: W.ADD (AFC (trajectory sub-sequence, gantry frame sub-sequence)) 24: END WHILE 25: END FOR 26: RETURN W From step 7 to step 18 of the algorithm, we first traverse in the positive and negative directions and use the Euclidean distance ratio to determine the matching feature points. If the match fails, we will confirm whether there are matching points within the range of Electronics 2022, 11, 1981 10 of 21 window is constructed for the area where the distance between these anchor points is greater than 1 by algorithm. These windows are used to restrict the operation areas to identify abnormal data such as missed transactions and false transactions. Algorithm 1 for realizing matching feature points P and establishing search window W of abnormal trajectory range is as follows. Algorithm 1. Algorithm for establishing anomalous trajectory range retrieval window Input: ETraj (transaction trajectory), OD L J (gantry sequence) Output: W (window of abnormal trajectory range.) Parameters: 1: I (Initialize gantry sequence pointer) 2: J (Initialize vehicle trajectory pointer) 3: R (dynamic search step size and the value is ABS (ETraj . length OD L J. length) + 1 4: FA (feature value storage array) 5: EA (auxiliary mark storage array) 6: //Mark the characteristic points matching the rack from beginning to end. 7: FOR (I = 0, J = 0; I < OD L J. length && J < ETraj. length; I++, J++) 8: IF(gantry trajectory point OD L J[I] is matched with vehicle trajectory point ETraj[J]) 9: Save the location of OD L J[I] in FA and the location of ETraj[J] in EA 10: ELSE 11: WHILE(the index value of the vehicle trajectory point is still in the range of R) 12: If the gantry trajectory point OD L J[I] matches the current vehicle trajectory point, record the position of the gantry trajectory point and the vehicle trajectory point, otherwise, try to locate the next trajectory point to continue matching. 13: END WHILE 14: END FOR 15: //Matching feature points from back to front. 16: FOR (I = OD L J.length 1 J = ETraj.length 1; I > 1 && J > 1 I, J) 17: Proceed from step 7 to step 14 from back to front. 18: END FOR 19: //Building a window of abnormal features. 20: FOR (I = 1; I < FA.length; I++) 21: WHILE (J < EA.length) 22: Get the driving trajectory sub-sequence and a gantry frame sub-sequence. 23: W.ADD (AFC (trajectory sub-sequence, gantry frame sub-sequence)) 24: END WHILE 25: END FOR 26: RETURN W From step 7 to step 18 of the algorithm, we ﬁrst traverse in the positive and negative directions and use the Euclidean distance ratio to determine the matching feature points. If the match fails, we will conﬁrm whether there are matching points within the range of step size R. As shown in Figure 5, hd , . . . , d i and hq , . . . , q i are the trajectory points 0 n 0 m of ETraj and OD L J, respectively. The Euclidean distance algorithm is used to calculate the matching trajectory point with p = 0 in the range of R (the orange position point in Figure 5). If there is no matching point in the constraint range of R (the green or blue search area in Figure 5), mark d , q and search the trajectory point of d , q to ﬁnd the i j i+1 j+1 matching feature points. The algorithm performs forwards and backwards traversal to ensure that the intermediate data without corresponding relationship can be identiﬁed in the matching process and that all the segmented matching points with Euclidean distance equal to 0 in the trajectories of ETraj and OD L J are obtained. From step 20 to step 25, the algorithm constructs a window W according to the identiﬁed segmentation matching points as boundaries: row,col W = min D i 1, j , D i, j 1 , D i, j (8) f ( ) ( ) ( )g i,j=1 Electronics 2022, 11, x FOR PEER REVIEW 11 of 22 step size 𝑅 . As shown in Figure 5, < 𝑑 , . . . , 𝑑 > and < 𝑞 , . . . , 𝑞 > are the trajectory 0 𝑛 0 𝑚 points of 𝑗𝑟𝑎𝐸𝑇 and 𝑂𝐷𝐿𝐽 , respectively. The Euclidean distance algorithm is used to calculate the matching trajectory point with 𝑝 = 0 in the range of 𝑅 (the orange position point in Figure 5). If there is no matching point in the constraint range of 𝑅 (the green or blue search area in Figure 5), mark (𝑑 , 𝑞 ) and search the trajectory point of (𝑑 , 𝑞 ) 𝑖 𝑗 𝑖 +1 𝑗 +1 to find the matching feature points. The algorithm performs forwards and backwards traversal to ensure that the intermediate data without corresponding relationship can be identified in the matching process and that all the segmented matching points with Euclidean distance equal to 0 in the trajectories of 𝑗𝑟𝑎𝐸𝑇 and 𝑂𝐷𝐿𝐽 are obtained. From step 20 to step 25, the algorithm constructs a window 𝑊 according to the identified segmentation matching points as boundaries: 𝑟𝑜𝑤 ,𝑐𝑜𝑙 (8) 𝑊 = ∑𝑚𝑖𝑛 {𝐷 (𝑖 − 1, 𝑗 ), 𝐷 (𝑖 , 𝑗 − 1), 𝐷 (𝑖 , 𝑗 )} 𝑖 ,𝑗 =1 where 𝐷 is the grid area formed by the driving trajectory sub-sequence of 𝑂𝐷𝐿𝐽 and the sub-sequence of 𝑗𝑟𝑎𝐸𝑇 . By constructing the minimum window 𝑊 , 𝑊 = 𝑟 × 𝑟 , 𝑟 = < Electronics 2022, 11, 1981 𝑟𝑜𝑤 𝑐𝑜𝑙 𝑟𝑜𝑤 11 of 21 𝑟 , 𝑟 , 𝑟 , . . . , 𝑟 > , 𝑟 = < 𝑟 , 𝑟 , 𝑟 , . . . , 𝑟 > . Using the abnormal feature identification 1 2 3 𝑚 𝑐𝑜𝑙 1 2 3 𝑛 algorithm AFC (step 23), the abnormal data in the window are detected by path planning to identify the missed or false transaction data. Step 26 returns the window 𝑊 of the where D is the grid area formed by the driving trajectory sub-sequence of OD L J and the recognized data feature result. sub-sequence of ETraj. W = 3 × 4 d W = 2 × 3 R = 3 …… q q q q q 0 1 2 m Figure 5. Schematic diagram of ﬁnding and conﬁrming matching points. Figure 5. Schematic diagram of finding and confirming matching points. By constructing the minimum window W, W = r r , r = hr , r , r , . . . , r i, row row 2 3 m 3.2. Performance Analysis of SegrDTW Algorithm col 1 r = hr , r , r , . . . , r i. Using the abnormal feature identiﬁcation algorithm AFC (step 23), col 1 2 3 n The feature window is constructed as the region between (𝑑 , 𝑞 ) and (𝑑 , 𝑞 ) and 2 2 6 4 the abnormal data in the window are detected by path planning to identify the missed or the region between (𝑑 , 𝑞 ) and the feature matching points (𝑑 , 𝑞 ), as shown in Figure 7 5 10 7 false transaction data. Step 26 returns the window W of the recognized data feature result. 6. After the window is built, the similarity of dynamic path planning can be calculated by the DTW algorithm. The window built by 𝑂𝐷𝐿𝐽 and 𝑗𝑟𝑎𝐸𝑇 can be divided into two 3.2. Performance Analysis of SegrDTW Algorithm categories. One is the matching window 𝜉 , that is, Euclidean distance 𝑑𝑖𝑠𝑡 (𝑑 , 𝑞 ) = 0, with The feature window is constructed as the region between (d , q ) and (d , q ) and the 2 2 6 4 time complexity 𝑂 (1), such as 𝜉 = 𝑑𝑖𝑠𝑡 (𝑑 , 𝑞 ), 𝜉 = 𝑑𝑖𝑠𝑡 (𝑑 , 𝑞 ), and 𝜉 = 𝑑𝑖𝑠𝑡 (𝑑 , 𝑞 ), 1 1 1 2 2 2 3 6 4 region between (d , q ) and the feature matching points (d , q ), as shown in Figure 6. After 7 5 10 7 and the other is the sequence difference window η , which has the retrieval distance the window is built, the similarity of dynamic path planning can be calculated by the DTW 𝑑𝑖𝑠𝑡 (𝑑 , 𝑞 ) > 0 with two divided matching windows, such as 𝜂 = 𝑑𝑖𝑠𝑡 ([𝑑 , 𝑑 , 𝑑 ], 𝑞 ) 1 3 4 5 3 algorithm. The window built by OD L J and ETraj can be divided into two categories. One or 𝜂 = 𝑑𝑖𝑠𝑡 ([𝑑 , 𝑑 ], 𝑞 ). 2 8 9 6 is the matching window x, that is, Euclidean distance dist(d, q) = 0, with time complexity O(1), such as x = dist(d , q ), x = dist(d , q ), and x = dist(d , q ), and the other is the Electronics 2022, 11, x FOR PEER REVIEW 12 of 22 1 1 1 2 2 2 3 6 4 sequence difference window , which has the retrieval distance dist(d, q) > 0 with two divided matching windows, such as h = dist([d , d , d ], q ) or h = dist([d , d ], q ). 1 3 4 5 3 2 8 9 6 3 d d q 1 4 q 6 20 40 60 80 100 120 Figure 6. A projection comparison between two trajectory sequences in the window. Figure 6. A projection comparison between two trajectory sequences in the window. The similarity of the two trajectories is calculated as follows: The similarity of the two trajectories is calculated as follows: m,n 𝑚 ,𝑛 argminD = D (9) å i,j i=1,j=1 𝑎𝑟𝑔𝑚𝑖𝑛 𝐷 = ∑ 𝐷 (9) 𝑖 ,𝑗 𝑖 =1,𝑗 =1 where D = hx , . . . , x , h , . . . , h i. i,j 1 m 1 n where 𝐷 =< 𝜉 , . . . , 𝜉 , 𝜂 , . . . , 𝜂 >. 𝑖 ,𝑗 1 𝑚 1 𝑛 In order to prove that the retrieval efﬁciency of the proposed SegrDTW algorithm is In order to prove that the retrieval efficiency of the proposed SegrDTW algorithm is better than the DTW algorithm, the following related theorems are given and theoretically better than the DTW algorithm, the following related theorems are given and theoretically proved in this paper. proved in this paper. Theorem 1. Given a standard trajectory 𝑂𝐷𝐿𝐽 =< 𝑞 , 𝑞 , 𝑞 , . . . , 𝑞 > of sequence length m and 1 2 3 𝑚 an ETC trajectory 𝑗𝑟𝑎𝐸𝑇 = < 𝜉 , 𝜉 , … , 𝜉 , 𝜂 , 𝜂 , . . . , 𝜂 > of sequence length n, the 1 2 𝑖 𝑖 +1 𝑖 +2 𝑛 necessary time complexity is 𝑂 (𝑂𝐷𝐿𝐽 , 𝑗𝑟𝑎𝐸𝑇 ) < 𝑂 (𝑂𝐷𝐿𝐽 , 𝑗𝑟𝑎𝐸𝑇 ). 𝑆𝑒𝑔𝑟𝐷𝑇𝑊 𝑊𝐷𝑇 Proof of Theorem 1. There are normal or abnormal trajectory points in the 𝑗𝑟𝑎𝐸𝑇 trajectory sequence. Let 𝜉 be the normal trajectory points in the section and 𝑃 (𝜆 ) be the probability of 𝜉 appearing in the section, that is, 𝜉 ~𝑃 (𝜆 ). 𝜂 represents the abnormal trajectory points in the cross section, and 𝑃 (𝜆 ) is the probability of 𝜂 appears in the section, that is, 𝜂 ~𝑃 (𝜆 ). If 𝑃 (𝜆 ) and 𝑃 (𝜆 ) are independent of each other, then there is 2 1 2 𝜉 + 𝜂 ~𝑃 (𝜆 + 𝜆 ). 𝜉 , 𝜂 is a discrete random variable, and all possible 𝑘 values of 𝜉 + 𝜂 1 2 { } are 1, 2, …; event {𝜉 + 𝜂 = 𝑘 } is the sum of mutually exclusive events 𝜉 = 𝑖 , 𝜂 = 𝑘 − 𝑖 , (0 ≤ 𝑖 ≤ 𝑘 ). Because 𝜉 and 𝜂 are independent of each other: 𝑃 (𝜉 + 𝜂 = 𝑘 ) = ∑ 𝑃 (𝜉 = 𝑖 , 𝜂 = 𝑘 − 𝑖 ) 𝑖 =1 = ∑ 𝑃 (𝜉 = 𝑖 )𝑃 (𝜂 = 𝑘 − 𝑖 ) 𝑖 =1 𝑖 −𝜆 𝑘 −𝑖 −𝜆 1 2 (10) 𝜆 𝑒 𝜆 𝑒 1 2 = ∑ 𝑖 ! (𝑘 − 𝑖 )! 𝑖 =1 −(𝜆 +𝜆 ) 1 2 𝑖 𝑖 𝑘 −𝑖 = ∑ 𝐶 𝜆 𝜆 𝑘 1 2 𝑘 ! 𝑖 =1 (𝜆 + 𝜆 ) 1 2 −(𝜆 +𝜆 ) 1 2 = 𝑒 𝑘 ! It can be deduced that 𝜉 + 𝜂 ~𝑃 (𝜆 + 𝜆 ) and the event conform to the Poisson 1 2 distribution. Assuming 𝜉 ~𝑃 (𝜆 ), the similar paths are calculated by Euclidean distance, …… Electronics 2022, 11, 1981 12 of 21 Theorem 1. Given a standard trajectory OD L J= q , q , q , . . . , q of sequence length m and an h i 1 2 3 m ETC trajectory ETraj = hx , x , . . . , x , h , h , . . . , h i of sequence length n, the necessary 1 2 i i+1 i+2 time complexity is O (OD L J, ETraj) < O (OD L J, ETraj). SegrDTW DTW Proof of Theorem 1. There are normal or abnormal trajectory points in the ETraj trajectory sequence. Let x be the normal trajectory points in the section and P(l ) be the probability of x appearing in the section, that is, x P(l ). h represents the abnormal trajectory points in the cross section, and P(l ) is the probability of h appears in the section, that is, h P(l ). 2 2 If P(l ) and P(l ) are independent of each other, then there is x + h P(l + l ). x 1 2 1 2 , h is a discrete random variable, and all possible k values of x + h are 1, 2, . . . ; event fx + h = kg is the sum of mutually exclusive events fx = i, h = k ig, (0 i k). Be- cause x and h are independent of each other: P(x + h = k) = P(x = i, h = k i) i=1 = å P(x = i)P(h = k i) i=1 k l ki l 1 2 l e l e (10) 1 2 i! (ki)! i=1 (l +l ) 1 2 e i i ki = å C l l k! k 1 2 i=1 (l +l ) 1 2 (l +l ) 1 2 = e k! It can be deduced that x + h P(l + l ) and the event conform to the Poisson 1 2 distribution. Assuming x P(l ), the similar paths are calculated by Euclidean dis- tance, the distance is 0, and the time complexity is O(1). Let h P(l ) calculate sim- ilar paths through the DTW algorithm. When r < m and r < n, there must be row col O(r , r ) < O(m, n). Because x P(l ) and h P(l ) are independent events, the row col 1 2 trajectory generation is not affected, so the similarity calculation of x can be neglected, and it takes a great deal of time to search and calculate only by considering the sequence of h in the trajectory. Therefore, it can be concluded that the SegrDTW algorithm is superior to the DTW algorithm in execution efﬁciency. Algorithm complexity analysis: For the ETraj sequence of length M and the OD L J sequence of length N, the traditional DTW algorithm uses point-by-point matching tech- nology and dynamic programming technology to complete the optimal path search to ﬁnd the matching path, and its time complexity is O(N M). In order to improve the search efﬁciency, in the SegrDTW calculation process, ETraj and OD L J are ﬁrst calculated to match the trajectory points in the forward and reverse directions. These two steps are based on the path length N of OD L J, and the total computational complexity is 2 O(N). Then, according to these matching trajectory points, there are n windows W(r r) with missed transactions or false transactions in the section, and the number of normal section windows W 1 1 is N n . At last, only n windows W r r need to be calculated to ( ) ( ) ( ) match the path. Therefore, the total time complexity of the SegrDTW algorithm can be calculated as follows: T = 2N + n O(W ) + (N n) + n O r rr 2 2 = 2N + (N n) + n O r + n O r (11) = 3N + n 2O r 1 It can be seen from Equation (11) that the total time complexity of the SegrDTW algorithm depends on the path length N of OD L J, the number n of recognized windows and the dynamic step length r of windows. Through the statistical analysis of the trajectory data for Fujian Province in the early stage, the statistical data of the incidence of abnormal trajectory windows and the dynamic step length r of built windows are shown in Table 1. Electronics 2022, 11, 1981 13 of 21 Table 1. The dynamic search matching range length and abnormal data rate of a three-day ETC transaction trajectory in Fujian Province. Lower Limit of Dynamic Upper Limit of Dynamic Trajectory Data Date Number of N Search Matching Range R Search Matching Range R Anomaly Rate Statistics 3 September 2020 41,860 3 5 4.73% 4 September 2020 40,651 3 4 4.65% 5 September 2020 41,769 3 5 5.30% According to the statistics of the existing ETC trajectory data, the value range of r is in [3,5]. Compared with the path length N of OD L J reaching more than 40,000, the value of r is much smaller than N, which means its inﬂuence is almost negligible; that is, the time complexity of O r is about equal to O(1). Because the ratio of abnormal window n in the total trajectory matching is 4–6% of N, and n is much smaller than N, that is, n << N. Therefore, the above conditions can be substituted into Formula 11 3N + n O(1) N, and the time complexity of the SegrDTW algorithm can be deduced as O(N). 3.3. Algorithm for Identifying Abnormal Transaction Data After matching the abnormal trajectories, we can obtain the trajectory sub-sequence data with missed transactions or false transactions in the trajectory sequence ETraj of each vehicle in the trajectory set. Next, we need to identify causes that may lead to these missed and false transactions. EData has spatio-temporal attributes (transaction time and latitude and longitude position of the gantry), and the space distance of the gantry deployed in the adjacent lanes of the expressway is much smaller than that of the same lane. Based on these spatio-temporal features, the abnormal trajectory sequence data can be identiﬁed, i.e., the abnormal transaction data (missed transaction or false transaction) identiﬁcation algorithm will compare and verify the spatio-temporal semantics of the constructed abnormal window W and the trajectory point data in the trajectory sequence OD L J. The identiﬁcation formula for verifying abnormal characteristic value F is as follows: 1, OD L J = ETraj w[i] w[j] F = (12) 0, otherwise Through the positive and negative directions of two-way veriﬁcation, the characteristic F of missed transactions or false transactions is obtained, and the data on these missed transactions or false transactions are tagged and returned to the result set. The abnormal transaction data feature recognition algorithm is as follows Algorithm 2. According to the matching distance set of the vehicle trajectory sequence ETraj and the gantry sequence OD L J, the algorithm identiﬁes the abnormal features of the vehicle trajectory. According to the principle of continuity and monotonicity in the DTW algorithm, from step 5 to step 9, through the comparison of distances in three directions and according to the minimum distance, the running trajectory of the vehicle is calculated, and the spatio- temporal information of the S I D passing through the gantry is compared with the trajectory point of ETraj. Through the comparison, the ﬂow status of the transaction data in this S I D is obtained, and 0/1 is marked, where 1 is no exception and 0 is a missed transaction. For example, 01/10/11 indicates that false transactions exist in this S I D, and other statuses indicate the detection of other errors. From step 12 to step 18 of the algorithm, the window boundary is processed. When calculating the route, if the gantry sequence set reaches the last value while the vehicle trajectory set does not reach the last value, only the rest of the vehicle trajectory set needs to be circulated; otherwise, only the rest of the gantry sequence set needs to be circulated. At last, the algorithm returns the feature sequence. Electronics 2022, 11, 1981 14 of 21 Algorithm 2. Feature recognition algorithm for abnormal transaction data Input: OD L J (gantry sequence set) ETraj (vehicle trajectory set) Output: F (Identiﬁcation label results) 1: Initialize the gantry sequence pointer I 2: Initialize the vehicle trajectory pointer J 3: WHILE(TRUE) 4: //Matching the front window trajectory 5: IF (the current trajectory point match the current value of the entry sequence) 6: the position of the current trajectory point is recognized as the normal trajectory point and stored in the result set F. 7: ELSE 8: Match and verify the spatio-temporal attribute of the trajectory point pointed by J with the value on the right side of the gantry sequence pointed by I and the value near the top of the window, conﬁrm whether the trajectory is a missed transaction or a false transaction through the spatial position, and output the result of identifying the result set F. 9: END IF 10: The gantry pointer I and the vehicle trajectory pointer J point to the next set of trajectory data. 11: //Returns the output results of the window boundaries in three cases 12: IF (the gantry pointer I and the vehicle trajectory pointer J both reach the end point) 13: Comparing the last object of the vehicle trajectory ETraj with the last object of the gantry sequence OD L J, and outputting the result to F. 14: ELSE IF (the sequence pointer I reaches the end point) 15: Compare the last object of ETraj with the remaining object of OD L J, and output the result to the recognition result set. 16: ELSE IF (the vehicle trajectory pointer J has reached the end point) 17: Comparing the last object of the gantry sequence OD L J with the remaining vehicle trajectory ETraj, and outputting the result to F. 18: END IF 19: END WHILE 20: RETURN F 4. Results The conditions of this experiment are a Windows 10 system in a Java language de- velopment environment with an Intel Core 2.6 GHz processor and 16 GB memory on a Lenovo computer. 4.1. Data Sources and Experimental Settings The data set contains about 60,344,500 pieces of real transaction data collected by Fujian Expressway Information Technology Company through a ETC gantry system from 3–12 September 2020, which is converted into 9,057,200 transaction trajectories, and the gantry topology information of Fujian Province. Among these data, the original transaction data table contains 103 ﬁelds, recording the information of each part of the vehicle and gantry, including license plate number, gantry id, transaction data, gantry time, latitude and longitude, etc. (see Table 2). Table 2. ETC gantry system transaction data attributes. Attribute Name Examples Attribute Name Examples Trade ID 34098 OBU Plate Blue Fujian A145 5 September 2020 Trade time Vehicle Class 1 21:29:26 5 September 2020 Flag ID 2514 Enter Time 20:23:51 Flag Type 0 Enter Station 257 Flag Index 1 OBU ID 13BD6 LAT 118.56 LNG 24.85 Electronics 2022, 11, x FOR PEER REVIEW 16 of 22 Electr onics 2022, 11, 1981 15 of 21 4.2. Experiment and Discussion on Verification of SegrDTW Model 4.2. Experiment and Discussion on Veriﬁcation of SegrDTW Model 4.2.1. Introduction of the Experimental Data 4.2.1. Introduction of the Experimental Data Before verifying our algorithm experiment, the data anomalies of 9,057,200 trading Before verifying our algorithm experiment, the data anomalies of 9,057,200 trading trajectories from 3–12 September 2020 are analyzed by DTW and other algorithms. The trajectories from 3–12 September 2020 are analyzed by DTW and other algorithms. The data used in the experiment account for 27% of the total data, as shown in Figure 7a. In data used in the experiment account for 27% of the total data, as shown in Figure 7a. In addition, it is confirmed that there is a correlation between the vehicle journey and the addition, it is conﬁrmed that there is a correlation between the vehicle journey and the number of gates passing through in the ETC transaction data used. As shown in Figure number of gates passing through in the ETC transaction data used. As shown in Figure 7b, 7b, when a vehicle passes through more ETC gantries, the error rate of the vehicle’s when a vehicle passes through more ETC gantries, the error rate of the vehicle’s trajectory trajectory will also increase. will also increase. 1.0 Normal data 0.9 Abnormal data 0.8 0.73 0.72 0.7 0.6 0.5 0.4 0.28 0.3 0.27 0.2 0.1 0.0 Trajectory data ETC transaction data (a) (b) Figure 7. Analysis of the experimental data: (a) overview of abnormal transaction data; (b) the Figure 7. Analysis of the experimental data: (a) overview of abnormal transaction data; (b) the correlations of the numbers of ETC gantries in the vehicle trajectories. correlations of the numbers of ETC gantries in the vehicle trajectories. 4.2.2. Comparison of the Algorithms’ Performance 4.2.2. Comparison of the Algorithms’ Performance In order to verify the efficiency of SegrDTW algorithm in detecting and identifying In order to verify the efﬁciency of SegrDTW algorithm in detecting and identifying missed/false transaction data, this experiment selected four sections in Fujian Province: missed/false transaction data, this experiment selected four sections in Fujian Province: Y Yiban iban-Haicang -Haicang hub, hub, Gaochang-Dapu Gaochang-D hub, apu Xianyoubangtou-Fuzhounan, hub, Xianyoubangtou-Fu and zhoun Nanpingbei- an, and Minhouganzhe Nanpingbei-Minhou as test gasections. nzhe as test The sect data ions. covers The dat Fuzhou, a covers Xiamen, Fuzhou, Quanzhou, Xiamen, Qua Sanming, nzhou, Putian Sanming, Put and Nanping. ian and Nanping. This This experiment experiment veriﬁes verifies the the performance performance of of the the algorithm algorithm pr pro oposed posed in in this this paper paper from three aspects: the retrieval efﬁciency of trajectory data with speciﬁed OD path length, from three aspects: the retrieval efficiency of trajectory data with specified OD path length, the retrieval efﬁciency of long-distance trajectory data and the retrieval efﬁciency of large the retrieval efficiency of long-distance trajectory data and the retrieval efficiency of large amounts of data. amounts of data. The ﬁrst part is the veriﬁcation of the retrieval efﬁciency of speciﬁed OD path trajectory. The first part is the verification of the retrieval efficiency of specified OD path The time efﬁciency of SegrDTW is compared with that of the DTW, Hausdorff and EDR trajectory. The time efficiency of SegrDTW is compared with that of the DTW, Hausdorff algorithms, and the results are shown in Table 3. Compared with the other algorithms, and EDR algorithms, and the results are shown in Table 3. Compared with the other SegrDTW has obvious advantages in efﬁciency. Experiments can prove the effectiveness of algorithms, SegrDTW has obvious advantages in efficiency. Experiments can prove the the SegrDTW algorithm model in the application of spatial-temporal data similarity, which effectiveness of the SegrDTW algorithm model in the application of spatial-temporal data ensures the robustness of retrieval and effectively improves the running efﬁciency. For similarity, which ensures the robustness of retrieval and effectively improves the running example, on the Yiban-Haicang hub, the retrieval time for SegrDTW is 6.9, 7.3 and 10 times efficiency. For example, on the Yiban-Haicang hub, the retrieval time for SegrDTW is 6.9, shorter than the times for DTW, Hausdorff and EDR, respectively. 7.3 and 10 times shorter than the times for DTW, Hausdorff and EDR, respectively. Table 3. The algorithm retrieval times. Table 3. The algorithm retrieval times. Sections SegrDTW DTW Hausdorff EDR Sections SegrDTW DTW Hausdorff EDR Yiban-Haicang hub 0.18 s 1.25 s 1.31 s 1.80 s Yiban-Haicang hub 0.18 s 1.25 s 1.31 s 1.80 s Mawei-Ningdexi 0.24 s 1.15 s 1.38 s 1.78 s Mawei-Ningdexi 0.24 s 1.15 s 1.38 s 1.78 s Xianyou Bangtou-FuZhounan 0.15 s 0.77 s 0.98 s 1.28 s Xianyou Bangtou-FuZhounan 0.15 s 0.77 s 0.98 s 1.28 s Nanpingbei-Minhou ganzhe 0.21 s 1.14 s 1.21 s 2.25 s Nanpingbei-Minhou ganzhe 0.21 s 1.14 s 1.21 s 2.25 s Rate Electronics 2022, 11, x FOR PEER REVIEW 17 of 22 Electronics 2022, 11, 1981 16 of 21 From a set of one-day transaction data for Fujian Province, a total of 16,528 pairs of ODs can be obtained. According to the average retrieval time of OD pairs in the above From a set of one-day transaction data for Fujian Province, a total of 16,528 pairs of four sections, the abnormal data detection time of all OD pairs in the whole province can ODs can be obtained. According to the average retrieval time of OD pairs in the above four be estimated. The EDR algorithm takes 29,378.52 s (about 8.16 h), the Hausdorff algorithm sections, the abnormal data detection time of all OD pairs in the whole province can be takes 20,164.16 s (about 5.6 h), the DTW algorithm takes 17,808.92 s (about 4.95 h), and estimated. The EDR algorithm takes 29,378.52 s (about 8.16 h), the Hausdorff algorithm SegrDTW takes 3222.96 s (about 0.86 h). The SegrDTW algorithm proposed in this paper takes 20,164.16 s (about 5.6 h), the DTW algorithm takes 17,808.92 s (about 4.95 h), and saves much time in detecting the daily transaction data for the whole province. If the ETC SegrDTW takes 3222.96 s (about 0.86 h). The SegrDTW algorithm proposed in this paper gantry transaction data anomaly detection is applied in the whole country, the retrieval saves much time in detecting the daily transaction data for the whole province. If the ETC efficiency of the method proposed in this paper will have greater advantages. gantry transaction data anomaly detection is applied in the whole country, the retrieval We also verify the performance of the algorithm according to the long-path efﬁciency of the method proposed in this paper will have greater advantages. transaction data (long data retrieval performance). In the experiment, the OD sequence of We also verify the performance of the algorithm according to the long-path transaction gantry in the Fujian section of Shen-Hai Expressway was selected, with the lengths of 15, data (long data retrieval performance). In the experiment, the OD sequence of gantry in 25, 35, 45 and 55 gantries, to match and detect the vehicle trajectories. The results are the Fujian section of Shen-Hai Expressway was selected, with the lengths of 15, 25, 35, shown in Figure 8. The horizontal axis represents the length of the sequence, while the 45 and 55 gantries, to match and detect the vehicle trajectories. The results are shown in vertical axis represents the algorithm running time. As can be seen from the figure, Figure 8. The horizontal axis represents the length of the sequence, while the vertical axis compared with other algorithms, the running time of the SegrDTW algorithm has a linear represents the algorithm running time. As can be seen from the ﬁgure, compared with relationship with the length of the sequence, and the range of change is very small, which other algorithms, the running time of the SegrDTW algorithm has a linear relationship with is close to a linear relationship and has little impact. This is mainly because our algorithm the length of the sequence, and the range of change is very small, which is close to a linear adopts the dynamic search step size search, which can reduce the amount of calculation relationship and has little impact. This is mainly because our algorithm adopts the dynamic to the greatest extent. search step size search, which can reduce the amount of calculation to the greatest extent. SegrDTW DTW Hausdorff EDR 0.5 15 25 35 45 55 Number of gantry Figure Figure 8. 8. Number Number of of gantries gantries tested tested for for rru un n t time. ime. In view of the abnormal feature detection requirements of the ETC transaction data, In view of the abnormal feature detection requirements of the ETC transaction data, we also set up a large number of data to verify the processing pressure of the algorithm. we also set up a large number of data to verify the processing pressure of the algorithm. Speciﬁcally, the data for Quanzhou to Xiamen from September 3 to September 12 were se- Specifically, the data for Quanzhou to Xiamen from September 3 to September 12 were lected for the experiment. The results are shown in Figure 9. The horizontal axis represents selected for the experiment. The results are shown in Figure 9. The horizontal axis the data volume of the 10-day trajectory, and the vertical axis represents the running time. represents the data volume of the 10-day trajectory, and the vertical axis represents the It can be seen from the ﬁgure that the running times for the EDR, DTW and Hausdorff running time. It can be seen from the figure that the running times for the EDR, DTW and algorithms increase exponentially with increasing amounts of trajectory data, which indi- Hausdorff algorithms increase exponentially with increasing amounts of trajectory data, cates that the amount of data has a great inﬂuence on the running time. There is a linear which indicates that the amount of data has a great influence on the running time. There relationship between the running time of the SegrDTW algorithm and the amount of data. is a linear relationship between the running time of the SegrDTW algorithm and the As the increase of data volume, the range of change has increased slightly and tends to be amount of data. As the increase of data volume, the range of change has increased slightly stable. This algorithm has strong performance and stable operation. This is because the and tends to be stable. This algorithm has strong performance and stable operation. This SegrDTW algorithm greatly increases the proportion of discarding and reducing unnec- is because the SegrDTW algorithm greatly increases the proportion of discarding and essary calculations, thus showing the greater compactness of the lower bound. When the reducing unnecessary calculations, thus showing the greater compactness of the lower spatio-temporal complexity is low, the dissimilar sub-sequences are discarded as early as bound. When the spatio-temporal complexity is low, the dissimilar sub-sequences are Running time/s Electronics 2022, 11, x FOR PEER REVIEW 18 of 22 Electronics 2022, 11, x FOR PEER REVIEW 18 of 22 Electronics 2022, 11, 1981 17 of 21 discarded as early as possible, which reduces the subsequent DTW calculation, so that the discarded as early as possible, which reduces the subsequent DTW calculation, so that the possible, retrieval which time will not ob reduces the viously subsequent increase with DTW calculation, increases i so n th that e num the r ber. etrieval time will not retrieval time will not obviously increase with increases in the number. obviously increase with increases in the number. SegrDTW SegrDTW DTW DTW Hausdorff Hausdorff EDR 30 EDR 3k 6k 9k 12k 15k 18k 21k 24k 27k 30k 3k 6k 9k 12k 15k 18k 21k 24k 27k 30k Number of data Number of data Figure 9. Running efficiency tests on different data volumes. Figure Figure 9. 9. Running Running ef effi ﬁciency ciency t tests ests on on different d different data ata volumes. volumes. 4.2.3. Identify the Abnormal Transaction Trajectory 4.2.3. Identify the Abnormal Transaction Trajectory 4.2.3. Identify the Abnormal Transaction Trajectory In this experiment, according to the transaction data of 𝑗𝑟𝑎𝐸𝑇 , we also tested the In this experiment, according to the transaction data of ETraj, we also tested the In this experiment, according to the transaction data of 𝑗𝑟𝑎𝐸𝑇 , we also tested the accuracy of the identification of missed transactions and false transactions. In order to accuracy of the identiﬁcation of missed transactions and false transactions. In order to accuracy of the identification of missed transactions and false transactions. In order to verify the recognition accuracy and the effectiveness of the algorithm for abnormal data, verify the recognition accuracy and the effectiveness of the algorithm for abnormal data, verify the recognition accuracy and the effectiveness of the algorithm for abnormal data, four representative road sections were selected in the experiment: sugarcane section from four representative road sections were selected in the experiment: sugarcane section from four representative road sections were selected in the experiment: sugarcane section from Nanpingbei-Minhou Ganzhe, Yiban to Haicang hub section, Mawei-Ningdexi section and Nanpingbei-Minhou Ganzhe, Yiban to Haicang hub section, Mawei-Ningdexi section Nanpingbei-Minhou Ganzhe, Yiban to Haicang hub section, Mawei-Ningdexi section and Xianyou Bangtou-Fuzhounan section. The abnormal detection projections of missed and Xianyou Bangtou-Fuzhounan section. The abnormal detection projections of missed Xianyou Bangtou-Fuzhounan section. The abnormal detection projections of missed transactions transactionsand and false false transactions transactions caused caused by by vehicles vehicles passing passing thr throu ough gh these these rro oad ad sections sections transactions and false transactions caused by vehicles passing through these road sections ar are e shown shown in in Figur Figure 1 e 100. . are shown in Figure 10. Normal transaction gantry Normal transaction gantry Normal transaction gantry Normal transaction gantry False transaction gantry False transaction gantry False transaction gantry False transaction gantry Normal trajectory Normal trajectory Normal trajectory Normal trajectory False transaction trajectory False transaction trajectory False transaction trajectory False transaction trajectory Missed transaction trajectory Missed transaction trajectory Missed transaction trajectory Missed transaction trajectory (a) (b) (a) (b) Figure 10. Cont. Running time/s Running time/s Electronics 2022, 11, 1981 18 of 21 Electronics 2022, 11, x FOR PEER REVIEW 19 of 22 Electronics 2022, 11, x FOR PEER REVIEW 19 of 22 Normal transaction gantry Normal transaction gantry False transaction gantry False transaction gantry Normal transaction gantry Normal transaction gantry Normal trajectory Normal trajectory False transaction gantry False transaction gantry False transaction trajectory False transaction trajectory Normal trajectory Normal trajectory Missed transaction trajectory Missed transaction trajectory False transaction trajectory False transaction trajectory Missed transaction trajectory Missed transaction trajectory N N (c) (d) (c) (d) Figure 10. The abnormal transaction trajectory detection findings: (a) Nanpingbei-Minhouganzhe; Figure 10. The abnormal transaction trajectory detection ﬁndings: (a) Nanpingbei-Minhouganzhe; Figure 10. The abnormal transaction trajectory detection findings: (a) Nanpingbei-Minhouganzhe; (b) Yiban-Haicang. (c) Mawei-Ningdexi; (d) Xianyoubangtou-Fuzhounan. (b) Yiban-Haicang. (c) Mawei-Ningdexi; (d) Xianyoubangtou-Fuzhounan. (b) Yiban-Haicang. (c) Mawei-Ningdexi; (d) Xianyoubangtou-Fuzhounan. The figures show the driving trajectories of all vehicles passing through the selected The ﬁgures show the driving trajectories of all vehicles passing through the selected The figures show the driving trajectories of all vehicles passing through the selected four OD paths over three days. The red dot is the ETC gantry on the normal road, while four OD paths over three days. The red dot is the ETC gantry on the normal road, while four OD paths over three days. The red dot is the ETC gantry on the normal road, while the blue dot is the gantry on the adjacent road. The red trajectory line is the trajectory of the blue dot is the gantry on the adjacent road. The red trajectory line is the trajectory the blue dot is the gantry on the adjacent road. The red trajectory line is the trajectory of the vehicle passing through the correct gantry (overlapping with the 𝑗𝑂𝐷𝐿 ), while the of the vehicle passing through the correct gantry (overlapping with the OD Lj), while the the vehicle passing through the correct gantry (overlapping with the 𝑗𝑂𝐷𝐿 ), while the blue and green trajectories are the trajectories of the vehicles with false transactions or blue and green trajectories are the trajectories of the vehicles with false transactions or blue and green trajectories are the trajectories of the vehicles with false transactions or missed transactions. The abnormal transaction data account for the total number of trip missed missed transactions. transactions. The The abnormal abnormal tra transaction nsaction dat data a account account for for the the tototal tal numb number er of of trip trip transactions as shown in Figure 11. transactions transactions as as show shownn in in Figur Figure e 11 11. . Nanpingbei-Minhouganzhe Nanpingbei-Minhouganzhe Yiban-Haicang hub Yiban-Haicang hub False False False False transaction, transaction, transaction, transaction, 2.37% 2.17% Normal 2.17% Normal 2.37% Normal Normal Other, 2.8% transaction, Other, 3.58% transaction, Other, 2.8% transaction, transaction, Other, 3.58% 96.42% 97.20% 96.42% 97.20% Missed Missed Missed transaction, Missed transaction, transaction, 0.43% transaction, 0.43% 1.41% 1.41% Missed transaction False transaction Normal transaction Missed transaction False transaction Normal transaction Missed transaction False transaction Normal transaction Missed transaction False transaction Normal transaction (a) (b) (a) (b) Mawei-Ningdexi Xianyou Bangtou-Fuzhounan Xianyou Bangtou-Fuzhounan Mawei-Ningdexi False transaction, False False 1.10% transaction, transaction, False 1.10% Normal 3.35% Normal transaction, Other, 12.47% transaction, transaction, Other, 1.48% Normal 3.35% Normal 87.53% 98.52 Other, 12.47% transaction, transaction, Other, 1.48% Missed Missed 87.53% 98.52 transaction, transaction, Missed Missed 0.38% 9.12% transaction, transaction, 9.12% 0.38% Missed transaction False transaction Normal transaction Missed transaction False transaction Normal transaction Missed transaction False transaction Normal transaction Missed transaction False transaction Normal transaction (c) (d) (c) (d) Figure 11. Abnormal transaction ratios of the driving trajectories. (a) The abnormal proportions of Figure 11. Abnormal transaction ratios of the driving trajectories. (a) The abnormal proportions of missed transactions and false transactions in Nanpingbei-Minhou Ganzhe are 0.43% and 2.37%; Figure 11. Abnormal transaction ratios of the driving trajectories. (a) The abnormal proportions of missed transactions and false transactions in Nanpingbei-Minhou Ganzhe are 0.43% and 2.37%; (b) The (b) The abnormal proportions of missed transactions and false transactions in Yi-ban-Haicang hub missed transactions and false transactions in Nanpingbei-Minhou Ganzhe are 0.43% and 2.37%; (b) The abnormal proportions of missed transactions and false transactions in Yi-ban-Haicang hub Electronics 2022, 11, 1981 19 of 21 abnormal proportions of missed transactions and false transactions in Yi-ban-Haicang hub are 1.41% and 2.17%; (c) The abnormal proportions of missed transactions and false transactions in Mawei- Ningdexi are 9.12% and 3.35%; (d) The abnormal proportions of missed transactions and false transactions in Xianyou Bangtou-Fuzhounan are 0.38% and 1.10%. EData is detected and analyzed using abnormal transaction data detected using the SegrDTW algorithm, and the abnormal degree e(OD L J) of vehicle ETraj transaction data is between 4.65% and 5.30%. The identiﬁed abnormal data can be further cleaned or repaired to provide reliable data for the follow-up intelligent expressway driving decision information service. 4.2.4. Discussion of the ETC Abnormal Transaction Data Detection Algorithm In order to further verify the effectiveness of the SegrDTW algorithm, 3168 normal trajectory data and 3168 abnormal trajectory data are selected in the experiment. Compared with the accuracy of the DTW algorithm, the experimental results are shown in Table 4. It can be seen from Table 4 that the normal detection accuracy is 100%, and the abnormal trajectory detection accuracy is 99.46%. The detection accuracy of the DTW algorithm is the same as that of SegrDTW, which indicates that SegrDTW can achieve fast detection and ensure high retrieval accuracy. Table 4. Accuracy statistics. DTW SegrDTW Class Number TP FP TN FN Accuracy TP FP TN FN Accuracy Normal 3168 0 0 3168 0 100% 0 0 3168 0 100% Abnormal 3168 3156 12 0 0 99.46% 3156 12 0 0 99.46% We use the original feature data and the abnormal feature data set generated to evaluate the accuracy of SegrDTW. The statistical results for the false or missed transaction data and the correct transaction data are shown in Table 5, with a total of 6336 items. Table 5 shows that the detection accuracy rates of false transactions and missed transactions for the DTW and SegrDTW algorithms are both above 99%, which means both algorithms can effectively detect abnormal transaction data. At the same time, the recall rates of false transactions and missed transactions are both lower than 100%, indicating that there will be a small number of samples detected as missed transactions in the false transaction detection test, and a small number of samples in the missed transactions will also be detected as false transactions. The DTW algorithm uses a global search method with precision of 100%, while SegrDTW uses a dynamic radius search method, and the search at the boundary will produce wrong detection, resulting in a lower accuracy rate than DTW. The F1-score results show that the difference between the two algorithms is less than 0.02, which proves that the robustness of the SegrDTW model is slightly lower than that of the DTW algorithm. Table 5. The detection results for abnormal data types. DTW SegrDTW Abnormal Class Accuracy Precision Recall F1-Score Accuracy Precision Recall F1-Score False transaction 0.9996 1 0.9916 0.9958 0.9906 0.9995 0.9833 0.9913 Missed transaction 0.9998 1 0.9932 0.9966 0.9914 0.9996 0.9848 0.9921 5. Conclusions In this paper, we propose an SegrDTW algorithm. The dynamic step size is used to limit the window of similarity retrieval, which effectively reduces the time complexity of the DTW algorithm and can be used for the rapid detection of abnormal ETC data. We evaluated our model on a large-scale trafﬁc dataset. The experiment results show that our Electronics 2022, 11, 1981 20 of 21 proposed method signiﬁcantly outperforms several competing methods. Although this algorithm improves the efﬁciency of abnormal data retrieval, it also has some shortcomings. For instance, the algorithm adopts dynamic radius segmentation retrieval, and there is still a wrong detection at the boundary of segmented trajectory sequence. In the future, we will focus on improving the accuracy of the boundary recognition of the abnormal trajectory range retrieval window and the applicability of the algorithm. Author Contributions: Conceptualization, F.Z. and F.G.; methodology, F.G. and S.L.; software, F.G.; validation, L.L., S.L. and X.Y.; formal analysis, S.L. and C.Z.; investigation, F.G. and J.W.; resources, F.Z.; data curation, F.G.; writing—original draft preparation, F.G. and S.L.; writing—review and editing, F.Z., F.G. and S.L.; visualization, L.L. supervision, J.W.; project administration, F.G. and X.Y.; funding acquisition, F.Z. All authors have read and agreed to the published version of the manuscript. Funding: This work is funded by the National Natural Science Foundation of China (41971340), the Special Funds for the Central Government to Guide Local Scientiﬁc and Technological Development (2020L3014), the 2020 Fujian Province “Belt and Road” Technology Innovation Platform (2020D002), and the Provincial Candidates for the Hundred, Thousand and Ten Thousand Talent of Fujian (GY-Z19113). Crosswise project (No. GY-H-21021). Data Availability Statement: Restrictions apply to the availability of these data. Data were obtained from Fujian Expressway Information Technology Co., Ltd. and are available from the authors with the permission of Fujian Expressway Information Technology Co., Ltd. Conﬂicts of Interest: The authors declare no conﬂict of interest. References 1. Qian, M. Analysis of multi-dimensional data fusion and application of ETC portal system. China ITS J. 2021, 6, 109–112. 2. Zhao, Y.Y.; Chen, Y.J.; Guan, W. Prediction Model of ETC Short Term Trafﬁc Flow Based on Multidimensional Time Series. J. Transp. Syst. Eng. Inf. Technol. 2016, 16, 191–198. 3. Chen, Z.; Wu, B.; Li, B.; Ruan, H. Expressway Exit Trafﬁc Flow Prediction for ETC and MTC Charging System Based on Entry Trafﬁc Flows and LSTM Model. IEEE Access 2021, 9, 54613–54624. [CrossRef] 4. Chiou, J.M.; Liou, H.T.; Chen, W.H. Modeling Time-Varying Variability and Reliability of Freeway Travel Time Using Functional Principal Component Analysis. IEEE Trans. Intell. Transp. Syst. 2021, 22, 257–266. [CrossRef] 5. Chen, L.W.; Chen, D.E. Exploring spatiotemporal mobilities of highway trafﬁc ﬂows for precise travel time estimation and prediction based on electronic toll collection data. Veh. Commun. 2021, 30, 100356. [CrossRef] 6. Tsung, C.K.; Yang, C.T.; Yang, S.W. Visualizing potential transportation demand from ETC log analysis using ELK stack. IEEE Internet Things J. 2020, 7, 6623–6633. [CrossRef] 7. Müller, M. Dynamic time warping. Information Retrieval for Music and Motion; Springer: Berlin/Heidelberg, Germany, 2007; pp. 69–84. 8. Lv, M.; Chen, L.; Chen, G. Mining user similarity based on routine activities. Inf. Sci. 2013, 236, 17–32. [CrossRef] 9. Xiao, X.; Zheng, Y.; Luo, Q.; Xie, X. Inferring social ties between users with human location history. J. Ambient. Intell. Humaniz. Comput. 2014, 5, 3–19. [CrossRef] 10. Liu, Y.; Kang, C.; Gao, S.; Xiao, Y.; Tian, Y. Understanding intra-urban trip patterns from taxi trajectory data. J. Geogr. Syst. 2012, 14, 463–483. [CrossRef] 11. Fang, Z.; Shaw, S.L.; Tu, W.; Li, Q.; Li, Y. Spatiotemporal analysis of critical transportation links based on time geographic concepts: A case study of critical bridges in Wuhan, China. J. Transp. Geogr. 2012, 23, 44–59. [CrossRef] 12. Zheng, Y.; Xie, X. Learning travel recommendations from user-generated GPS traces. ACM Trans. Intell. Syst. Technol. (TIST) 2011, 2, 1–29. [CrossRef] 13. Li, H.L.; Liang, Y.; Wang, S.C. Review on dynamic time warping in time series data mining. Control Decis. 2018, 33, 1345–1353. 14. Mao, J.Y.; Wu, H.; Sun, W.W. Vehicle trajectory anomaly detection in road network via Markov decision process. Chin. J. Comput. 2018, 41, 1928–1942. 15. George, M.A.; Silversides, K.L.; Zigman, J.; Melkumyan, A. Bedding Angle Identiﬁcation from BIF Marker Shales via Modiﬁed Dynamic Time Warping. Math. Geosci. 2021, 53, 1567–1585. [CrossRef] 16. Ghersi, I.; Ferrando, M.H.; Fliger, C.G.; Arenas, C.F.C.; Molina, D.J.E.; Miralles, M.T. Gait-cycle segmentation method based on lower-trunk acceleration signals and dynamic time warping. Med. Eng. Phys. 2020, 82, 70–77. [CrossRef] 17. Kumar, D.; Wu, H.; Rajasegarar, S.; Leckie, C.; Krishnaswamy, S.; Palaniswami, M. Fast and scalable big data trajectory clustering for understanding urban mobility. IEEE Trans. Intell. Transp. Syst. 2018, 19, 3709–3722. [CrossRef] 18. Hollerbach, A.L.; Conant, C.R.; Nagy, G.; Monroe, M.E.; Gupta, K.; Donor, M.; Giberson, C.M.; Garimella, S.V.B.; Smith, R.D.; Ibrahim, Y.M. Dynamic Time-Warping Correction for Shifts in Ultrahigh Resolving Power Ion Mobility Spectrometry and Structures for Lossless Ion Manipulations. J. Am. Soc. Mass Spectrom. 2021, 32, 996–1007. [CrossRef] [PubMed] Electronics 2022, 11, 1981 21 of 21 19. Ruble, M.; Hayes, C.E.; Welborn, M.; Zajic, A.; Prvulovic, M.; Pitruzzello, A.M.; Hayes, E. Hyperdimensional bayesian time mapping (hyperbat): A probabilistic approach to time series mapping of non-identical sequences. IEEE Trans. Signal Process. 2019, 67, 3719–3731. [CrossRef] 20. Cabanas-Molero, P.; Cortina-Parajón, R.; Combarro, E.F.; Alonso, P.; Bris-Peñalver, F.J. HReMAS: Hybrid real-time musical alignment system. J. Supercomput. 2019, 75, 1001–1013. [CrossRef] 21. Keogh, E.; Wei, L.; Xi, X.; Vlachos, M.; Lee, S.H.; Protopapas, P. Supporting exact indexing of arbitrarily rotated shapes and periodic time series under euclidean and warping distance measures. VLDB J. 2009, 18, 611–630. [CrossRef] 22. Lin, J.; Keogh, E.; Lonardi, S.; Lankford, J.P.; Nystrom, D.M. Visually mining and monitoring massive time series. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, 22–25 August 2004; pp. 460–469. 23. Bagnall, A.; Lines, J.; Bostrom, A.; Large, J.; Keogh, E. The great time series classiﬁcation bake off: A review and experimental evaluation of recent algorithmic advances. Data Min. Knowl. Discov. 2017, 31, 606–660. [CrossRef] 24. Zakaria, J.; Mueen, A.; Keogh, E. Clustering Time Series Using Unsupervised-Shapelets. In Proceedings of the 2012 IEEE 12th International Conference on Data Mining (ICDM), Brussels, Belgium, 10–13 December 2012. 25. Hao, Y.; Campana, B.; Keogh, E. Monitoring and Mining Animal Sounds in Visual Space. J. Insect Behav. 2013, 26, 466–493. [CrossRef] 26. Liu, L.; Zhang, T.; Ru, B. A ﬂying qualities assessment model based on multiparameter integration. Comput. Eng. Sci. 2016, 38, 1262–1268. 27. Teng, H.L.; Li, B.W.; Gao, Y.; Yang, D.; Zhang, Y. Quality evaluation model of unmanned aerial vehicle’s horizontal ﬂight maneuver based on ﬂight data. J. Beijing Univ. Aeronaut. Astronaut. 2019, 45, 2108–2114. 28. Qian, X.; Cai, Z.C. A Mathematical Model for Evaluating Pilot Controlling Quality of Military Aircraft Pilot Controlling. Ordnance Ind. Autom. 2014, 33, 16–18. 29. Wang, D.; Rong, G. Pattern distance of time series. J. Zhejiang Univ. Eng. Sci. 2004, 38, 795–798. 30. Qiu, J.; Ou, J.D.; Xie, D.; Wang, Z.; Du, J. A Similarity Measurement Method for Magnetic Anomaly Signal Under Low Signal-to- noise Based on Orthogonal Basis Function–Edit Distance. J. Electron. Inf. Technol. 2022, 44, 745–753. 31. Cao, Y.; Cao, J.; Zhou, Z.; Liu, Z. Aircraft Track Anomaly Detection Based on MOD-Bi-LSTM. Electronics 2021, 10, 1007. [CrossRef] 32. Liu, C.; Wang, J.; Liu, A.; Cai, Y.; Ai, B. An Asynchronous Trajectory Matching Method Based on Piecewise Space-Time Constraints. IEEE Access 2020, 8, 224712–224728. [CrossRef]
Electronics – Multidisciplinary Digital Publishing Institute
Published: Jun 24, 2022
Keywords: data governance; electronic toll collection data; trajectory similarity; dynamic time warping; anomaly detection; expressway
You can share this free article with as many people as you like with the url below! We hope you enjoy this feature!
Read and print from thousands of top scholarly journals.
Already have an account? Log in
Bookmark this article. You can see your Bookmarks on your DeepDyve Library.
To save an article, log in first, or sign up for a DeepDyve account if you don’t already have one.
Copy and paste the desired citation format or use the link below to download a file formatted for EndNote
Access the full text.
Sign up today, get DeepDyve free for 14 days.
All DeepDyve websites use cookies to improve your online experience. They were placed on your computer when you launched this website. You can change your cookie settings through your browser.