
The Method of Dynamic Identification of the Maximum Speed Limit of Expressway Based on Electronic Toll Collection Data

Hindawi
Scientific Programming
Volume 2021, Article ID 4702669, 15 pages
https://doi.org/10.1155/2021/4702669

Research Article

Fumin Zou, Feng Guo, Junshan Tian, Sijie Luo, Xiang Yu, Qing Gu, and Lyuchao Liao

College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108, Fujian, China
Fujian Key Lab for Automotive Electronics and Electric Drive, Fujian University of Technology, Fuzhou 350118, Fujian, China
Fujian Provincial Expressway Information Technology Co., Ltd., Fuzhou 350011, Fujian, China
Fujian Provincial Big Data Research Institute of Intelligent Transportation, Fujian University of Technology, Fuzhou 350118, Fujian, China

Correspondence should be addressed to Sijie Luo; sjluo@fjut.edu.cn

Received 5 August 2021; Revised 16 October 2021; Accepted 2 November 2021; Published 18 November 2021

Academic Editor: Zhu Xiao

Copyright © 2021 Fumin Zou et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

To overcome the drawbacks of the way expressway maximum speed limit information is currently obtained (a long update cycle and a highly complex recognition process), this work proposes a method based on Electronic Toll Collection (ETC) gantry data for dynamically identifying the maximum speed limit of expressways. First, the characteristics of the ETC gantry data are analyzed, the data are cleaned and reconstructed, and an algorithm is proposed for constructing a vehicle travel speed data set.
Second, a speed feature vector model of each road section is established that takes into account the relationship among the speed distribution features, the time domain features, and the maximum speed limit of the section. Then, a data supplement algorithm is constructed to address the imbalance of the data samples. Finally, the combined GC-XGBoost classification algorithm is used to learn the latent speed limit features, and the method is verified with Fujian Provincial Expressway ETC data and the speed limit information provided by the Fujian Traffic Police. The results show that the method identifies the maximum speed limit of expressway sections with an accuracy of 97.5%. Compared with traditional speed limit recognition and extraction methods, the proposed approach identifies the maximum speed limit of each expressway section more efficiently and accurately detects dynamic changes in it, which provides data support for intelligent expressway management systems and map providers.

1. Introduction

In recent years, China's expressway ETC technology has developed rapidly, and more and more vehicles have installed ETC equipment. These vehicles interact with ETC gantries while driving, producing massive amounts of ETC data. At present, the cumulative number of ETC users exceeds 220 million, and the utilization rate among vehicle owners is 78% [1]. Moreover, ETC gantries can also interact with users of the Manual Toll Collection (MTC) system. The ETC gantry system therefore collects traffic information for almost all vehicles on the expressway and reflects its overall traffic situation, which provides strong support for the informatization of smart expressways, vehicle-infrastructure cooperation, and automatic driving [2]. Obtaining the maximum speed limit of each expressway section is an important part of intelligent expressway management [3]: it provides drivers with speed limit information [4, 5] that helps them avoid accidents caused by speeding, and it gives autonomous vehicles reliable perception for driving speed decisions. However, the maximum speed limit is dynamic. The relevant management departments adjust the speed limit of a section according to traffic flow, road maintenance conditions, and the number of traffic accidents [6–8]. At present, speed limit information is mainly collected manually and then uploaded to the system, which is updated periodically. This approach has two disadvantages. First, professionals must travel along the expressway to collect the speed limit information, which costs considerable manpower and material resources. Second, the update cycle is long, so drivers cannot obtain the latest speed limits, which creates safety hazards and correspondingly reduces the traffic efficiency of the road. Automatically collecting speed limit information and dynamically identifying the maximum speed limit of a road in real time is therefore a meaningful research problem.

Traffic flow prediction and travel time prediction are research hotspots in transportation. Most of their methods, like speed limit recognition, are supervised learning based on machine learning algorithms; the difference is that speed limit recognition is a classification problem, whereas traffic flow prediction and travel time prediction are regression problems. The recognition of road maximum speed limits relies mainly on image recognition technology [9–12] and floating car trajectory data mining. Image recognition obtains the speed limit of each road by recognizing the speed limits shown on traffic signs. Machine learning is widely used in many research fields [13], and Support Vector Machine (SVM) [14], Extreme Learning Machine (ELM) [15], and multitask convolutional neural network (MTCNN) [16] models have been trained on speed limit sign features to recognize the maximum road speed limit. Although these methods recognize signs well, they require surveyors to photograph the speed limit signs on the road, which consumes substantial resources; in addition, the collection period is long, so real-time, dynamic recognition of the maximum speed limit is not possible. In floating car trajectory data mining, each floating car is equipped with a global positioning system that records the time, location, and other information of the vehicle, so mining floating car trajectories yields the driving speed features of all floating cars on the road [17], and a machine learning algorithm [18] can learn the maximum speed limit features contained in these speeds. However, floating cars make up only a small proportion of all vehicles and cannot fully reflect the speed of vehicles on the expressway, so speed limit recognition based on floating car data still has certain defects.

In view of the high cost of speed limit sign recognition and the shortcomings of trajectory-based recognition, this study proposes a method that uses real-time traffic data collected by the ETC gantry system to identify the maximum speed limit of expressways dynamically, avoiding both the high cost of manual information collection and the problem of incomplete vehicle data. First, a road section speed set construction algorithm and a section driving speed anomaly filtering algorithm are designed to ensure the integrity and reliability of the sample data. Then, a speed feature vector model is constructed to mine the speed limit features of vehicle speeds from different aspects. Finally, the maximum speed limit information of 534 expressway sections in Fujian Province is taken as the sample set, and a multivoting ensemble algorithm is used to perform supervised classification training and cross-validation on the road speed features. The test results show that this method identifies the maximum speed limit well and recognizes its dynamic changes on the road.

The contributions of this paper can be summarized as follows. First, an algorithm is proposed for constructing road section speed sets; it solves the problem that section speeds cannot be calculated when ETC gantry transaction records are missing, and it obtains the speeds of vehicles on each section accurately and completely. Second, features of road section speed are extracted from different aspects to construct a road section speed feature vector model that mines the latent correlation between vehicle speeds on the expressway and the road speed limit. Third, a dynamic recognition method for the maximum speed limit of expressways is proposed; its validity is verified against real maximum speed limit information, and its soundness is verified by comparison with a large number of prediction algorithms.

This paper is organized as follows. Section 1 introduces the research methods of road speed limit recognition. Section 2 defines the related concepts used in this work. Section 3 describes each part of the dynamic identification method for the expressway maximum speed limit. Section 4 presents the experimental results and analysis. Section 5 draws conclusions and discusses future work.

2. Relevant Definitions

Definition 1. Each ETC gantry on the expressway is called a Node, and two adjacent Nodes on the road constitute an expressway section, denoted $QD = \{Q, Distance\}$ with $Q = \langle Node_1, Node_2 \rangle$, where $Node_1$ is the start point of the section, $Node_2$ is its end point, and $Distance$ is the actual length of the section. Node and Q are shown in Figure 1.

Figure 1: Schematic of the sections (gantries node1–node6; sections labeled Q12 and Q23).

Definition 2. The expressway network formed by all the expressway sections considered in this work is denoted $G = \{QD_1, \ldots, QD_n\}$.

Definition 3. The ETC gantries a vehicle passes while driving on the expressway, taken in chronological order, form a sequence of nodes called the trajectory:

$$\mathrm{Traj} = \{D_0, D_1, \ldots, D_i, \ldots, D_j, \ldots, D_E\},\qquad D_i = (N_i, T_i),\quad 0 \le i \le E,\quad \forall\, i \le j:\ T_i \le T_j.$$

$D_i$ is a trajectory point consisting of the node $N_i$ and the time property $T_i$: $N_i$ is the label of the ith node passed by the vehicle, and $T_i$ is the time of the information interaction when the vehicle passes node $N_i$. $D_0$ is the start point of the trajectory, and $D_E$ is its end point.
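Definitions 1–3 map naturally onto simple data structures. The sketch below is a minimal illustration, not the authors' implementation: it groups raw gantry transactions into per-vehicle trajectories and checks the chronological condition of Definition 3. The flat `(vehicle_id, node, time)` record layout and the names `build_trajectories` and `is_chronological` are hypothetical; real ETC records carry more attributes.

```python
from collections import defaultdict


def build_trajectories(transactions):
    """Group (vehicle_id, node, time) records into per-vehicle trajectories.

    Each trajectory is a list of (N_i, T_i) points sorted by time, so that
    T_i <= T_j whenever i <= j, as Definition 3 requires.
    (Hypothetical record layout, for illustration only.)
    """
    by_vehicle = defaultdict(list)
    for vehicle_id, node, time in transactions:
        by_vehicle[vehicle_id].append((node, time))
    return {
        vehicle_id: sorted(points, key=lambda point: point[1])
        for vehicle_id, points in by_vehicle.items()
    }


def is_chronological(trajectory):
    """Check the ordering condition of Definition 3 on adjacent points."""
    return all(t_i <= t_j for (_, t_i), (_, t_j) in zip(trajectory, trajectory[1:]))
```

Checking adjacent pairs is sufficient here, because a non-decreasing sequence of adjacent times implies $T_i \le T_j$ for every $i \le j$.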
Definition 4. The average speed of a vehicle passing through a road section is called the road section speed. It is calculated as

$$v = \frac{s}{t_2 - t_1}, \tag{1}$$

where $s$ is the actual length of the road section, $t_1$ is the time when the vehicle passes the start point of the section, and $t_2$ is the time when it passes the end point.

Definition 5. The speed dispersion of a road section measures how widely the average speeds of the vehicles passing through it are spread. The section speeds of vehicles on the expressway within a certain period constitute the speed set of the section. After sorting this set, let $v_1$ be the speed at the 85th percentile and $v_2$ the speed at the 15th percentile. The speed dispersion index is

$$\Delta v = v_1 - v_2. \tag{2}$$

The larger this range, the more dispersed the speeds.

Definition 6. Speed limits include a minimum speed limit and a maximum speed limit, and a speed limit value is generally an integer multiple of 10 km/h. This paper discusses only the maximum speed limit.
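Equations (1) and (2) can be computed directly. The sketch below assumes section lengths in kilometres and timestamps in seconds, and it uses linear interpolation between order statistics for the percentiles; the paper does not state which percentile convention it uses, so this is an assumption. The function names are hypothetical.

```python
import statistics


def section_speed(distance_km, t_enter_s, t_leave_s):
    """Equation (1): v = s / (t2 - t1), returned in km/h (times in seconds)."""
    elapsed = t_leave_s - t_enter_s
    if elapsed <= 0:
        raise ValueError("exit time must be later than entry time")
    return distance_km * 3600.0 / elapsed


def percentile(speeds, p):
    """p-th percentile (integer 1..99), interpolating linearly between order statistics."""
    return statistics.quantiles(speeds, n=100, method="inclusive")[p - 1]


def dispersion_index(speeds):
    """Equation (2): delta_v = v1 - v2, with v1 the 85th and v2 the 15th percentile."""
    return percentile(speeds, 85) - percentile(speeds, 15)


print(section_speed(10.0, 0, 360))                    # 100.0 (10 km in 6 minutes)
print(dispersion_index([float(v) for v in range(101)]))  # 70.0
```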
3. Methods

3.1. ETC Data Preprocessing

3.1.1. ETC Data Cleaning. The ETC gantry system generates a large amount of transaction data in a short period. System errors, interruptions in information exchange, and severe weather can all produce abnormal data that would distort the results. To reduce this interference, the data are preprocessed, mainly in the following respects.

Data Redundancy: Duplication between Multiple Data. Each vehicle's transaction record at an ETC gantry should be unique. However, problems in data acquisition, transmission, storage, and other intermediate links can cause records to be uploaded repeatedly, resulting in data redundancy. These duplicate records need to be cleaned.

Data Error. Some records do not conform to normal driving rules: for example, the same vehicle is recorded at the same time by two ETC gantries that control different driving directions, or different passing records of the same vehicle carry the same time. These records need to be filtered out or deleted.
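The two cleaning rules above can be sketched as follows, under the assumption that each record is a hypothetical `(vehicle_id, gantry_id, timestamp)` tuple. Exact duplicates are collapsed to one record (data redundancy), and any vehicle seen at two different gantries at the same timestamp has all of those records dropped (data error). Dropping every conflicting record, rather than trying to decide which one is correct, is a choice made for this sketch.

```python
from collections import defaultdict


def clean_records(records):
    """Remove duplicated records, then remove same-time records at different gantries."""
    # Data redundancy: keep the first copy of each identical record, preserving order.
    seen = set()
    unique = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)

    # Data error: the same vehicle cannot pass two different gantries at the same time.
    gantries_at = defaultdict(set)
    for vehicle, gantry, ts in unique:
        gantries_at[(vehicle, ts)].add(gantry)
    return [r for r in unique if len(gantries_at[(r[0], r[2])]) == 1]


records = [("A", "G1", 100), ("A", "G1", 100), ("A", "G2", 200),
           ("B", "G1", 50), ("B", "G3", 50)]
print(clean_records(records))  # [('A', 'G1', 100), ('A', 'G2', 200)]
```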
3.1.2. Vehicle Speed Recognition Algorithm in Road Section. To calculate the speed distribution of a road section, the transaction data of all vehicles at each gantry are needed. However, gantry transaction data may be missing, so all traffic data and road network data must be checked and supplemented to ensure the integrity of the gantry transaction data. After the ETC gantry transaction data are initially cleaned, the trajectory Traj of each vehicle is constructed in chronological order from the transaction data of each gantry. Each pair of adjacent ETC gantries $\langle Node_i, Node_{i+1} \rangle$ in Traj is then traversed, and the algorithm checks whether the road section $QD_j$ formed by the two gantries belongs to the expressway network G. If it does, the speed of the vehicle passing through $QD_j$ is generated directly. $QD_j$ and its section speed over a time period T are expressed as

$$QD_j = \{\langle Node_i, Node_{i+1} \rangle, Distance_j\},\qquad v_{QD_j,T} = \frac{1}{n}\sum_{k=1}^{n} v_k, \tag{3}$$

where $n$ is the number of vehicles passing section $QD_j$ within time period T and $v_k$ is the average speed of the kth vehicle on $QD_j$ within that period.

If $QD_j$ does not belong to G, the section data of intermediate gantries are missing, and a path search based on $Node_i$ and $Node_{i+1}$ is performed to fill in the missing gantry transaction data. As shown in Figure 2, if the section formed by $Node_i$ and $Node_{i+1}$ cannot be found in G, $Node_i$ and $Node_{i+1}$ are used as base nodes, and the feasible path $Node_i, Node_a, Node_b, Node_{i+1}$ is obtained through path search. $Node_a$ and $Node_b$ are supplementary nodes, and the average speed $v$ between $Node_i$ and $Node_{i+1}$ is taken as the speed for $\langle Node_i, Node_a \rangle$, $\langle Node_a, Node_b \rangle$, and $\langle Node_b, Node_{i+1} \rangle$.

Figure 2: Schematic diagram of driving path.

To ensure the reliability of the average speed $v$, the minimum expressway driving speed is set to $v_{min} = 30$ km/h and the maximum to $v_{max} = 160$ km/h [19]. If the average speed $v$ over all road sections between $Node_i$ and $Node_{i+1}$ falls outside $[v_{min}, v_{max}]$, it is deleted as abnormal data. The section speed data construction algorithm is shown in Algorithm 1.
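The section speed construction described above (and listed as Algorithm 1) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the network G is a hypothetical dict `{(node_a, node_b): length_km}` of directed sections, timestamps are in seconds, and the missing-gantry path search is Dijkstra's shortest path by length. Unlike Algorithm 1, which checks the $[v_{min}, v_{max}]$ range only on the path-filling branch, this sketch applies the check to every computed speed.

```python
import heapq

V_MIN, V_MAX = 30.0, 160.0  # km/h bounds from Section 3.1.2


def shortest_path(network, src, dst):
    """Dijkstra search over {(u, v): distance_km}; returns [src, ..., dst] or None."""
    adjacency = {}
    for (u, v), d in network.items():
        adjacency.setdefault(u, []).append((v, d))
    best, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == dst:  # rebuild the path by walking predecessors back to src
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return path[::-1]
        if dist > best[node]:
            continue  # stale heap entry
        for nxt, d in adjacency.get(node, []):
            if dist + d < best.get(nxt, float("inf")):
                best[nxt], prev[nxt] = dist + d, node
                heapq.heappush(heap, (dist + d, nxt))
    return None


def section_speeds(trajectory, network):
    """trajectory: [(node, time_s), ...] in time order.
    Returns [(node_a, node_b, t_enter_s, t_leave_s, speed_kmh), ...]."""
    records = []
    for (n1, t1), (n2, t2) in zip(trajectory, trajectory[1:]):
        elapsed = t2 - t1
        if elapsed <= 0:
            continue
        # Use the direct section if it exists; otherwise fill missing gantries by path search.
        hops = [n1, n2] if (n1, n2) in network else shortest_path(network, n1, n2)
        if hops is None:
            continue
        legs = [network[(a, b)] for a, b in zip(hops, hops[1:])]
        speed = sum(legs) * 3600.0 / elapsed
        if not V_MIN <= speed <= V_MAX:
            continue  # abnormal speed: discard
        t = t1
        for (a, b), d in zip(zip(hops, hops[1:]), legs):
            t_next = t + d * 3600.0 / speed  # split the elapsed time in proportion to length
            records.append((a, b, t, t_next, speed))
            t = t_next
    return records
```

For example, if a vehicle is recorded at `N1` and then at `N2`, but G only contains `N1 → Na → Nb → N2`, the 10 km covered in 360 s yields 100 km/h for each of the three supplementary sections, mirroring the $Node_a$/$Node_b$ case of Figure 2.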
3.1.3. Outlier Information Detection Algorithm for Road Section. To better analyze the speed distribution of each road section, a noise data cleaning model is constructed to detect and eliminate outliers in the data. The basic idea of the model is to use the upper and lower limits of the speed boxplot to detect abnormal points and to determine the threshold interval for filtering abnormal speed data. Given a large amount of expressway ETC transaction data, the road section speed data set is assumed, by the central limit theorem, to follow a normal distribution. The upper and lower limits of the speed boxplot then lie close to the 3σ interval of the normal distribution (at ±2.698σ, as shown in Figure 3), which supports using boxplot analysis for outlier detection and filtering.

As shown in Figure 3, the boxplot has six elements: $q_1$ is the first quartile (the 1/4 point), $q_2$ is the median, $q_3$ is the third quartile (the 3/4 point), $IQR = q_3 - q_1$ is the distance between $q_1$ and $q_3$, and there are an upper limit and a lower limit. Here, $q_1$ is the speed value exceeded by 75% of the traffic flow (greater than 25% of it), $q_2$ is the speed value greater than 50% of the traffic flow, and $q_3$ is the speed value greater than 75% of the traffic flow. The upper and lower limits of the noise data cleaning threshold model are therefore

$$\text{Upper limit} = q_3 + 1.5\,IQR,\qquad \text{Lower limit} = q_1 - 1.5\,IQR. \tag{4}$$

The threshold range for speed filtering is then

$$v \in (\text{Lower limit},\ \text{Upper limit}). \tag{5}$$

Section speed data within this range are retained, and the outliers are deleted.
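Equations (4) and (5) amount to the standard 1.5 × IQR boxplot rule. The sketch below assumes linearly interpolated quartiles (the paper does not name a quartile convention) and also reproduces the ±2.698σ figure from Figure 3: for a normal distribution, $q_3 \approx 0.6745\sigma$, so the upper limit sits at $0.6745\sigma + 1.5 \times 1.349\sigma \approx 2.698\sigma$, and only about 0.7% of a normal sample falls outside the limits. The function names are hypothetical.

```python
from statistics import NormalDist, quantiles


def boxplot_limits(speeds):
    """Equation (4): lower = q1 - 1.5*IQR, upper = q3 + 1.5*IQR."""
    q1, _q2, q3 = quantiles(speeds, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def filter_outliers(speeds):
    """Equation (5): keep speeds strictly inside (lower limit, upper limit)."""
    lower, upper = boxplot_limits(speeds)
    return [v for v in speeds if lower < v < upper]


# Where do the boxplot limits fall for a standard normal distribution?
std = NormalDist()
q3_sigma = std.inv_cdf(0.75)                    # about 0.6745
upper_sigma = q3_sigma + 1.5 * (2 * q3_sigma)   # IQR = 2 * q3 for a symmetric distribution
outside_share = 2 * (1 - std.cdf(upper_sigma))  # share of a normal sample flagged as outliers

print(filter_outliers([80, 85, 90, 95, 100, 105, 110, 300]))  # 300 km/h is removed
```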
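3.2. Feature Vector Model of Expressway Speed. Vehicles on the expressway travel at different speeds at different times and on different road sections. Statistical analysis of the traffic speed features of each section reveals the latent connection between vehicle speed and the road speed limit, from which the road section speed feature vector model is constructed. The feature vector has three categories: the frequency-speed percentile feature, the road section speed evaluation feature, and the road section speed time domain feature.

3.2.1. Road Section Frequency-Speed Percentile Feature. The road section frequency-speed percentile feature reflects the distribution of section speeds at different times. It takes the speed values at the 50th percentile, the upper and lower 25th percentiles, and the upper and lower 15th percentiles of the section's speed set and converts them into the multidimensional feature vector α:

$$\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_6), \tag{6}$$

where $\alpha_1$–$\alpha_6$ are, respectively, the 15th, 25th, 50th, 75th, 85th, and 95th percentiles of the section speed distribution; together they describe the overall speed distribution on the section.

3.2.2. Road Section Speed Evaluation Feature. This feature describes section speeds through evaluation indexes in the frequency domain, including the most frequent speed, the average speed, the speed standard deviation, and the speed dispersion, which are combined into the multidimensional feature vector β:

$$\beta = (\beta_1, \beta_2, \beta_3, \beta_4), \tag{7}$$

where $\beta_1$ is the mode of the section speeds, representing the general level of vehicle speed; $\beta_2$ and $\beta_3$ are the overall average section speed μ and the standard deviation σ, respectively; and $\beta_4$ is the speed dispersion index, which reflects the range and spread of the speed data.

3.2.3. Road Section Speed Time Domain Feature. The road section speed time domain feature reflects how traffic speed evolves on different road sections under different speed limits. If the section speed data were analyzed over a whole day without distinguishing periods, they would easily be affected by congestion and other factors in individual periods and would not reflect the speed evolution of the road. It is therefore necessary to integrate the speed features of the road in different periods. The whole day is divided into 24 periods, denoted 0, 1, ..., 23, and the speed of each road section in each period is counted to find how its speed changes. As shown in Figure 3, the multidimensional time domain speed feature vector is constructed as

$$\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_n), \tag{8}$$

where $\gamma_1$–$\gamma_n$ are the average section speeds of individual periods in the data sample: the 24 hourly average section speeds of the whole day are sorted from largest to smallest, and the first n values are taken. Here, n = 6, which avoids the disturbance caused by relatively low section speeds during congestion or road maintenance in some periods.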
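Together, α (6 values), β (4 values), and γ (6 values) give the 16-dimensional speed feature vector used in Section 3.4. The sketch below assembles it for one road section. Several details are assumptions, not specified in the text: speeds are rounded to whole km/h before taking the mode, β₃ uses the population standard deviation, percentiles are linearly interpolated, and at least six hourly periods must contain data. The name `speed_feature_vector` is hypothetical.

```python
import statistics
from collections import defaultdict

PERCENTILES = (15, 25, 50, 75, 85, 95)  # alpha_1 .. alpha_6, Equation (6)
TOP_HOURS = 6                           # n = 6 in Equation (8)


def speed_feature_vector(observations):
    """observations: iterable of (speed_kmh, hour) pairs for one road section."""
    observations = list(observations)
    speeds = [speed for speed, _ in observations]

    # alpha: frequency-speed percentile feature
    cuts = statistics.quantiles(speeds, n=100, method="inclusive")
    alpha = [cuts[p - 1] for p in PERCENTILES]

    # beta: speed evaluation feature
    beta = [
        statistics.mode(round(v) for v in speeds),  # beta_1: mode (rounded to 1 km/h, assumption)
        statistics.fmean(speeds),                   # beta_2: mean speed mu
        statistics.pstdev(speeds),                  # beta_3: std sigma (population form, assumption)
        alpha[4] - alpha[0],                        # beta_4: dispersion v85 - v15 (Definition 5)
    ]

    # gamma: time domain feature, the six highest hourly mean speeds
    by_hour = defaultdict(list)
    for speed, hour in observations:
        by_hour[hour].append(speed)
    hourly_means = sorted((statistics.fmean(v) for v in by_hour.values()), reverse=True)
    if len(hourly_means) < TOP_HOURS:
        raise ValueError("need speed data in at least six hourly periods")
    gamma = hourly_means[:TOP_HOURS]

    return alpha + beta + gamma  # 6 + 4 + 6 = 16 dimensions
```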
Algorithm 1: Algorithm of speed data construction in road section.

Input: trajectory data of a car D, expressway road network data G
Output: speed data of the road section

    function Sections(D)  // split the vehicle trajectory into per-section records
        D = {D_0, D_1, ..., D_E}, D_i = {N_i, T_i}
        for i = 0 to E − 1 do
            Node_i, Node_{i+1} ← D_i.N, D_{i+1}.N  // node information of two adjacent points
            Time_i, Time_{i+1} ← D_i.T, D_{i+1}.T  // time information of two adjacent points
            delta ← Time_{i+1} − Time_i  // time difference between the two points
            R_i.Q ← (Node_i, Node_{i+1})  // start and end nodes of the traversed section
            R_i.T ← (Time_i, Time_{i+1})  // entry and exit times of the section
            R_i ← (R_i.Q, R_i.T, delta)
        end for
        Sec ← {R_0, R_1, ..., R_{E−1}}  // encapsulate into Sec
        return Sec
    end function

    Sec ← Sections(D)
    for each R_j in Sec (j = 0, 1, ..., E − 1) do  // R_j = (Q_j, T_j, delta_j)
        if Q_j in G then
            Distance_k ← G_k.Distance, k = Q_j  // section length from the expressway network
            t_j ← R_j.delta  // time needed to traverse the section
            v_j ← Distance_k / t_j  // speed of the vehicle on the section
            R_j.V ← v_j  // add the speed attribute
        else  // Q_j is not in G: gantry records between its two nodes were not collected
            {N_1, N_2, ..., N_Z} ← shortest_path(G, Q_j), Q_j = (N_1, N_Z)  // nodes on the shortest path
            A = {A_1, A_2, ..., A_{Z−1}} ← {N_1, N_2, ..., N_Z}  // path nodes converted to road sections
            path ← { }
            for each A_k in A do
                path_k ← G_k.Distance, k = A_k  // length of each supplementary section
            end for
            A.Distance ← Sum(path)
            V_A ← A.Distance / R_j.delta
            if V_min ≤ V_A ≤ V_max then
                t_1, t_2 ← R_j.T  // times recorded at the two gantries
                for i = 1 to Z − 1 do
                    τ_i ← (path_1 + ... + path_i) / V_A  // travel time from N_1 to the end of A_i
                    if i = 1 then
                        A_i.tq ← t_1  // time the vehicle enters A_i
                        A_i.th ← t_1 + τ_i  // time the vehicle leaves A_i
                        A_i.delta ← τ_i
                    else
                        A_i.tq ← t_1 + τ_{i−1}
                        A_i.th ← t_1 + τ_i
                        A_i.delta ← τ_i − τ_{i−1}
                    end if
                    A_i.T ← (A_i.tq, A_i.th)
                    A_i.V ← V_A
                end for
                A ← {Q, T, delta, V}  // corrected section records: node pair, times, and speed
                R_j ← A  // A replaces the original R_j
            else
                remove R_j from Sec  // abnormal speed, deleted as described in Section 3.1.2
            end if
        end if
    end for
    speed_data ← {R_0, R_1, ..., R_{E−1}}  // speed data of the road sections

Figure 3: Schematic diagram of noise data cleaning threshold model based on boxplot analysis (a boxplot with Q1, median, Q3, IQR, and lower and upper limits, aligned with a normal distribution: Q1 and Q3 at −0.6745σ and 0.6745σ, the limits at −2.698σ and 2.698σ, with 24.65%, 50%, and 24.65% of the distribution between them).

3.3. Sample Imbalance Processing. The road speed limit classes used in this paper, 80 km/h, 100 km/h, 110 km/h, and 120 km/h, conform to the values specified in the "Road Speed Limit Sign Design Specification" (JTG/T 3381-02-2020) and the "Expressway Engineering Technical Standard" (JTG B01-2003). Because most of the collected data belong to the 100 km/h class, that class is far larger than the other three (80 km/h, 110 km/h, and 120 km/h), which creates an imbalance among the sample categories. There are two common ways to handle unbalanced samples: oversampling and undersampling [20]. Oversampling copies the minority samples multiple times to enlarge the minority classes; because it duplicates existing samples, it leads to some overfitting during model training. Undersampling randomly removes part of the majority samples, or keeps only a proportion of them; the model then learns only part of the rules in the data and cannot reflect the complete pattern of that category. To alleviate these problems, an improved random oversampling method, SMOTE [21], is used. It analyzes the minority samples and uses their similarity in feature space to add simulated new samples to the data set, which enlarges the minority classes and reduces the disparity between categories, thereby resolving the imbalance. The SMOTE process consists of the following steps:

Step 1. Select the speed feature vector sets of the minority categories, whose speed limit values are 80, 110, and 120 km/h.
Step 2. For each category's sample set, use the Euclidean distance as the metric in feature space and compute the distance between each sample and the other samples in the set to determine its k nearest neighbors.
Step 3. Perform random linear interpolation on the line connecting each sample and its selected nearest neighbors to generate new samples.
Step 4. Repeat Steps 2 and 3 until the categories of the expressway speed feature vector data set are balanced.
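The four SMOTE steps above can be sketched in plain Python. This is a minimal textbook SMOTE under stated assumptions, not the exact variant used in the paper: every minority class is oversampled up to the size of the majority class, a seeded random generator keeps the result reproducible, and `smote` is a hypothetical name. Because each synthetic point lies on a segment between two real samples of the same class, it always stays inside that class's region of feature space.

```python
import math
import random
from collections import Counter


def smote(samples, labels, k=5, seed=0):
    """Oversample every minority class up to the majority-class size.

    For each synthetic point: pick a random minority sample x, find its k nearest
    same-class neighbours by Euclidean distance (Step 2), choose one neighbour,
    and interpolate at a random position on the segment between them (Step 3).
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = [list(x) for x in samples], list(labels)
    for cls, count in counts.items():
        members = [x for x, y in zip(samples, labels) if y == cls]
        if count == target or len(members) < 2:
            continue  # majority class, or too few samples to interpolate
        for _ in range(target - count):  # Step 4: repeat until balanced
            i = rng.randrange(len(members))
            x = members[i]
            others = [m for j, m in enumerate(members) if j != i]
            neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
            nn = rng.choice(neighbours)
            gap = rng.random()
            out_x.append([a + gap * (b - a) for a, b in zip(x, nn)])
            out_y.append(cls)
    return out_x, out_y
```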
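3.4. Maximum Speed Limit Recognition Classification Model. Speed limit information on expressways is an important factor in driving safety. Different road sections have different speed limits, and these differences directly affect the state of the vehicles, so the related data show certain patterns. Training a strong learner on these data can therefore achieve high-precision recognition. XGBoost is an ensemble learning method based on boosting [22]. Its base learner is usually a decision tree: it continuously generates new trees, each fitting the residual between the true values and the current predictions of all existing trees, and the outputs of all trees are accumulated as the final result to obtain better classification accuracy [23–25]. Using XGBoost as the classifier for the maximum speed limit on expressways allows the maximum speed limit to be determined accurately.

A sample data set is constructed by extracting 16-dimensional speed feature vectors from the expressway section data with known speed limits. Let the data set be $S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where $x_i$ (i = 1, 2, ..., m) is the feature vector of the ith sample, also called the input value, i.e., the constructed 16-dimensional expressway speed feature vector, and $y_i$ (i = 1, 2, ..., m) is the output value of the ith sample, i.e., the road speed limit class label corresponding to $x_i$. Assuming that the XGBoost ensemble integrates K regression trees in total, its prediction is

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i),\qquad f_k \in F, \tag{9}$$

where K is the number of trees, $f_k$ is the kth regression tree with structure $q$ and leaf weights $w_k$, F is the ensemble of all regression trees, and $f_k(x_i)$ is the score predicted by the kth tree for sample $x_i$.

The objective function of XGBoost consists of a loss function and a regularization term:

$$Obj = \sum_{i=1}^{m} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \tag{10}$$

where $l$ is the error (loss) function and $\Omega(f_k)$ is the regularization term, expressed as

$$\Omega(f_k) = \gamma T + \frac{1}{2}\lambda \lVert w_k \rVert^2, \tag{11}$$

where γ is the penalty coefficient of the model, with value range [0, 1]; T is the number of leaves of the kth tree; and λ is the regularization coefficient.

XGBoost trains with an additive, step-by-step strategy: it first optimizes the first tree, then the second, and so on until the tth tree, continuously reducing the loss function. Adding an incremental function $f_t$ at iteration t to optimize the objective improves prediction accuracy:

$$Obj^{(t)} = \sum_{i=1}^{m} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + C, \tag{12}$$

where C is a constant term and $\hat{y}_i^{(t-1)}$ is the prediction for the ith sample at iteration t − 1. A second-order Taylor expansion gives

$$Obj^{(t)} \simeq \sum_{i=1}^{m}\left[l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \Omega(f_t),$$

and, after discarding the constant terms to reduce the running time of the model and grouping the samples by leaf,

$$\widetilde{Obj}^{(t)} = \sum_{j=1}^{T}\left[\left(\sum_{i\in I_j} g_i\right) w_j + \frac{1}{2}\left(\sum_{i\in I_j} h_i + \lambda\right) w_j^2\right] + \gamma T, \tag{13}$$

where $I_j = \{\, i \mid q(x_i) = j \,\}$ is the sample set of leaf j, and $g_i$ and $h_i$ are the first and second derivatives of the loss function with respect to $\hat{y}_i^{(t-1)}$, respectively.

The objective is thus a quadratic function of each $w_j$. Minimizing it gives the optimal score of each leaf node and the optimal value of the objective function:

$$w_j^{*} = -\frac{G_j}{H_j + \lambda},\qquad Obj^{*} = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^2}{H_j + \lambda} + \gamma T, \tag{14}$$

where $G_j = \sum_{i\in I_j} g_i$ and $H_j = \sum_{i\in I_j} h_i$.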
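Equation (14) can be checked numerically. The sketch below computes $w_j^{*}$ and $Obj^{*}$ for a fixed, hypothetical tree structure. For readability it uses the squared-error loss $l = \frac{1}{2}(y - \hat{y})^2$, for which $g_i = \hat{y}_i^{(t-1)} - y_i$ and $h_i = 1$; the paper's four-class speed limit model would use a multi-class loss instead, so this is an illustration of the formula rather than of the trained model.

```python
def leaf_stats(grad, hess, leaf_of):
    """Sum g_i and h_i per leaf: G_j and H_j in Equation (14)."""
    G, H = {}, {}
    for g, h, j in zip(grad, hess, leaf_of):
        G[j] = G.get(j, 0.0) + g
        H[j] = H.get(j, 0.0) + h
    return G, H


def optimal_leaves(grad, hess, leaf_of, lam=1.0, gamma=0.0):
    """Return ({leaf: w_j*}, Obj*) following Equation (14)."""
    G, H = leaf_stats(grad, hess, leaf_of)
    weights = {j: -G[j] / (H[j] + lam) for j in G}
    obj = -0.5 * sum(G[j] ** 2 / (H[j] + lam) for j in G) + gamma * len(G)
    return weights, obj


# Squared-error loss gives g_i = y_hat_i - y_i and h_i = 1.
y_true = [1.0, 2.0, 3.0, 10.0]
y_prev = [0.0, 0.0, 0.0, 0.0]   # prediction of the first t-1 trees
grad = [p - t for p, t in zip(y_prev, y_true)]
hess = [1.0] * len(y_true)
leaf_of = [0, 0, 0, 1]          # hypothetical tree structure q(x_i)

weights, obj = optimal_leaves(grad, hess, leaf_of, lam=0.0, gamma=0.0)
print(weights, obj)  # {0: 2.0, 1: 10.0} -56.0
```

With λ = 0, each leaf weight is simply the mean residual of its samples; a positive λ shrinks the weights toward zero (for leaf 0 and λ = 1, $w_0^{*} = 6/4 = 1.5$), which is how the regularization term in Equation (11) limits overfitting.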
After that, the optimization of the XGBoost parameters mainly consists of the following 4 steps:

Step 1. Choose a relatively high learning rate, set reasonable initial values for the booster parameters, and use K-fold cross-validation in each iteration to obtain the ideal number of decision trees.
Step 2. With the learning rate and the number of decision trees fixed by Step 1, use K-fold cross-validation and grid search to optimize the parameters of each boosting machine.
Step 3. In the same way as Step 2, adjust the regularization parameters on the given data to reduce overfitting.
Step 4. Reduce the learning rate appropriately to determine the final ideal parameter combination of the model.
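Steps 2 and 3 each amount to a grid search scored by K-fold cross-validated accuracy, the evaluation index shown in Process 2 of Figure 4. The standard-library sketch below shows that search loop. Since XGBoost itself is not part of the standard library, a hypothetical k-nearest-neighbour classifier with a single `n_neighbors` parameter stands in for the boosting machine; in practice the grid would cover XGBoost parameters such as the learning rate, number of trees, and regularization terms, one stage at a time.

```python
import itertools
import math
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) once, then yield (train, test) index lists for k folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    for fold in range(k):
        test = order[fold::k]
        held_out = set(test)
        yield [i for i in order if i not in held_out], test


def knn_predict(train_x, train_y, x, n_neighbors):
    """Hypothetical stand-in classifier: majority vote of the nearest neighbours."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return Counter(train_y[i] for i in nearest[:n_neighbors]).most_common(1)[0][0]


def cv_accuracy(X, y, params, k=5):
    """K-fold cross-validated accuracy for one parameter combination."""
    correct = 0
    for train, test in k_fold_indices(len(X), k):
        tx, ty = [X[i] for i in train], [y[i] for i in train]
        correct += sum(knn_predict(tx, ty, X[i], **params) == y[i] for i in test)
    return correct / len(X)


def grid_search(X, y, grid, k=5):
    """Evaluate every parameter combination in `grid`; keep the most accurate one."""
    names = sorted(grid)
    best_params, best_score = None, -1.0
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_accuracy(X, y, params, k)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Every sample is held out exactly once across the k folds, so the score reflects performance on unseen data; ties are broken in favor of the first combination in grid order.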
Figure 4: The flowchart of expressway speed limit information recognition model. (Process 1: import the expressway ETC data, including ETC transaction data, road network topology data, and road speed limit information. Filter noise and eliminate abnormal data. Build the road section speed data set, remove outliers with the boxplot, and construct the section speed feature vectors. Balance the data with SMOTE oversampling and divide it by stratified sampling into training and test samples. Train XGBoost by presetting the number of iterations and the loss function, computing the first and second derivatives, and adding trees and updating the loss until the iteration requirement is met. Process 2: initialize the parameter search ranges, run a grid search with cross-validation using accuracy as the evaluation index, and obtain the optimal parameter combination.)

Dynamic identification of expressway speed limit information is realized through the following steps.

First, the ETC gantry transaction data are cleaned by removing duplicated and erroneous records. Next, the road section speed data set construction algorithm is used to find missing records in the ETC gantry transaction data and to accurately restore the distribution of gantries along the expressways. The speed of a road section is obtained by calculating each vehicle's speed between consecutive gantries. Some of these section speeds are extremely large or small outliers, so a boxplot is used to remove them.

The speeds on each section are then analyzed, and three features are constructed: the frequency-speed percentile feature, the section speed evaluation feature, and the section speed time domain feature. Because the numbers of samples in the different speed limit classes differ greatly, an oversampling algorithm is used to expand the minority classes and obtain balanced data.

Finally, the data are divided into training data and test data. The training data are fed into the XGBoost algorithm for training and learning (Process 1 in Figure 4). Grid search and cross-validation are used to find the optimal parameters of each boosting machine in XGBoost (Process 2 in Figure 4).

4. Experiments and Results

4.1. Introduction of Experimental Data. The ETC gantry system is one of the main components of the expressway ETC system. It provides real-time supervision and recording of vehicle driving information, vehicle path identification, toll data fitting, and other functions [14]. The experimental data fall into three categories.

The first category is the ETC transaction data collected by the ETC gantries on expressway sections in Fujian Province over 9 days, from September 3 to September 11, 2020. It covers 50 expressways, including the Fuyin, Xiazhang, and Longchang Expressways, with 534 sections and about 100 million records. The average section length is 8.9 km, 85% of the sections are shorter than 16 km, and the longest is 30 km; the distribution of the gantries is shown in Figure 5. These data were provided by Fujian Provincial Expressway Information Technology Co., Ltd., and their main attributes are listed in Table 1.

The second category is the road speed limit data, consisting of the name of each road section and its maximum speed limit. These values come from the online announcements of the Fujian traffic police and are used for model learning, training, and testing.

The third category is the length of each expressway section obtained from Amap, including the gantry node pair of each section and the actual section distance.

Figure 5: Distribution of expressway gantries in Fujian province. (Map of Fujian Province of China, scale 1:4,930,986.)

Table 1: ETC gantry system transaction data attributes.

| Attribute name | Example | Attribute name | Example |
|---|---|---|---|
| Trade ID | 340***98 | OBU plate | Blue Fujian A12345 |
| Trade time | 2020/9/5 21:29:26 | Vehicle class | 1 |
| Flag ID | 35**15 | Enter time | 2020/9/5 21:29:26 |
| Flag type | 0 | Enter station | 25**7 |
| Flag index | 1 | OBU ID | 12B***E7 |

4.2. Experimental Results and Analysis

4.2.1. ETC Data Preprocessing. The initially cleaned ETC data are matched with the road network topology data, and the road section speed of each vehicle is calculated. The expressway road section speed data set is then constructed; Table 2 shows the main characteristics of the data.

Because of random factors, the data may contain outliers. The noise data filtering model detects the outliers on each road section, and removing them yields the preprocessed road section speed data.

Figure 6 shows the speed data of one road section from September 3 to September 11, 2020. The abscissa is the date and the ordinate is the section speed. Each box summarizes the distribution of section speeds on that day, and the black dots mark the points to be removed.

The original section speed data contain about 12.29 million records. About 1.19 million of them (9.68%) are abnormal, leaving approximately 11.1 million preprocessed records.

4.2.2. Road Section Velocity Feature Vector. After the preprocessed road section speed data set is obtained, the road section speed feature vector model is built from a day-by-day statistical analysis of the section speeds. Each sample therefore consists of a 16-dimensional feature vector covering three feature types, together with its class label.
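Section 4.2.1 performs two preprocessing operations: computing a section speed from two consecutive gantry transactions, and removing outliers with a boxplot. Both can be sketched as follows. The field values mirror the record in Table 2. The paper does not state its whisker rule or quartile convention, so the 1.5 × IQR whiskers and the `method="inclusive"` quartiles are assumed common boxplot defaults. The function names are hypothetical.

```python
from datetime import datetime
from statistics import quantiles

TIME_FMT = "%Y-%m-%d %H:%M:%S"  # format of the trade times in Table 2


def section_speed_kmh(before_time, after_time, road_distance_m):
    """Average speed (km/h) between two consecutive gantry transactions."""
    t0 = datetime.strptime(before_time, TIME_FMT)
    t1 = datetime.strptime(after_time, TIME_FMT)
    delta_s = (t1 - t0).total_seconds()
    if delta_s <= 0:
        raise ValueError("after_time must be later than before_time")
    return road_distance_m / delta_s * 3.6  # m/s -> km/h


def boxplot_filter(speeds, k=1.5):
    """Keep only speeds inside the boxplot whiskers [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(speeds, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [s for s in speeds if low <= s <= high]


# Record from Table 2: 10628 m covered between 11:07:35 and 11:14:54 (439 s).
speed = section_speed_kmh("2020-09-07 11:07:35", "2020-09-07 11:14:54", 10628)
print(round(speed, 2))  # 87.15
print(boxplot_filter([80, 82, 85, 86, 88, 90, 91, 250, 5]))  # [80, 82, 85, 86, 88, 90, 91]
```

In practice the filter is applied to the speeds of one road section at a time, as in the daily boxes of Figure 6.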
Table 2: Expressway road section speed data attributes.

| Attribute name | Example | Attribute name | Example |
|---|---|---|---|
| OBUPLATE | Blue Fujian A12345 | Time delta | 439.0 s |
| Before trade time | 2020-09-07 11:07:35 | Speed | 87.15 km/h |
| Before flag ID | 34**05 | Enter time | 2020-09-06 21:24:20 |
| After trade time | 2020-09-07 11:14:54 | Enter station | 330**11 |
| After flag ID | 34**07 | Road distance | 10628 m |

Figure 6: Velocity information distribution boxplot. (Daily distribution of the section speed, in km/h, from 2020-09-03 to 2020-09-11.)

Table 3: Frequency-speed percentile feature (unit: km/h).

| QD (section) | Date | α1 | α2 | α3 | α4 | α5 | α6 | l |
|---|---|---|---|---|---|---|---|---|
| 340507−351C03 | 2020-9-3 | 70 | 73 | 79 | 92 | 103 | 109 | 110 |
| 340507−351C03 | 2020-9-4 | 70 | 73 | 79 | 95 | 102 | 110 | 110 |
| 34012B−34012D | 2020-9-8 | 51 | 59 | 80 | 92 | 95 | 103 | 100 |
| 34012B−34012D | 2020-9-9 | 51 | 58 | 82 | 94 | 94 | 104 | 100 |
| 350703−350701 | 2020-9-7 | 67 | 71 | 78 | 83 | 87 | 92 | 80 |
| 350703−350701 | 2020-9-8 | 70 | 75 | 82 | 89 | 92 | 96 | 80 |
| 341801−341801 | 2020-9-3 | 88 | 95 | 106 | 114 | 117 | 123 | 120 |
| 341801−341801 | 2020-9-4 | 91 | 97 | 107 | 114 | 117 | 123 | 120 |

Table 4: Road section speed evaluation feature (unit: km/h).

| QD (section) | Date | β1 | β2 | β3 | β4 | l |
|---|---|---|---|---|---|---|
| 340507−351C03 | 2020-9-3 | 76 | 83 | 13 | 32 | 110 |
| 340507−351C03 | 2020-9-4 | 76 | 83 | 14 | 32 | 110 |
| 34012B−34012D | 2020-9-8 | 95 | 75 | 19 | 44 | 100 |
| 34012B−34012D | 2020-9-9 | 95 | 76 | 20 | 46 | 100 |
| 350703−350701 | 2020-9-7 | 82 | 77 | 9 | 20 | 80 |
| 350703−350701 | 2020-9-8 | 82 | 81 | 10 | 22 | 80 |
| 341801−341801 | 2020-9-3 | 114 | 103 | 13 | 29 | 120 |
| 341801−341801 | 2020-9-4 | 111 | 104 | 12 | 26 | 120 |

Table 5: Road section speed time domain feature (unit: km/h).

| QD (section) | Date | γ1 | γ2 | γ3 | γ4 | γ5 | γ6 | l |
|---|---|---|---|---|---|---|---|---|
| 340507−351C03 | 2020-9-3 | 88 | 87 | 87 | 87 | 86 | 86 | 110 |
| 340507−351C03 | 2020-9-4 | 91 | 89 | 89 | 89 | 88 | 86 | 110 |
| 34012B−34012D | 2020-9-8 | 81 | 80 | 80 | 80 | 80 | 80 | 100 |
| 34012B−34012D | 2020-9-9 | 85 | 83 | 82 | 82 | 81 | 81 | 100 |
| 350703−350701 | 2020-9-7 | 82 | 81 | 79 | 79 | 79 | 78 | 80 |
| 350703−350701 | 2020-9-8 | 84 | 84 | 83 | 83 | 82 | 82 | 80 |
| 341801−341801 | 2020-9-3 | 106 | 106 | 105 | 105 | 105 | 105 | 120 |
| 341801−341801 | 2020-9-4 | 107 | 106 | 105 | 105 | 105 | 105 | 120 |

Table 6: Optimal combination of model parameters.

| Parameter | Search scope | Step length | Optimal value |
|---|---|---|---|
| n_estimators | [100, 1000] | 100 | 700 |
| learn_rate | [0, 0.5] | 0.01 | 0.07 |
| max_depth | [1, 15] | 1 | 8 |
| min_child_weight | [1, 9] | 1 | 1 |

The attributes in Tables 3–5 are the feature vectors, together with the model output, obtained after the speed features are extracted. The columns are defined as follows:
- QD denotes a road section; for example, $QD_{340507-351C03}$ is the road section from ETC gantry 340507 to ETC gantry 351C03.
- Date is the date on which the traffic occurred.
- α1–α6 are six percentiles of the driving speed on the section, from the 15th to the 95th percentile.
- β1–β4 are the mode, mean, standard deviation, and dispersion of the vehicle speed.
- γ1–γ6 are the first six values after the 24 hourly average section speeds of the day are sorted in descending order.
- l is the maximum speed limit.
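A daily feature vector with the structure described above can be sketched as follows. It combines six speed percentiles α, four summary statistics β, and the six largest of the 24 hourly mean speeds γ. The sketch is illustrative rather than the authors' implementation. The text does not give the exact percentile levels or the definition of "dispersion". `ALPHA_LEVELS` and the use of the interquartile range for β4 are therefore hypothetical placeholders. The mode is taken over speeds rounded to 1 km/h.

```python
from statistics import mean, mode, pstdev, quantiles

# Hypothetical percentile levels: the text only says that alpha_1..alpha_6
# span the 15th to the 95th percentile of the section speeds.
ALPHA_LEVELS = (15, 30, 50, 70, 85, 95)


def percentile(sorted_speeds, p):
    """Linear-interpolation percentile (p in [0, 100]) of a sorted list."""
    pos = (len(sorted_speeds) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_speeds) - 1)
    return sorted_speeds[lo] + (sorted_speeds[hi] - sorted_speeds[lo]) * (pos - lo)


def daily_feature_vector(records):
    """Build the 16-dimensional vector [alpha_1..6, beta_1..4, gamma_1..6].

    records: (hour_of_day, speed_kmh) pairs for one road section on one day.
    """
    speeds = sorted(s for _, s in records)
    alpha = [percentile(speeds, p) for p in ALPHA_LEVELS]
    q1, _, q3 = quantiles(speeds, n=4, method="inclusive")
    beta = [
        mode(round(s) for s in speeds),  # most frequent speed, rounded to 1 km/h
        mean(speeds),
        pstdev(speeds),
        q3 - q1,  # assumption: interquartile range stands in for "dispersion"
    ]
    # Mean speed in each hourly period, then the six largest (descending).
    by_hour = {}
    for hour, s in records:
        by_hour.setdefault(hour, []).append(s)
    hourly_means = sorted((mean(v) for v in by_hour.values()), reverse=True)
    if len(hourly_means) < 6:
        raise ValueError("need speeds in at least 6 of the 24 hourly periods")
    gamma = hourly_means[:6]
    return alpha + beta + gamma


# Toy day: two vehicles per hour for hours 0-7.
records = [(hr, 80 + hr) for hr in range(8)] + [(hr, 90 + hr) for hr in range(8)]
vector = daily_feature_vector(records)
print(len(vector), vector[10:])  # 16 [92, 91, 90, 89, 88, 87]
```

Each such vector, paired with the section's speed limit l, forms one labelled sample for the classifier.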
4.2.3. Balance Analysis of Sample Data. The road section speed feature vector data set contains 5,081 samples. The samples with 80 km/h, 100 km/h, 110 km/h, and 120 km/h speed limits account for 5.31%, 87.24%, 9.39%, and 2.83% of them, respectively. The classes are therefore seriously unbalanced, which harms the model's identification performance. SMOTE is used to oversample the samples with speed limits of 80, 110, and 120 km/h so that all classes reach a relative balance. In the experiment, the new data produced by the SMOTE algorithm are used as the input of the model. The sample data consist of training samples and test samples.

4.2.4. The Result of the Model's Performance. The parameter setting of the XGBoost algorithm is an important factor affecting model performance, so a set of sensitivity experiments is conducted to optimize it. First, four boosting machine parameters with a significant impact on the model are identified: n_estimators, learn_rate, max_depth, and min_child_weight. Second, a combination of grid search and K-fold cross-validation (GK), with K = 5, is used to obtain the optimal parameters, following the procedure of Section 3.4. The search range, step length, and optimal value of each parameter are given in Table 6.

With the model established through the above processing, the test data are used to verify its effectiveness; the resulting confusion matrix is shown in Table 7. Of the 3295 test samples, 3212 were identified correctly, an accuracy of 97.5%.

The recognition accuracy for the 80 km/h class is 100%, because its data differ markedly from the other classes and are easy to distinguish. By contrast, the 100 km/h and 110 km/h classes are very similar, which makes identification errors more likely. Of the 824 samples with a 100 km/h limit, 759 were identified correctly and 47 were misclassified as 110 km/h, which lowers the accuracy for this class to 92.1%. For the same reason, the precision of the 110 km/h predictions (94.5%) is the lowest of the four classes.

Table 7: Confusion matrix (rows: real class; columns: predicted class).

| Real class (km/h) | Predicted 80 | Predicted 100 | Predicted 110 | Predicted 120 | Accuracy rate / recall (%) |
|---|---|---|---|---|---|
| 80 | 824 | 0 | 0 | 0 | 100 |
| 100 | 6 | 759 | 47 | 12 | 92.1 |
| 110 | 0 | 12 | 807 | 5 | 97.9 |
| 120 | 0 | 1 | 0 | 822 | 99.9 |
| Accuracy rate / precision (%) | 99.3 | 98.3 | 94.5 | 98.0 | 97.5 (overall) |

4.2.5. Comparison and Analysis

(1) Impact Analysis of Data Equalization. To verify the influence of SMOTE oversampling on the model, the original data set and the SMOTE-processed data set are each used for training and learning. All other steps are kept identical, yielding two classifiers. Their classification results are compared in Table 8. "After oversampling" refers to the model trained on the SMOTE-processed data set, and "Before oversampling" to the model trained on the original data set. Table 8 shows the following:
(1) After SMOTE oversampling, the F1-score of every category improved. Precision and recall rose or stayed the same in every category except the recall of the 100 km/h class, which fell from 0.98 to 0.92.
(2) The 100 km/h class has the most samples and was not expanded during oversampling. Its precision still rose from 0.91 to 0.98, and its F1-score from 0.94 to 0.95. This indicates that SMOTE greatly improves the recognition of the minority speed limit classes and also helps the recognition of the majority class.
(3) SMOTE raises the precision of the 110 km/h class from 0.73 to 0.94 and of the 120 km/h class from 0.80 to 0.98, and it also greatly improves their recall and F1-scores. It leaves the precision of the 80 km/h class unchanged at 1.00 but greatly improves its recall (from 0.51 to 1.00) and F1-score (from 0.67 to 1.00).

Table 8: Effect comparison before and after data oversampling.

| Category | Speed limit category (km/h) | Precision | Recall | F1-score |
|---|---|---|---|---|
| After oversampling | 80 | 1.00 | 1.00 | 1.00 |
| Before oversampling | 80 | 1.00 | 0.51 | 0.67 |
| After oversampling | 100 | 0.98 | 0.92 | 0.95 |
| Before oversampling | 100 | 0.91 | 0.98 | 0.94 |
| After oversampling | 110 | 0.94 | 0.98 | 0.96 |
| Before oversampling | 110 | 0.73 | 0.46 | 0.57 |
| After oversampling | 120 | 0.98 | 1.00 | 0.99 |
| Before oversampling | 120 | 0.80 | 0.32 | 0.46 |
| After oversampling | Avg/total | 0.98 | 0.97 | 0.97 |
| Before oversampling | Avg/total | 0.89 | 0.90 | 0.89 |
(2) Comparison and Analysis of Feature Vector Model. This experiment tests how effective each feature type in the expressway section speed feature vector model is. Only the input features are changed, and all other steps remain the same. Seven sets of experiments examine the influence of single features and feature combinations on the model:
- $A_{\alpha}$ considers only the frequency-speed percentile feature.
- $A_{\beta}$ considers only the section speed evaluation feature.
- $A_{\gamma}$ considers only the section speed time domain feature.
- $A_{\alpha,\beta}$ considers the frequency-speed percentile and section speed evaluation features.
- $A_{\alpha,\gamma}$ considers the frequency-speed percentile and section speed time domain features.
- $A_{\beta,\gamma}$ considers the section speed evaluation and time domain features.
- $A_{\alpha,\beta,\gamma}$ considers all three features.

The experimental results are compared in Figure 7, where A1–A7 denote models $A_{\alpha}$, $A_{\beta}$, $A_{\gamma}$, $A_{\alpha,\beta}$, $A_{\alpha,\gamma}$, $A_{\beta,\gamma}$, and $A_{\alpha,\beta,\gamma}$, respectively. The following can be seen:
(1) Among single features, the frequency-speed percentile feature gives the best prediction, followed by the section speed evaluation feature and then the section speed time domain feature.
(2) Combining two features improves the prediction compared with any single feature, and using all features gives the best prediction.
(3) In descending order of contribution to the prediction model, the features are the frequency-speed percentile feature, the section speed time domain feature, and the section speed evaluation feature. The contribution of each feature component is shown in Figure 8.

Figure 7: Model accuracy comparison. (Bar chart of the accuracy rate of model groups A1–A7, with values between 0.842 and 0.975.)

Figure 8: Feature contribution. (Contribution rate of each feature: α1 0.0735, α2 0.1053, α3 0.0371, α4 0.0553, α5 0.0864, α6 0.1617; β1 0.0818, β2 0.0643, β3 0.0497, β4 0.0404; γ1 0.0723, γ2 0.0350, γ3 0.0401, γ4 0.0332, γ5 0.0310, γ6 0.0329.)

(3) Comparison of Classification Models. To further illustrate the advantages of the proposed model, its performance is compared with that of GBDT, KNN, SVM, AdaBoost, and Logistic Regression (LR); the results are shown in Table 9. Among the six classification methods in Table 9, the SVM, AdaBoost, and LR classifiers perform poorly in terms of accuracy, recall, and F1-score. GC-XGBoost, GBDT, and KNN all recognize the expressway maximum speed limit with high accuracy. GC-XGBoost outperforms GBDT and KNN, with the highest accuracy of 97.5%.
Table 9: Model comparison results.

| Model | Testing samples | Correctly predicted samples | Accuracy (%) | Precision | Recall | F1-score |
|---|---|---|---|---|---|---|
| GC-XGBoost | 3295 | 3212 | 97.5 | 0.98 | 0.97 | 0.97 |
| GBDT | 3295 | 2908 | 88.3 | 0.88 | 0.88 | 0.88 |
| KNN | 3295 | 3079 | 93.4 | 0.94 | 0.93 | 0.93 |
| SVM | 3295 | 2374 | 72.0 | 0.70 | 0.72 | 0.70 |
| AdaBoost | 3295 | 1911 | 58.0 | 0.61 | 0.58 | 0.51 |
| LR | 3295 | 1684 | 51.1 | 0.48 | 0.51 | 0.49 |

5. Conclusion

This paper proposes a method of identifying expressway speed limit information by mining ETC data. First, the abnormal ETC gantry data are processed, and a road section speed data set construction algorithm is proposed. The road section speed data are constructed, and the outlier samples on each section are eliminated by boxplot analysis so that the ETC data are represented accurately. Then, the SMOTE algorithm oversamples the minority speed limit classes to balance the classes of road section speed limit information. Finally, the oversampled training samples are input into the proposed GC-XGBoost (grid search + cross-validation + XGBoost) algorithm for training and learning, and the method is compared with several similar algorithms. The experimental results show the following:
(1) In descending order of contribution to the prediction model, the features of the expressway section speed feature vector model are the frequency-speed percentile feature, the time domain feature, and the speed evaluation feature. All three feature categories improve the prediction model, and the frequency-speed percentile feature improves it the most.
(2) On the test samples, the recognition accuracies for the 80 km/h, 100 km/h, 110 km/h, and 120 km/h classes are 100%, 92.1%, 97.9%, and 99.9%, and the overall accuracy is 97.5%. The 100 km/h and 110 km/h classes are very similar, so their recognition accuracy is relatively low.
(3) GC-XGBoost achieves a speed limit recognition accuracy of 97.5%, a precision of 0.98, a recall of 0.97, and an F1-score of 0.97. These results are significantly better than those of the other five algorithms, and the method can accurately identify the maximum speed limit information of expressways.

This paper considers the speed features of mixed traffic (all vehicle classes together), which suits the identification of expressway maximum speed limits. However, this work still has some limitations:
(1) Recognition of the 100 km/h and 110 km/h speed limits is less effective. Additional speed limit features could help separate these two classes and improve their recognition.
(2) This study does not consider different speed limits for different lanes of the same road. Future work could analyze the speed limit of each lane using vehicle classification and the number of road lanes, giving a more complete expressway speed limit recognition model.

Data Availability

The data used to support the findings of this study are currently under embargo while the research findings are commercialized. Requests for data, 12 months after publication of this article, will be considered by the corresponding author.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was funded by the following:
- the National Natural Science Foundation of China (41971340);
- the Special Funds for the Central Government to Guide Local Scientific and Technological Development (2020L3014);
- the 2020 Fujian Province "the Belt and Road" Technology Innovation Platform (2020D002);
- the Provincial Candidates for the Hundred, Thousand and Ten Thousand Talent of Fujian (GY-Z19113).

References

[1] "Ministry of Transport of the People's Republic of China. Today, announce the change process of our country's expressways [EB/OL]. (2021-03-22)," https://mp.weixin.qq.com/s?__biz=MzI3MDQwMDQ5NQ==&mid=2247537632&idx=1&sn=8c806399c88108c7bae2c3dd00f56e30&scene=0.
[2] Z. Yao, H. Jiang, Y. Cheng, Y. Jiang, and B. Ran, "Integrated schedule and trajectory optimization for connected automated vehicles in a conflict zone," IEEE Transactions on Intelligent Transportation Systems, pp. 1–11, 2020.
[3] M. H. Hosseinlou, S. A. Kheyrabadi, and A. Zolfaghari, "Determining optimal speed limits in traffic networks," IATSS Research, vol. 39, no. 1, pp. 36–41, 2015.
[4] L. Aarts and I. Van Schagen, "Driving speed and the risk of road crashes: a review," Accident Analysis & Prevention, vol. 38, no. 2, pp. 215–224, 2006.
[5] G. Sugiyanto and S. Malkhamah, "Determining the maximum speed limit in urban road to increase traffic safety," Jurnal Teknologi, vol. 80, no. 5, 2018.
[6] B. Khondaker and L. Kattan, "Variable speed limit: an overview," Transportation Letters, vol. 7, no. 5, pp. 264–278.
[7] A. Van Benthem, "What is the optimal speed limit on freeways?" Journal of Public Economics, vol. 124, pp. 44–62, 2015.
[8] Y. Zhang and P. A. Ioannou, "Combined variable speed limit and lane change control for highway traffic," IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 7, pp. 1812–1823, 2016.
[9] J. Cao, C. Song, and S. Peng, "Improved traffic sign detection and recognition algorithm for intelligent vehicles," Sensors, vol. 19, no. 18, p. 4021, 2019.
[10] S. K. Berkaya, H. Gunduz, O. Ozsen, and G Serkan, "On circular traffic sign detection and recognition," Expert Systems with Applications, vol. 48, pp. 67–75, 2016.
[11] D. Tabernik and D. Skočaj, "Deep learning for large-scale traffic-sign detection and recognition," IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 4, pp. 1427–1440, 2019.
[12] M. Liang, X. Cui, and Q. Song, "Traffic sign recognition method based on HOG-Gabor feature fusion and Softmax classifier," Journal of Traffic and Transportation Engineering, vol. 17, no. 03, pp. 151–158, 2017.
[13] C. Jiang and X. Xue, "A uniform compact genetic algorithm for matching bibliographic ontologies," Applied Intelligence, vol. 51, pp. 7517–7532, 2021.
[14] F. Zaklouta and B. Stanciulescu, "Real-time traffic sign recognition in three stages," Robotics and Autonomous Systems, vol. 62, no. 1, pp. 16–24, 2014.
[15] S. Aziz and F. Youssef, "Traffic sign recognition based on multi-feature fusion and ELM classifier," Procedia Computer Science, vol. 127, pp. 146–153, 2018.
[16] H. Luo, Y. Yang, B. Tong, W. Fuchao, and F. Bin, "Traffic sign recognition using a multi-task convolutional neural network," IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 4, pp. 1100–1111, 2017.
[17] A. Pascale, F. Deflorio, M. Nicoli, B. Dalla Chiara, and M. Pedroli, "Motorway speed pattern identification from floating vehicle data for freight applications," Transportation Research Part C: Emerging Technologies, vol. 51, pp. 104–119, 2015.
[18] L. Liao, X. Jiang, M. Lin, and F. M Zou, "Recognition method of road speed limit information based on data mining of traffic trajectory," Journal of Traffic and Transportation Engineering, vol. 15, no. 5, pp. 118–126, 2015.
[19] J. Yang, J. Xu, C. Gao, B. Guohua, X. Linfang, and L. Menghui, "Modeling of the relationship between speed limit and characteristic speed of expressway traffic flow," Sustainability, vol. 11, no. 17, p. 4621, 2019.
[20] R. C. Prati, G. E. Batista, and M. C. Monard, "A study with class imbalance and random sampling for a decision tree learning system," in Proceedings of the IFIP International Conference on Artificial Intelligence in Theory and Practice, pp. 131–140, Springer, Boston, MA, July 2008.
[21] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
[22] T. Chen and C. Guestrin, "Xgboost: a scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794, University Press, New York, NY, USA, August.
[23] T. Chen, T. He, M. Benesty et al., "Xgboost: extreme gradient boosting," XGBoost contributors [cph] (base XGBoost implementation), vol. 1, no. 4, 2015.
[24] A. B. Parsa, A. Movahedi, H. Taghipour, S. Derrible, and A. Mohammadian, "Toward safer highways, application of XGBoost and SHAP for real-time accident detection and feature analysis," Accident Analysis & Prevention, vol. 136, Article ID 105405, 2020.
[25] X. Shi, Y. D. Wong, M. Z. F. Li, C. Palanisamy, and C. Chai, "A feature learning approach based on XGBoost for driving assessment and risk prediction," Accident Analysis & Prevention, vol. 129, pp. 170–179, 2019.


Publisher
Hindawi Publishing Corporation
Copyright
Copyright © 2021 Fumin Zou et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
ISSN
1058-9244
eISSN
1875-919X
DOI
10.1155/2021/4702669

A set of ETC gantries by which a vehicle passed study proposes a method using real-time traffic data col- while driving on the expressway, forming a sequence of lected by an ETC gantry system to identify the maximum nodes in chronological order called trajectory speed limit of expressways dynamically, which solves the Traj � 􏼈D , D , Di, . . . , Dj, D 􏼉, D � (N , T ), 0 ≤ i ≤ E, 0 1 E i i i problems of the high cost of manual information collection ∀i ≤ j, T ≤ T . D is the trajectory point, including node N i j i i and incomplete vehicle data. First, the road section speed set and time property T , N is the label of the ith node passed by i i construction algorithm and section driving speed abnormal the vehicle, and T is the information interaction time when filtering algorithm are designed to ensure the integrity and the vehicle passes through node N . D is the start point of i 0 reliability of the sample data. +en, the speed feature vector the trajectory, and D is the end point of the trajectory. E Scientific Programming 3 Data Error. +e data record does not conform to the normal driving rules, including two ETC gantries that control different node1 node2 node3 driving directions recorded by the same vehicle at the same Q12 time, and different passing records of the same vehicle are recorded at the same time. +ese data need to be filtered or node4 node5 node6 deleted. Figure 1: Schematic of the sections. 3.1.2. Vehicle Speed Recognition Algorithm in Road Section. In order to calculate the speed distribution of the road section, it is necessary to obtain the transaction data of all vehicles of each Definition 4. +e average speed of a vehicle passing through gantry. However, gantry transaction data may be missing. a certain road section is called road section speed. +e +erefore, all traffic data and road network data need to be calculation method is shown in the following equation: checked and supplemented to ensure the integrity of the gantry transaction data. 
After the transaction data of the ETC gantry v � , (1) t − t system is initially cleaned, the trajectory Traj of each vehicle is 2 1 constructed in chronological order according to the transaction where s is the actual length of the road section, t is the time 1 data of each gantry. Traverse each adjacent ETC gantry when vehicles pass the start point of the road section, and t Node,Node in the Traj one by one. Check whether the i i+1 is the time when vehicles pass the end point of the road road section formed by the two gantries QD belong to the section. expressway road network G. If the road section QD belongs to the expressway road network G, the speed v of the vehicle passing through the section QD is directly generated. QD and j j Definition 5. +e dispersion of the speed of the road section the speed v are expressed as follows: describes the measures of dispersion of the average speed of vehicles passing through the road section. +e section speed QD � 􏽮⟨Node , Node ⟩, Distance 􏽯, j i i+1 j of vehicles on the expressway within a certain period of time constitutes the speed set of the section. Sort the value of i∈n (3) speed: the speed at 85th percentile is v , and the speed at 15th 1 v � 􏽘 v , percentile is v . +e speed dispersion index can be expressed QD ,T 2 j as where n represents the number of all vehicles within certain Δv � v − v . (2) 1 2 time period T of the road section QD and v represents the j i average speed of each vehicle on the road section QD within +e larger the value range is, the higher dispersions of certain time period. the speed information are. If QD does not belong to the expressway road network G, it means that the section data of the middle gantries are Definition 6. +e speed limit includes the minimum speed missing. And path searching algorithm based on Node , limit and the maximum speed limit. +e speed limit value is Node needs to be performed to fill the missing gantry i+1 generally an integer multiple of 10. 
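Definitions 4 and 5 reduce to a few lines of arithmetic, sketched below. This is an illustration, not the authors' code. The function names are hypothetical, distances are assumed to be in metres, and times are assumed to be `datetime` values. The paper does not state its percentile convention, so linear interpolation (`statistics.quantiles` with `method="inclusive"`) is assumed. The example record is the one listed in Table 2.

```python
from datetime import datetime
from statistics import quantiles
from typing import Sequence


def section_speed_kmh(distance_m: float, t1: datetime, t2: datetime) -> float:
    """Eq. (1): v = s / (t2 - t1), converted from m/s to km/h."""
    seconds = (t2 - t1).total_seconds()
    if seconds <= 0:
        raise ValueError("t2 must be later than t1")
    return distance_m / seconds * 3.6


def dispersion_index(speeds: Sequence[float]) -> float:
    """Eq. (2): 85th-percentile speed minus 15th-percentile speed.

    Assumes linearly interpolated percentiles ('inclusive' method);
    at least two speeds are required.
    """
    cuts = quantiles(speeds, n=100, method="inclusive")  # 99 cut points
    v1 = cuts[84]  # 85th percentile
    v2 = cuts[14]  # 15th percentile
    return v1 - v2


if __name__ == "__main__":
    # Record from Table 2: 10,628 m between the two gantries.
    before = datetime(2020, 9, 7, 11, 7, 35)
    after = datetime(2020, 9, 7, 11, 14, 54)
    print(round(section_speed_kmh(10628, before, after), 2))  # 87.15
    print(dispersion_index(list(range(101))))  # 70.0
```

The 439 s between the two Table 2 timestamps and the 10,628 m road distance reproduce the tabulated 87.15 km/h.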
3. Methods

3.1. ETC Data Preprocessing

3.1.1. ETC Data Cleaning. The ETC gantry system generates a large amount of transaction data in a short period. System errors, interruptions of information exchange, and severe weather conditions can produce abnormal data that affect the results. To reduce this interference, the data are preprocessed, mainly in the following aspects.

Data Redundancy: duplication among multiple records. The transaction record of each vehicle passing through an ETC gantry should be unique. However, problems in data acquisition, transmission, storage, and other intermediate links can cause repeated uploading and duplication, which results in data redundancy. These data need to be cleaned.

Data Error. Such a record does not conform to normal driving rules. Examples include the same vehicle being recorded at the same time by two ETC gantries that control different driving directions, and different passing records of the same vehicle being recorded at the same time. These data need to be filtered out or deleted.

3.1.2. Vehicle Speed Recognition Algorithm in Road Section. Calculating the speed distribution of a road section requires the transaction data of all vehicles at each gantry. However, gantry transaction data may be missing. All traffic data and road network data therefore need to be checked and supplemented to ensure the integrity of the gantry transaction data.

After the ETC gantry transaction data are initially cleaned, the trajectory Traj of each vehicle is constructed in chronological order from the transaction data of each gantry. Each pair of adjacent ETC gantries $Node_i$, $Node_{i+1}$ in Traj is then traversed in turn, and the algorithm checks whether the road section $QD_j$ formed by the two gantries belongs to the expressway road network G. If $QD_j$ belongs to G, the speed $v$ of the vehicle passing through $QD_j$ is generated directly. $QD_j$ and the section speed are expressed as follows:
$$QD_j = \{\langle Node_i, Node_{i+1}\rangle, Distance_j\},\qquad v_{QD_j,T} = \frac{1}{n}\sum_{i=1}^{n} v_i, \tag{3}$$
where $n$ is the number of vehicles passing road section $QD_j$ within time period $T$, and $v_i$ is the average speed of the $i$th vehicle on $QD_j$ within that period.

If $QD_j$ does not belong to G, the records of the intermediate gantries are missing. A path search based on $Node_i$ and $Node_{i+1}$ is then performed to fill in the missing gantry transaction data. As shown in Figure 2, if the road section formed by $Node_i$ and $Node_{i+1}$ cannot be found in G, $Node_i$ and $Node_{i+1}$ are used as base nodes, and the path search yields the feasible path $Node_i$, $Node_a$, $Node_b$, $Node_{i+1}$. $Node_a$ and $Node_b$ are supplementary nodes. The average speed $v$ between $Node_i$ and $Node_{i+1}$ is taken as the speed of $\langle Node_i, Node_a\rangle$, $\langle Node_a, Node_b\rangle$, and $\langle Node_b, Node_{i+1}\rangle$.

Figure 2: Schematic diagram of driving path (from $Node_i$ through supplementary nodes to $Node_{i+1}$).

To ensure the reliability of the average speed $v$, the minimum expressway driving speed $v_{min}$ is set to 30 km/h and the maximum $v_{max}$ to 160 km/h [19]. Here, $v$ is the average speed over all road sections between $Node_i$ and $Node_{i+1}$. If $v \notin [v_{min}, v_{max}]$, the record is deleted as abnormal data. The specific process of the section speed data construction algorithm is shown in Algorithm 1.

3.1.3. Outlier Information Detection Algorithm for Road Section. To better analyze the speed distribution of each section, a noise data cleaning model is constructed to detect and eliminate outliers in the data. The basic idea is to use the upper and lower limits of the speed boxplot to detect abnormal points and determine the threshold interval for filtering abnormal speed data.

Because a large amount of expressway ETC transaction data is collected, the road section speed data set is assumed, by appeal to the central limit theorem, to be approximately normally distributed. The upper and lower limits of the speed boxplot then fall within the 3σ interval of the normal distribution (at about ±2.698σ, as shown in Figure 3). This supports the use of boxplot analysis for outlier detection and filtering.

As shown in Figure 3, the boxplot has 6 element points. $q_1$ is the 1/4 quantile, $q_2$ is the median, and $q_3$ is the 3/4 quantile. $IQR = q_3 - q_1$ is the distance between $q_1$ and $q_3$. The remaining two points are the upper limit and the lower limit. Here, $q_1$ is the speed that exceeds the speeds of 25% of the traffic flow, $q_2$ exceeds 50%, and $q_3$ exceeds 75%. The upper and lower limits of the noise data cleaning threshold model are therefore
$$\text{Upper limit} = q_3 + 1.5\,IQR,\qquad \text{Lower limit} = q_1 - 1.5\,IQR. \tag{4}$$
The threshold range for speed filtering is then
$$v \in (\text{Lower limit},\ \text{Upper limit}). \tag{5}$$
Road section speeds within this range are retained, and the outliers are deleted.
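The threshold model of Eqs. (4) and (5) can be sketched as below. This is an illustration under stated assumptions, not the authors' code. The function names are hypothetical. Quartiles are computed with linear interpolation (`statistics.quantiles` with `method="inclusive"`), because the paper does not say which quartile convention it uses. Speeds lying exactly on a limit are dropped, following the open interval in Eq. (5).

```python
from statistics import quantiles
from typing import List, Sequence, Tuple


def boxplot_limits(speeds: Sequence[float]) -> Tuple[float, float]:
    """Eq. (4): (q1 - 1.5 * IQR, q3 + 1.5 * IQR) for one section's speed set."""
    q1, _median, q3 = quantiles(speeds, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def filter_section_speeds(speeds: Sequence[float]) -> List[float]:
    """Eq. (5): keep only speeds strictly between the lower and upper limits."""
    lower, upper = boxplot_limits(speeds)
    return [v for v in speeds if lower < v < upper]


if __name__ == "__main__":
    sample = [80, 82, 85, 86, 88, 90, 91, 93, 95, 240]  # km/h, one outlier
    print(boxplot_limits(sample))         # (74.375, 103.375)
    print(filter_section_speeds(sample))  # the 240 km/h record is removed
```

In the paper the filter is applied per road section, so a caller would group the section speed records by $QD_j$ before calling `filter_section_speeds`.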
3.2. Feature Vector Model of Expressway Speed. Vehicles driving on the expressway have different speeds at different times and on different road sections. Statistical analysis of the traffic speed features of a road section reveals the potential connection between vehicle speed and the road speed limit, from which the road section speed feature vector model is constructed. The feature vector consists of three categories: the frequency-speed percentile feature, the road section speed evaluation feature, and the road section speed time domain feature.

3.2.1. Road Section Frequency-Speed Percentile Feature. The road section frequency-speed percentile feature reflects the distribution of the section speed at different times. It takes the speed values at the 15th, 25th, 50th, 75th, 85th, and 95th percentiles of the speed set of the road section and converts them into a multidimensional feature vector $\alpha$:
$$\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_6), \tag{6}$$
where $\alpha_1$ to $\alpha_6$ are, respectively, the 15th, 25th, 50th, 75th, 85th, and 95th percentiles of the total section speed distribution. Together they describe the overall distribution of speed on the road section.

3.2.2. Road Section Speed Evaluation Feature. The road section speed features are also described by related evaluation indexes in the frequency domain: the mode of the speed, the average speed, the speed standard deviation, and the speed dispersion. These indexes form a multidimensional feature vector $\beta$:
$$\beta = (\beta_1, \beta_2, \beta_3, \beta_4), \tag{7}$$
where $\beta_1$ is the mode of the section speed, representing the general level of vehicle speed. $\beta_2$ and $\beta_3$ are the overall average interval speed $\mu$ of the road section and its standard deviation $\sigma$, respectively. $\beta_4$ is the speed dispersion index, which reflects the range and spread of the speed data.

3.2.3. Road Section Speed Time Domain Feature. The road section speed time domain feature reflects how the traffic flow speed evolves on different road sections under different speed limits. If the section speed data were analyzed by whole day without considering different periods, they would be easily affected by road congestion and other factors in individual periods, and they could not reflect the speed evolution of the road. The speed features of the road in different periods therefore need to be fully integrated. The whole day is divided into 24 time periods, denoted 0, 1, ..., 23, and the speed information of each road section is counted in each period to find the speed change pattern of each section. As shown in Figure 3, the multidimensional speed time domain feature vector is constructed as follows:
$$c = (c_1, c_2, \ldots, c_n), \tag{8}$$
where $c_1$ to $c_n$ are the average road section speeds of the periods in the data sample. The average section speeds of the 24 periods of the day are sorted from largest to smallest, and the first $n$ values are taken. Here, $n = 6$, which avoids disturbance from the relatively low section speeds caused by traffic congestion or road maintenance in some periods.
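The sketch below shows one way to assemble the 16-dimensional vector $(\alpha, \beta, c)$ of Eqs. (6) to (8) for one section and one day. It is illustrative, not the authors' implementation. The paper does not specify the following details, so they are assumptions:
- the input format (a mapping from hour 0 to 23 to that hour's section speeds),
- linear-interpolation percentiles,
- the population standard deviation for $\beta_3$,
- taking the mode of speeds rounded to whole km/h for $\beta_1$.

```python
from statistics import mean, mode, pstdev, quantiles
from typing import Dict, List, Sequence

PERCENTILES = (15, 25, 50, 75, 85, 95)  # alpha_1 .. alpha_6


def percentile_feature(speeds: Sequence[float]) -> List[float]:
    """Eq. (6): speeds at the 15th, 25th, 50th, 75th, 85th and 95th percentiles."""
    cuts = quantiles(speeds, n=100, method="inclusive")
    return [cuts[p - 1] for p in PERCENTILES]


def evaluation_feature(speeds: Sequence[float]) -> List[float]:
    """Eq. (7): (mode, mean, standard deviation, dispersion index)."""
    cuts = quantiles(speeds, n=100, method="inclusive")
    beta1 = float(mode(round(v) for v in speeds))  # mode of whole-km/h speeds
    return [beta1, mean(speeds), pstdev(speeds), cuts[84] - cuts[14]]


def time_domain_feature(speeds_by_hour: Dict[int, Sequence[float]],
                        n: int = 6) -> List[float]:
    """Eq. (8): hourly mean speeds sorted from largest to smallest, first n kept."""
    hourly = [mean(v) for v in speeds_by_hour.values() if v]
    return sorted(hourly, reverse=True)[:n]


def feature_vector(speeds_by_hour: Dict[int, Sequence[float]]) -> List[float]:
    """Concatenate alpha (6) + beta (4) + c (6) into one 16-dimensional vector."""
    all_speeds = [v for hour in speeds_by_hour.values() for v in hour]
    return (percentile_feature(all_speeds)
            + evaluation_feature(all_speeds)
            + time_domain_feature(speeds_by_hour))
```

If fewer than six hourly periods contain data, this sketch returns a vector shorter than 16. A production version would need an explicit policy for such days.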
Input: trajectory data of a car D, expressway road network data G
Output: speed data of the road section
function Sections(D) // divide the vehicle trajectory into the data of each section
    D = {D_0, D_1, ..., D_E}, D_i = {N_i, T_i}
    Sec ← { }
    for i = 0 to E − 1 do
        Node_i, Node_{i+1} ← D_i.N, D_{i+1}.N // node information of two adjacent trajectory points
        Time_i, Time_{i+1} ← D_i.T, D_{i+1}.T // time information of two adjacent trajectory points
        delta ← Time_{i+1} − Time_i // time difference between the two points
        R_i.Q ← (Node_i, Node_{i+1}) // start and end nodes of the section passed
        R_i.T ← (Time_i, Time_{i+1}) // start and end times of the section passed
        R_i ← (R_i.Q, R_i.T, delta)
        Sec ← Sec ∪ {R_i} // Sec = {R_0, R_1, ..., R_{E−1}}
    end for
    return Sec
end function
Sec ← Sections(D)
for each R_j in Sec do // R_j = (Q_j, T_j, delta_j)
    if Q_j in G then
        Distance_j ← G_k.Distance, k = Q_j // section distance from the expressway network
        t ← R_j.delta // time needed to pass the section
        v_j ← Distance_j / t // speed of the vehicle on the section
        R_j.V ← v_j // add the speed attribute
    else // Q_j is not in G: gantry records between the two nodes are missing
        {N_1, N_2, ..., N_Z} ← shortest_path(G, Q_j), Q_j = (N_1, N_Z) // nodes on the shortest path
        A = {A_1, A_2, ..., A_{Z−1}} ← {N_1, N_2, ..., N_Z} // convert path nodes into road sections
        path ← { }
        for k = 1 to Z − 1 do
            path_k ← G_{A_k}.Distance // distance of each sub-section
        end for
        A.Distance ← Sum(path)
        V_A ← A.Distance / R_j.delta // common average speed on the recovered path
        if V_min ≤ V_A ≤ V_max then
            T_in ← start time in R_j.T // time the vehicle passes the first recorded node
            t_0 ← 0
            for i = 1 to Z − 1 do
                t_i ← (path_1 + ... + path_i) / V_A // cumulative travel time to the end of A_i
                A_i.tq ← T_in + t_{i−1} // time the vehicle enters A_i
                A_i.th ← T_in + t_i // time the vehicle leaves A_i
                A_i.delta ← t_i − t_{i−1} // time spent on A_i
                A_i.T ← (A_i.tq, A_i.th)
                A_i.V ← V_A
            end for
            R_j ← {A_1, ..., A_{Z−1}} // corrected sections (node, time, and speed attributes) replace R_j
        else
            discard R_j as abnormal data
        end if
    end if
end for
speed_data ← {R_0, R_1, ..., R_c} // speed data of the road sections
ALGORITHM 1: Algorithm of speed data construction in road section.
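Algorithm 1 can be condensed into the Python sketch below. It is an illustration under stated assumptions, not the authors' code, and all names are hypothetical. The sketch makes these assumptions:
- The road network G is a dictionary mapping a directed gantry pair to its length in metres.
- A trajectory is a chronologically ordered list of (gantry, time in seconds) pairs.
- Missing gantries are recovered with Dijkstra's shortest path.
- The 30 to 160 km/h bounds of Section 3.1.2 are applied only to recovered paths, as in Algorithm 1.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Network = Dict[Tuple[str, str], float]         # (gantry_a, gantry_b) -> metres
Record = Tuple[str, str, float, float, float]  # (start, end, t_in, t_out, km/h)

V_MIN_KMH, V_MAX_KMH = 30.0, 160.0


def shortest_path(g: Network, src: str, dst: str) -> Optional[List[str]]:
    """Dijkstra over the directed sections of g; returns the node list or None."""
    adj: Dict[str, List[Tuple[str, float]]] = {}
    for (a, b), length in g.items():
        adj.setdefault(a, []).append((b, length))
    dist: Dict[str, float] = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist[node]:
            continue  # stale heap entry
        for nxt, length in adj.get(node, []):
            if d + length < dist.get(nxt, float("inf")):
                dist[nxt], prev[nxt] = d + length, node
                heapq.heappush(heap, (d + length, nxt))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def section_speeds(traj: List[Tuple[str, float]], g: Network) -> List[Record]:
    """Split one trajectory into per-section speed records (cf. Algorithm 1)."""
    records: List[Record] = []
    for (n1, t1), (n2, t2) in zip(traj, traj[1:]):
        delta = t2 - t1
        if delta <= 0:
            continue  # not chronological: treat as a data error
        if (n1, n2) in g:  # section exists in G: speed computed directly
            records.append((n1, n2, t1, t2, g[(n1, n2)] / delta * 3.6))
            continue
        path = shortest_path(g, n1, n2)  # recover the missing gantries
        if path is None or len(path) < 2:
            continue
        legs = list(zip(path, path[1:]))
        v_ms = sum(g[leg] for leg in legs) / delta  # common average speed V_A
        if not V_MIN_KMH <= v_ms * 3.6 <= V_MAX_KMH:
            continue  # abnormal average speed: discard
        t = t1
        for a, b in legs:  # split the elapsed time in proportion to leg length
            dt = g[(a, b)] / v_ms
            records.append((a, b, t, t + dt, v_ms * 3.6))
            t += dt
    return records
```

For example, with sections A→B (10 km), B→C (5 km), and C→D (15 km), a trajectory A (0 s), C (540 s), D (1080 s) yields three records at 100 km/h. The missing gantry B is filled in at 360 s.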
Figure 3: Schematic diagram of noise data cleaning threshold model based on boxplot analysis. (The boxplot's Q1, median, and Q3 are aligned with a normal density: Q1 and Q3 lie at ±0.6745σ and the lower and upper limits at ±2.698σ. 50% of the mass lies between the quartiles and 24.65% between each quartile and the adjacent limit.)

3.3. Sample Imbalance Processing. The road speed limit classes used in this paper are 80 km/h, 100 km/h, 110 km/h, and 120 km/h, as specified in the "Road Speed Limit Sign Design Specification" (JTG/T 3381-02-2020) and the "Expressway Engineering Technical Standard" (JTG B01-2003). Most of the collected data have a 100 km/h limit, so there are far more 100 km/h samples than samples of the other three classes (80 km/h, 110 km/h, and 120 km/h). This creates an imbalance among the sample categories.

There are two common ways to handle unbalanced data samples: oversampling and undersampling [20]. Oversampling copies the minority samples multiple times to enlarge them. Because it duplicates existing samples, it leads to a certain degree of overfitting during model training. Undersampling randomly removes part of the majority samples, or selects a proportion of them as the sample data. As a result, the model learns only part of the rules of that class and cannot fully reflect its pattern.

To alleviate these problems, the improved random oversampling method SMOTE [21] is used. It analyzes the minority samples and uses their similarity in feature space to add simulated new samples to the data set. This enlarges the minority classes in the original data set and reduces the dispersion between categories, which solves the imbalance problem. The SMOTE process consists of the following steps:

Step 1. Select the speed feature vector sets of the minority sample categories, that is, those with speed limits of 80, 110, and 120 km/h.
Step 2. For each category's sample set, use the Euclidean distance as the metric in the feature space. Compute the distance between each sample and the other samples in the set to determine its k-nearest neighbor sample points.
Step 3. Perform random linear interpolation on the line connecting a sample point and a selected neighboring sample point to generate a new sample.
Step 4. Repeat Step 2 and Step 3 until the categories of the expressway speed feature vector data set reach a balance.
3.4. Maximum Speed Limit Recognition Classification Model. The speed limit of an expressway is an important factor affecting driving safety. Different road sections have different speed limits, and these differences directly affect the state of vehicles, so the related data show a certain pattern. A strong learner trained in depth on these data can achieve high-precision recognition.

XGBoost is an ensemble learning method based on the boosting algorithm [22]. Its base learner is usually a decision tree. New trees are generated iteratively, and each new tree learns the residual between the true values and the current predictions of all previous trees. The results of all trees are then accumulated as the final result to obtain better classification accuracy [23–25]. Using XGBoost as the classifier allows the maximum speed limit of expressways to be determined accurately.

A sample data set is constructed by extracting 16-dimensional speed feature vectors from expressway section data with known speed limits. Let the data set be $S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_M, y_M)\}$. Here $x_i$ $(i = 1, 2, \ldots, M)$ is the feature vector of the $i$th sample, also called the input value; it is the constructed 16-dimensional expressway speed feature vector. $y_i$ $(i = 1, 2, \ldots, M)$ is the output value of the $i$th sample, that is, the labeled road speed limit class corresponding to $x_i$. Assuming that the XGBoost ensemble integrates a total of $K$ regression trees, its prediction is
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i),\qquad f_k \in F, \tag{9}$$
where $K$ is the number of trees, $f_k$ is the $k$th regression tree with structure $q_k$ and leaf weights $w_k$, $F$ is the set of all regression trees, and $f_k(x_i)$ is the predicted score of the $k$th regression tree on sample $x_i$.

The objective function of XGBoost consists of a loss function and a regularization term:
$$Obj = \sum_{i=1}^{M} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \tag{10}$$
where $l$ is the error function and $\Omega(f_k)$ is the regularization term:
$$\Omega(f_k) = \gamma T + \frac{1}{2}\lambda \lVert w_k \rVert^2, \tag{11}$$
where $\gamma$ is the penalty coefficient of the model with value range $[0, 1]$, $T$ is the number of leaves of the $k$th tree, and $\lambda$ is the regularization coefficient.

XGBoost adopts an additive, step-by-step training strategy. The first tree is optimized, then the second, and so on until the $K$th tree, and the loss function decreases continuously during this process. In iteration $t$, an incremental function $f_t$ is added to optimize the objective and improve prediction accuracy:
$$Obj^{(t)} = \sum_{i=1}^{M} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + C, \tag{12}$$
where $C$ is a constant term and $\hat{y}_i^{(t-1)}$ is the prediction for the $i$th sample in iteration $t-1$. A second-order Taylor expansion is then applied, and the constant terms, including $l(y_i, \hat{y}_i^{(t-1)})$, are discarded to reduce the running time of the model:
$$\begin{aligned} Obj^{(t)} &\simeq \sum_{i=1}^{M}\left[g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \Omega(f_t) \\ &= \sum_{j=1}^{T}\left[\Big(\sum_{i\in I_j} g_i\Big) w_j + \frac{1}{2}\Big(\sum_{i\in I_j} h_i + \lambda\Big) w_j^2\right] + \gamma T, \end{aligned} \tag{13}$$
where $I_j = \{i \mid q(x_i) = j\}$ is the set of samples in leaf $j$, and $g_i$ and $h_i$ are the first and second derivatives of the loss function, respectively.

The objective is thus a quadratic function of $w_j$. Minimizing it gives the optimal score of each leaf node and the optimal value of the objective function:
$$w_j^{*} = -\frac{G_j}{H_j + \lambda},\qquad Obj^{(t)*} = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^2}{H_j + \lambda} + \gamma T, \tag{14}$$
where $G_j = \sum_{i\in I_j} g_i$ and $H_j = \sum_{i\in I_j} h_i$.
The XGBoost parameters are then optimized in the following 4 steps:

Step 1. Choose a relatively high learning rate, set reasonable initial values for the booster parameters, and use K-fold cross-validation in each iteration to obtain the ideal number of decision trees.
Step 2. With the learning rate and the number of decision trees fixed from Step 1, use K-fold cross-validation and grid search to optimize the parameters of each booster.
Step 3. In the same way as Step 2, adjust the regularization parameters on the given data to reduce overfitting.
Step 4. Reduce the learning rate appropriately to determine the final ideal parameter combination of the model.
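Steps 2 and 3 combine grid search with K-fold cross-validation, using accuracy as the evaluation index (Process 2 in Figure 4). The stdlib-only sketch below shows that loop; it is not the authors' code. `fit_predict` is a hypothetical callable that trains a model with the given parameters and returns predictions; in practice it would wrap an XGBoost classifier. Here a toy one-dimensional nearest-neighbour classifier stands in for it, so the example runs without third-party packages.

```python
import itertools
import random
from typing import Callable, Dict, List, Sequence, Tuple

FitPredict = Callable[[dict, list, list, list], list]


def k_fold(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def grid_search_cv(grid: Dict[str, Sequence], X: list, y: list,
                   fit_predict: FitPredict, k: int = 5,
                   seed: int = 0) -> Tuple[dict, float]:
    """Return the parameter combination with the best K-fold accuracy."""
    folds = k_fold(len(X), k, seed)
    names = sorted(grid)
    best, best_acc = {}, -1.0
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        correct = 0
        for test in folds:  # each fold is held out once
            held_out = set(test)
            train = [i for i in range(len(X)) if i not in held_out]
            preds = fit_predict(params, [X[i] for i in train],
                                [y[i] for i in train], [X[i] for i in test])
            correct += sum(p == y[i] for p, i in zip(preds, test))
        acc = correct / len(X)  # accuracy over all held-out predictions
        if acc > best_acc:
            best, best_acc = params, acc
    return best, best_acc


def toy_knn(params: dict, X_train: list, y_train: list, X_test: list) -> list:
    """Stand-in model: majority vote of the n_neighbors closest 1-D training points."""
    preds = []
    for x in X_test:
        order = sorted(range(len(X_train)), key=lambda i: abs(X_train[i] - x))
        votes = [y_train[i] for i in order[:params["n_neighbors"]]]
        preds.append(max(set(votes), key=votes.count))
    return preds
```

A call such as `grid_search_cv({"n_neighbors": [1, 3, 5]}, X, y, toy_knn)` returns the best parameter dictionary and its cross-validated accuracy. Real XGBoost grids search several parameters at once, and the number of combinations grows multiplicatively; that growth is why the steps above tune parameters in stages.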
3.5. Maximum Speed Limit Recognition Model. Identifying the maximum speed limit of expressways is a classification problem. The framework of the identification model is shown in Figure 4.

Figure 4: The flowchart of expressway speed limit information recognition model. (Process 1: import the expressway ETC data, filter noise, remove outlier speeds with the boxplot, build the road section speed data set and feature vectors, balance the data with SMOTE, divide stratified training and test samples, and train XGBoost by adding trees and updating the loss function until the iteration requirement is met. Process 2: initialize the parameter search ranges, run a grid search with cross-validation using accuracy as the evaluation index, and output the optimal parameter combination.)

Dynamic identification of the expressway speed limit is realized in the following steps:
1. The ETC gantry transaction data are cleaned to remove duplicated and erroneous records.
2. The vehicle speed recognition algorithm finds the missing records in the ETC gantry transaction data and accurately restores the distribution of gantries passed on the expressway.
3. The speed of each road section is obtained by calculating the speed of each vehicle between gantries. Because the section speeds contain some very large or very small outliers, a boxplot is used to remove them.
4. The speed on each section is analyzed, and the frequency-speed percentile feature, interval speed evaluation feature, and interval speed time domain feature models are constructed.
5. The numbers of samples in the different speed limit classes differ greatly, so the oversampling algorithm is used to expand the minority samples and obtain balanced data.
6. The data are divided into training data and test data. The training data are fed into the XGBoost algorithm for training, as shown in Process 1 in Figure 4. At the same time, grid search and cross-validation are used to find the optimal parameters of each booster in XGBoost, as shown in Process 2 in Figure 4.

4. Experiments and Results

4.1. Introduction of Experimental Data. The ETC gantry system is one of the main components of the expressway ETC system. It is used for real-time supervision and recording of vehicle driving information, vehicle path identification, toll data fitting, and other functions [14]. The experimental data fall into three categories:
- ETC transaction data. These were collected by the ETC gantries on the expressway sections of Fujian Province over 9 days, from September 3 to September 11, 2020. They cover 50 expressways, including the Fuyin, Xiazhang, and Longchang Expressways, with 534 sections and about 100 million records. The average section length is 8.9 km, 85% of the sections are shorter than 16 km, and the longest is 30 km; the distribution is shown in Figure 5. These data come from Fujian Provincial Expressway Information Technology Co., Ltd., and their main attributes are shown in Table 1.
- Road speed limit data. These include the name and the maximum speed limit of each road section, derived from the online announcements of the Fujian traffic police. They are used for model learning, training, and testing.
- Section distances. The distance of each expressway section is taken from Amap, including the gantry node pair of each section and the actual section distance.
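Process 1 in Figure 4 divides the samples into training and test sets by stratified sampling, which keeps each speed limit class at the same proportion in both sets. A minimal stdlib sketch is given below. The function name, the 20% test ratio, and the fixed seed are assumptions, since the paper does not report its split ratio.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[int], test_ratio: float = 0.2,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split sample indices so each speed limit class keeps its share of the test set."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_class):
        idx = by_class[label][:]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_ratio)  # per-class test quota
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    labels = [80] * 10 + [100] * 40 + [110] * 20 + [120] * 10
    train, test = stratified_split(labels)
    print(len(train), len(test))  # 64 16
```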
4.2. Experimental Results and Analysis

4.2.1. ETC Data Preprocessing. The initially cleaned ETC data are matched with the road network topology data, and the road section speed of each vehicle is calculated to construct the expressway road section speed data set. Table 2 shows the main characteristics of the data. Because of random factors, the data may contain a certain number of outliers. The outliers of each road section are detected with the noise data filtering model, and eliminating them yields the preprocessed road section speed data.

Figure 6 shows the speed data of a road section from September 3 to September 11, 2020. The abscissa is the date, and the ordinate is the road section speed. Each box represents the overall distribution of the section speed on that day, and the black dots are the points to be deleted. The original road section speed data comprise about 12.29 million records. About 1.19 million of them (9.68%) are abnormal, leaving approximately 11.1 million preprocessed section speed records.

Figure 5: Distribution of expressway gantries in Fujian province (map of Fujian Province, China, scale 1:4,930,986).

Table 1: ETC gantry system transaction data attribute table.

| Attribute name | Example | Attribute name | Example |
|---|---|---|---|
| Trade ID | 340***98 | OBU plate | Blue Fujian A12345 |
| Trade time | 2020/9/5 21:29:26 | Vehicle class | 1 |
| Flag ID | 35**15 | Enter time | 2020/9/5 21:29:26 |
| Flag type | 0 | Enter station | 25 7 |
| Flag index | 1 | OBU ID | 12B***E7 |

Table 2: Expressway road section speed data attribute table.

| Attribute name | Example | Attribute name | Example |
|---|---|---|---|
| OBUPLATE | Blue Fujian A12345 | Time delta | 439.0 s |
| Before trade time | 2020-09-07 11:07:35 | Speed | 87.15 km/h |
| Before flag ID | 34**05 | Enter time | 2020-09-06 21:24:20 |
| After trade time | 2020-09-07 11:14:54 | Enter station | 330**11 |
| After flag ID | 34**07 | Road distance | 10628 m |

Figure 6: Velocity information distribution boxplot (daily boxplots of road section speed in km/h, 2020-09-03 to 2020-09-11).
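4.2.2. Road Section Velocity Feature Vector. After the preprocessed road section speed data set is obtained, the road section speed feature vector model is constructed from daily statistics of the speed on each expressway road section. The resulting data set contains three types of features, which together form a 16-dimensional feature vector (6 + 4 + 6 components), and each sample is labeled with its class, the speed limit value. The attributes in Tables 3–5 are these feature vectors, i.e., the model input after speed feature extraction. In the tables, QD denotes a road section; for example, QD 340507−351C03 is the road section between ETC gantry 340507 and ETC gantry 351C03. Date is the date on which the traffic was observed; α1–α6 are driving-speed percentiles of the section, ranging from the 15th to the 95th percentile; β1–β4 are the mode, mean, standard deviation, and dispersion of vehicle speed; γ1–γ6 are the first six values after the average road speeds of the 24 time periods of the day are sorted in descending order; and l is the maximum speed limit value.

Table 3: Frequency-speed percentile feature (unit: km/h).
QD | Date | α1 | α2 | α3 | α4 | α5 | α6 | l
340507−351C03 | 2020-9-3 | 70 | 73 | 79 | 92 | 103 | 109 | 110
340507−351C03 | 2020-9-4 | 70 | 73 | 79 | 95 | 102 | 110 | 110
34012B−34012D | 2020-9-8 | 51 | 59 | 80 | 92 | 95 | 103 | 100
34012B−34012D | 2020-9-9 | 51 | 58 | 82 | 94 | 94 | 104 | 100
350703−350701 | 2020-9-7 | 67 | 71 | 78 | 83 | 87 | 92 | 80
350703−350701 | 2020-9-8 | 70 | 75 | 82 | 89 | 92 | 96 | 80
341801−341801 | 2020-9-3 | 88 | 95 | 106 | 114 | 117 | 123 | 120
341801−341801 | 2020-9-4 | 91 | 97 | 107 | 114 | 117 | 123 | 120

Table 4: Road section speed evaluation feature (unit: km/h).
QD | Date | β1 | β2 | β3 | β4 | l
340507−351C03 | 2020-9-3 | 76 | 83 | 13 | 32 | 110
340507−351C03 | 2020-9-4 | 76 | 83 | 14 | 32 | 110
34012B−34012D | 2020-9-8 | 95 | 75 | 19 | 44 | 100
34012B−34012D | 2020-9-9 | 95 | 76 | 20 | 46 | 100
350703−350701 | 2020-9-7 | 82 | 77 | 9 | 20 | 80
350703−350701 | 2020-9-8 | 82 | 81 | 10 | 22 | 80
341801−341801 | 2020-9-3 | 114 | 103 | 13 | 29 | 120
341801−341801 | 2020-9-4 | 111 | 104 | 12 | 26 | 120

Table 5: Road section speed time domain feature (unit: km/h).
QD | Date | γ1 | γ2 | γ3 | γ4 | γ5 | γ6 | l
340507−351C03 | 2020-9-3 | 88 | 87 | 87 | 87 | 86 | 86 | 110
340507−351C03 | 2020-9-4 | 91 | 89 | 89 | 89 | 88 | 86 | 110
34012B−34012D | 2020-9-8 | 81 | 80 | 80 | 80 | 80 | 80 | 100
34012B−34012D | 2020-9-9 | 85 | 83 | 82 | 82 | 81 | 81 | 100
350703−350701 | 2020-9-7 | 82 | 81 | 79 | 79 | 79 | 78 | 80
350703−350701 | 2020-9-8 | 84 | 84 | 83 | 83 | 82 | 82 | 80
341801−341801 | 2020-9-3 | 106 | 106 | 105 | 105 | 105 | 105 | 120
341801−341801 | 2020-9-4 | 107 | 106 | 105 | 105 | 105 | 105 | 120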
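The Python sketch below shows how one day of section speeds could be turned into a 16-dimensional vector shaped like Tables 3–5. Several details are assumptions, labeled in the code: the six percentile levels (the paper only states that they span the 15th to the 95th percentile), hourly binning for the 24 time periods, and the formula used for "dispersion". The paper's own feature definitions may differ.

```python
from collections import defaultdict
from statistics import mean, mode, pstdev, quantiles

# Hypothetical percentile levels: the paper only says alpha1..alpha6 lie
# between the 15th and the 95th percentile of driving speed.
ALPHA_LEVELS = (15, 30, 50, 70, 85, 95)

def daily_feature_vector(records):
    """16-dim vector [alpha1..6, beta1..4, gamma1..6] for one section on one day.

    records: list of (hour_of_day, speed_kmh) pairs, hour_of_day in 0..23
    (assumption: the 24 time periods are clock hours).
    """
    speeds = [s for _, s in records]
    cuts = quantiles(speeds, n=100, method="inclusive")  # cuts[p - 1] = p-th percentile
    alpha = [cuts[p - 1] for p in ALPHA_LEVELS]

    beta = [
        mode([round(s) for s in speeds]),  # most frequent speed in 1 km/h bins
        mean(speeds),
        pstdev(speeds),
        cuts[84] - cuts[14],  # hypothetical "dispersion": 85th minus 15th percentile
    ]

    by_hour = defaultdict(list)
    for hour, s in records:
        by_hour[hour].append(s)
    hourly_means = sorted((mean(v) for v in by_hour.values()), reverse=True)
    if len(hourly_means) < 6:
        raise ValueError("need speed data in at least 6 of the 24 periods")
    gamma = hourly_means[:6]  # six largest period-average speeds

    return alpha + beta + gamma

records = [(h, float(60 + h)) for h in range(24)]
vec = daily_feature_vector(records)
print(len(vec), vec[-6:])  # 16 [83.0, 82.0, 81.0, 80.0, 79.0, 78.0]
```

Sorting the period averages in descending order matches the layout of Table 5, where γ1 ≥ γ2 ≥ … ≥ γ6 in every row.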
4.2.3. Balance Analysis of Sample Data. The road section speed feature vector data set contains 5,081 samples, of which the samples with 80 km/h, 100 km/h, 110 km/h, and 120 km/h speed limits account for 5.31%, 82.47%, 9.39%, and 2.83%, respectively. The categories are therefore severely imbalanced, which harms the identification performance of the model. SMOTE is therefore used to oversample the samples with speed limits of 80, 110, and 120 km/h so that all categories become relatively balanced. In the experiment, the data produced by SMOTE are used as the model input, and the sample data are divided into training samples and testing samples.
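A self-contained Python sketch of SMOTE (Chawla et al.), assuming each minority class is brought up to the size of the largest class; the target size and the helper names are assumptions. Each synthetic vector is placed at a random point on the segment between a randomly chosen sample and one of its k nearest neighbours in the same class. In practice the imbalanced-learn package provides the same technique as `imblearn.over_sampling.SMOTE` with `fit_resample`.

```python
import math
import random

def smote(samples, n_new, k=5, rng=None):
    """Create n_new synthetic vectors for one minority class (SMOTE)."""
    if n_new <= 0:
        return []
    if len(samples) < 2:
        raise ValueError("SMOTE needs at least two samples in the class")
    rng = rng or random.Random(0)
    k = min(k, len(samples) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        others = [s for j, s in enumerate(samples) if j != i]
        # k nearest neighbours of `base` inside the same class (Euclidean distance)
        neighbours = sorted(others, key=lambda s: math.dist(s, base))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic

def balance(by_label, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    rng = random.Random(seed)
    target = max(len(v) for v in by_label.values())
    return {label: vecs + smote(vecs, target - len(vecs), k, rng)
            for label, vecs in by_label.items()}

data = {80: [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
        100: [[float(i), 10.0] for i in range(10)]}
balanced = balance(data)
print({label: len(v) for label, v in balanced.items()})  # {80: 10, 100: 10}
```

Because synthetic points are interpolated between real samples of the same class, the majority class (here 100) is left unchanged, which mirrors the paper's choice of oversampling only the minority speed-limit classes.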
4.2.4. The Result of the Model's Performance. The parameter setting of the XGBoost algorithm strongly affects the performance of the model, so a set of sensitivity experiments is conducted to optimize it. First, four boosting parameters with a significant impact on the model are identified: n_estimators, learn_rate, max_depth, and min_child_weight. Second, a combination of grid search and K-fold cross-validation (GK), with K = 5, is used to obtain the optimal parameters, following the method of Section 3.4. The search range, step length, and optimal value of each parameter are shown in Table 6.

Table 6: Optimal combination of model parameters.
Parameter | Search scope | Step length | Optimal value
n_estimators | [100, 1000] | 100 | 700
learn_rate | [0, 0.5] | 0.01 | 0.07
max_depth | [1, 15] | 1 | 8
min_child_weight | [1, 9] | 1 | 1

With these settings the model is established, and its effectiveness is verified on the test data; the resulting confusion matrix is shown in Table 7. Of the 3295 test samples, 3212 are identified correctly, an accuracy of 97.5%. The 80 km/h class is recognized with 100% accuracy because its data differ markedly from those of the other categories. However, the 100 km/h and 110 km/h categories are very similar and easily confused: of the 824 samples with a 100 km/h speed limit, 759 are identified correctly and 47 are misidentified as 110 km/h, which lowers the accuracy of this class to 92.1%. For the same reason, the precision of the 110 km/h predictions (94.5%) is lower than that of the other three categories.

Table 7: Confusion matrix (rows: real class; columns: predicted speed-limit class, km/h).
Real class | 80 | 100 | 110 | 120 | Accuracy rate (%)
80 | 824 | 0 | 0 | 0 | 100
100 | 6 | 759 | 47 | 12 | 92.1
110 | 0 | 12 | 807 | 5 | 97.9
120 | 0 | 1 | 0 | 822 | 99.9
Accuracy rate (%) | 99.3 | 98.3 | 94.5 | 98.0 | 97.5
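The grid search with K-fold cross-validation described above can be sketched in plain Python as follows. The search space mirrors Table 6 (XGBoost's own name for "learn_rate" is `learning_rate`); `fit_predict` is a hypothetical hook where a real model would be trained, and the toy threshold classifier below only stands in for XGBoost so the sketch runs without third-party packages. Note that an exhaustive search over Table 6 would evaluate 68,850 combinations (344,250 fits with K = 5); the paper does not state whether it searched the full product or tuned parameters one at a time.

```python
import itertools
import math
import random

# Search space of Table 6 (learn_rate is called learning_rate in XGBoost).
TABLE6_GRID = {
    "n_estimators": list(range(100, 1001, 100)),
    "learning_rate": [round(0.01 * i, 2) for i in range(51)],  # 0.00 .. 0.50
    "max_depth": list(range(1, 16)),
    "min_child_weight": list(range(1, 10)),
}

def k_fold(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def grid_search_cv(X, y, grid, fit_predict, k=5):
    """Exhaustive grid search scored by mean k-fold accuracy.

    fit_predict(params, X_train, y_train, X_test) -> predicted labels.
    """
    folds = k_fold(len(X), k)
    names = sorted(grid)
    best, best_score = None, -1.0
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(len(X)) if i not in held_out]
            preds = fit_predict(params,
                                [X[i] for i in train_idx], [y[i] for i in train_idx],
                                [X[i] for i in test_idx])
            hits = sum(p == y[i] for p, i in zip(preds, test_idx))
            fold_scores.append(hits / len(test_idx))
        score = sum(fold_scores) / k
        if score > best_score:
            best, best_score = params, score
    return best, best_score

def threshold_model(params, X_train, y_train, X_test):
    # Toy stand-in for XGBoost: predicts 1 when the feature exceeds the threshold.
    return [int(x[0] > params["threshold"]) for x in X_test]

X = [[float(v)] for v in range(11)]
y = [int(v > 5) for v in range(11)]
print(grid_search_cv(X, y, {"threshold": [2, 5, 8]}, threshold_model))
# ({'threshold': 5}, 1.0)
print(math.prod(len(v) for v in TABLE6_GRID.values()))  # 68850
```

With scikit-learn and xgboost installed, the equivalent would be `GridSearchCV(XGBClassifier(), TABLE6_GRID, cv=5)`, which performs the same Cartesian-product search with 5-fold scoring.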
4.2.5. Comparison and Analysis

(1) Impact Analysis of Data Equalization. To verify the influence of SMOTE oversampling on the model, the original data set and the data set processed by SMOTE are each used for training, with all other steps kept identical, which yields two classifiers. Their classification results are compared in Table 8, where "After oversampling" is the model trained on the SMOTE-processed data set and "Before oversampling" is the model trained on the original data set. Table 8 shows the following:
(1) After SMOTE oversampling, the F1-score improves for every category, and precision, recall, and F1-score improve markedly for the minority categories.
(2) The 100 km/h category has the most samples and is not expanded during oversampling, yet its precision (0.91 to 0.98) and F1-score (0.94 to 0.95) still improve, although its recall drops from 0.98 to 0.92. This indicates that SMOTE not only greatly improves the recognition of minority speed limit categories but can also improve that of the majority category.
(3) SMOTE improves the precision for the 110 km/h and 120 km/h categories, and their recall and F1-score also improve greatly. For the 80 km/h category, precision is unchanged (1.00), but recall and F1-score improve greatly (0.51 to 1.00 and 0.67 to 1.00).

Table 8: Effect comparison before and after data oversampling.
Category | Speed limit category (km/h) | Precision | Recall | F1-score
After oversampling | 80 | 1.00 | 1.00 | 1.00
Before oversampling | 80 | 1.00 | 0.51 | 0.67
After oversampling | 100 | 0.98 | 0.92 | 0.95
Before oversampling | 100 | 0.91 | 0.98 | 0.94
After oversampling | 110 | 0.94 | 0.98 | 0.96
Before oversampling | 110 | 0.73 | 0.46 | 0.57
After oversampling | 120 | 0.98 | 1.00 | 0.99
Before oversampling | 120 | 0.80 | 0.32 | 0.46
After oversampling | Avg/total | 0.98 | 0.97 | 0.97
Before oversampling | Avg/total | 0.89 | 0.90 | 0.89

(2) Comparison and Analysis of Feature Vector Model. Changing only the input features while keeping all other steps the same verifies the effectiveness of the different feature types in the expressway section speed feature vector model. Seven experiments are set up to test single features and feature combinations. Model A_α considers only the frequency-speed percentile feature; A_β only the road section speed evaluation feature; A_γ only the road section speed time domain feature; A_{α,β} the percentile and evaluation features; A_{α,γ} the percentile and time domain features; A_{β,γ} the evaluation and time domain features; and A_{α,β,γ} all three features. The results are shown in Figure 7, where A1–A7 denote A_α, A_β, A_γ, A_{α,β}, A_{α,γ}, A_{β,γ}, and A_{α,β,γ}, respectively. The following can be seen:
(1) Among single features, the frequency-speed percentile feature gives the best prediction, followed by the road section speed evaluation feature and then the road section speed time domain feature.
(2) Combining two features improves prediction compared with a single feature, and using all three features gives the best prediction.
(3) Ranked by contribution to the prediction model, from largest to smallest, the features are the frequency-speed percentile feature, the road section speed time domain feature, and the road section speed evaluation feature; the contribution of each feature component is shown in Figure 8.

Figure 7: Model accuracy comparison (y-axis: accuracy rate; x-axis: model groups A1–A7).

Figure 8: Feature contribution (contribution rate of each feature component): α1 0.0735, α2 0.1053, α3 0.0371, α4 0.0553, α5 0.0864, α6 0.1617; β1 0.0818, β2 0.0643, β3 0.0497, β4 0.0404; γ1 0.0723, γ2 0.0350, γ3 0.0401, γ4 0.0332, γ5 0.0310, γ6 0.0329.
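Item (3) ranks the three feature families. Summing the Figure 8 rates within each family is our own aggregation (the paper reports only the per-component values and does not state how they were computed). It gives α ≈ 0.519, γ ≈ 0.245, and β ≈ 0.236, which matches that ranking, and the 16 rates total 1.0:

```python
from collections import defaultdict

# Contribution rates of the 16 feature components, as read from Figure 8.
CONTRIBUTION = {
    "α1": 0.0735, "α2": 0.1053, "α3": 0.0371, "α4": 0.0553, "α5": 0.0864, "α6": 0.1617,
    "β1": 0.0818, "β2": 0.0643, "β3": 0.0497, "β4": 0.0404,
    "γ1": 0.0723, "γ2": 0.0350, "γ3": 0.0401, "γ4": 0.0332, "γ5": 0.0310, "γ6": 0.0329,
}

def family_totals(contrib):
    """Sum component contributions per feature family (α, β, γ), largest first."""
    totals = defaultdict(float)
    for name, rate in contrib.items():
        totals[name[0]] += rate  # first character is the family symbol
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))

for family, total in family_totals(CONTRIBUTION).items():
    print(family, round(total, 4))
# α 0.5193
# γ 0.2445
# β 0.2362
```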
(3) Comparison of Classification Models. To further illustrate the advantages of the model, we compare its performance with that of GBDT, KNN, SVM, AdaBoost, and Logistic Regression (LR). The results are shown in Table 9. Among the six classification methods compared in Table 9, the SVM, AdaBoost, and LR classifiers perform poorly in accuracy, recall rate, and F1-score, whereas GC-XGBoost, GBDT, and KNN achieve good results with high recognition accuracy on expressway maximum speed limit identification. In particular, GC-XGBoost outperforms GBDT and KNN, with the highest accuracy of 97.5%.

Table 9: Model comparison results.
Model | Testing samples | Correctly predicted samples | Accuracy (%) | Precision | Recall rate | F1-score
GC-XGBoost | 3295 | 3212 | 97.5 | 0.98 | 0.97 | 0.97
GBDT | 3295 | 2908 | 88.3 | 0.88 | 0.88 | 0.88
KNN | 3295 | 3079 | 93.4 | 0.94 | 0.93 | 0.93
SVM | 3295 | 2374 | 72.0 | 0.70 | 0.72 | 0.70
AdaBoost | 3295 | 1911 | 58.0 | 0.61 | 0.58 | 0.51
LR | 3295 | 1684 | 51.1 | 0.48 | 0.51 | 0.49
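As a consistency check between Tables 7 and 9, the Python sketch below recomputes per-class and averaged metrics from the Table 7 confusion matrix. Macro averaging over the four classes is an assumption, since the paper does not name its averaging scheme; because the four test classes are almost equal in size, support-weighted averaging gives the same values at two decimals. The result reproduces the GC-XGBoost row of Table 9.

```python
LABELS = (80, 100, 110, 120)
# Rows: real class; columns: predicted class (Table 7).
CONFUSION = [
    [824, 0, 0, 0],
    [6, 759, 47, 12],
    [0, 12, 807, 5],
    [0, 1, 0, 822],
]

def classification_report(cm):
    """Accuracy plus per-class and macro-averaged precision, recall, and F1."""
    n = len(cm)
    total = sum(map(sum, cm))
    correct = sum(cm[i][i] for i in range(n))
    recall = [cm[i][i] / sum(cm[i]) for i in range(n)]  # row-wise "accuracy rate"
    precision = [cm[i][i] / sum(cm[r][i] for r in range(n)) for i in range(n)]  # column-wise
    f1 = [2 * p * r / (p + r) for p, r in zip(precision, recall)]
    return {
        "correct": correct, "total": total, "accuracy": correct / total,
        "recall": recall, "precision": precision, "f1": f1,
        "macro_precision": sum(precision) / n,
        "macro_recall": sum(recall) / n,
        "macro_f1": sum(f1) / n,
    }

report = classification_report(CONFUSION)
print(report["correct"], report["total"], round(100 * report["accuracy"], 1))
# 3212 3295 97.5
print(round(report["macro_precision"], 2), round(report["macro_recall"], 2),
      round(report["macro_f1"], 2))
# 0.98 0.97 0.97
```

The row-wise recalls (100, 92.1, 97.9, 99.9%) and column-wise precisions (99.3, 98.3, 94.5, 98.0%) are exactly the "Accuracy rate" margins of Table 7.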
5. Conclusion

This paper proposes a method of identifying expressway speed limit information based on mining of ETC data. First, abnormal ETC gantry data are processed, and an algorithm for constructing the road section speed data set is proposed. The road section speed data are constructed, and the outlier samples on each road section are eliminated by boxplot analysis to ensure that the ETC data are represented accurately. Then, SMOTE is used to oversample the minority speed limit categories to balance the road section speed limit classes. Finally, the oversampled training samples are input into the proposed GC-XGBoost (grid search + cross-validation + XGBoost) algorithm for training, and the result is compared with several similar algorithms. The experimental results show the following:
(1) Ranked by contribution to the prediction model, from largest to smallest, the features of the expressway section speed feature vector model are the frequency-speed percentile feature, the time domain feature, and the speed evaluation feature. All three feature categories improve the prediction model, and the frequency-speed percentile feature improves it the most.
(2) On the test data, the classification accuracy for the 80 km/h, 100 km/h, 110 km/h, and 120 km/h speed limits is 100%, 92.1%, 97.9%, and 99.9%, respectively, and the overall accuracy is 97.5%. Because the 100 km/h and 110 km/h categories differ very little, their recognition accuracy is relatively low.
(3) GC-XGBoost achieves a speed limit recognition accuracy of 97.5%, a precision of 0.98, a recall of 0.97, and an F1-score of 0.97, significantly better than the other five algorithms, and can accurately identify the maximum speed limit information of expressways.

This paper considers the speed features of mixed vehicle traffic, which makes the method suitable for identifying the maximum speed limit of expressways. However, this work still has some limitations:
(1) Recognition of the 100 km/h and 110 km/h speed limits is less effective. More speed limit features could be considered to explore the differences between the two and improve their recognition.
(2) This study does not consider the speed limits of different lanes on the same road. In future work, the speed limit information of different lanes on the same road could be analyzed through vehicle classification and road lane numbers to construct a more complete expressway speed limit recognition model.

Data Availability

The data used to support the findings of this study are currently under embargo while the research findings are commercialized. Requests for data, 12 months after publication of this article, will be considered by the corresponding author.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was funded by the National Natural Science Foundation of China (41971340), the Special Funds for the Central Government to Guide Local Scientific and Technological Development (2020L3014), the 2020 Fujian Province "the Belt and Road" Technology Innovation Platform (2020D002), and the Provincial Candidates for the Hundred, Thousand and Ten Thousand Talent of Fujian (GY-Z19113).

Journal: Scientific Programming, Hindawi Publishing Corporation

Published: Nov 18, 2021
