Intergeneration Division Based on Key Component Analysis in an Autonomous Transportation System Using the Natural Language Processing Method

Hindawi
Journal of Advanced Transportation
Volume 2023, Article ID 5850876, 11 pages
https://doi.org/10.1155/2023/5850876

Research Article

Yuezhao Yu (1,2), Chao Gou (1,2), and Chen Xiong (1,2)

(1) School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
(2) Guangdong Provincial Key Laboratory of Intelligent Transportation System, School of Intelligent Systems Engineering, Sun Yat-sen University, Guangzhou 510275, China

Correspondence should be addressed to Chen Xiong; xiongch8@mail.sysu.edu.cn

Received 14 October 2022; Revised 10 February 2023; Accepted 23 February 2023; Published 16 March 2023

Academic Editor: Yajie Zou

Copyright © 2023 Yuezhao Yu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The advancement of emerging technologies and the growth of transport demand accelerate the evolution of the autonomous transportation system (ATS). The framework and architecture of ATS are becoming a research hotspot; so far, however, few studies have addressed transportation intergeneration division. Previous works indicate that key components are a critical representation for distinguishing long-term eras. In addition, a massive body of text material has accumulated as research has progressed, and natural language processing techniques keep developing, which makes quantitative research on key components in intergeneration division possible. In this work, a method based on massive text analysis is proposed. First, LDA2vec is used to obtain the relationship between components and other elements.
Then, a component word set is built by the component word set extraction module based on component items. Finally, the component word set is clustered to obtain the ATS generations and to generate the key components. Based on an analysis of large-scale important traffic texts, our method divides the Chinese traffic system from 2010 to 2022 into three generations. The key components given by our method are consistent with human cognition of ATS. The successful application indicates that this work can be extended to other intergeneration division fields.

1. Introduction

The autonomous transportation system (ATS) is a complex traffic system that focuses on the future development direction of the intelligent traffic system (ITS) framework. According to the ATS theory, the traffic system consists of five basic elements, namely, technology, need, service, function, and component [1, 2]. As a stable framework, ATS unifies the interior and exterior of the transportation system. The external effect is driven by technologies and needs, the internal effect is activated by services and functions, and components carry the whole ATS as physical entities [3, 4]. The ATS framework passes through distinct periods in its development. Owing to the external driving effect, the internal activation effect, and the carrying effect, the composition and performance of the five elements change between periods. When the effect is large enough to revolutionize the understanding of ATS, ATS transitions to a new generation. Intergeneration division is proposed to divide the framework features into intergenerational groups. It clusters the features along the timeline to aggregate different stages. This provides a clearer understanding of the framework and promotes research on it. Furthermore, the transportation system has been rapidly affected by high technology in recent years. Artificial intelligence and connected vehicles are continuously integrated into the transportation field, making modern transportation more intelligent than traditional transportation [5, 6]. The way industries produce and the way people travel are changing at an accelerated rate. If we are aware of the changes in each period, we can learn how the ATS elements evolve. This helps set development goals for the system framework, adjust the industrial structure of traffic, and generate large economic benefits.

Since the 1990s, theoretical and technological research on ITS has not clearly defined ITS intergeneration division. Instead, scholars have used continuous design to advance ITS theory and technology. In the research of different scholars, intergeneration division is treated as work based on experience. Such work is relatively subjective and vague; it is adequate only when the transportation system is simple and technology has not advanced much. With the explosive emergence of modern transportation technologies, the transportation system has become more difficult to grade subjectively. A method of intergeneration division that divides the development process of the transportation system is becoming more and more necessary. Existing intergeneration division methods for other systems have difficulty dividing ATS systematically. The specific manifestations are as follows:

(1) The element relationships of ATS differ from those of other systems, so other intergeneration division methods cannot adapt to ATS.
(2) There is no established standard for intergeneration division, and existing approaches are ambiguous and inconsistent.
(3) The key of intergeneration division is not specific. The key of division is not based on physical entities, so the generation features cannot be clearly described.

To solve the above problems, we proposed in [2] an ATS intergeneration division standard based on key components, which uses physical entities as a bridge to reflect changes between generations. All elements depend on components for expression. This concept is adopted in this work. Here, an ATS intergeneration division method based on big data is proposed. Based on the LDA2vec topic model [7] in natural language processing (NLP), topics are extracted from the divided text set as condensed text content. Text similarity comparisons between topics and components then quantify the relationship between the other elements and the components. The word distribution and the year distribution of the text set, together with these relationship values, form the component word set, which is then clustered to obtain the desired generations. Finally, the key components are selected according to the clustering results. In this process, components are used as physical entities to connect the entire system framework, and the information used to establish the intergeneration division also depends on components. Besides, the selected components capture the feature differences between two generations. Therefore, the selected components are representative. The main contributions are as follows:

(1) A machine learning method based on the LDA2vec model is proposed for ATS intergeneration division from an objective view with a group of massive texts
(2) Physical entities are used to perform ATS intergeneration division from a more specific view, and a method for determining key components of intergeneration division is proposed

2. Related Work

2.1. Autonomous Transportation System (ATS). The development of the ATS framework is based on ITS. In 1992, the ITS system framework was first established in the United States by ARC-IT [8], which defined user needs and services, logical architecture, physical architecture, and policies as its work content. Since then, ITS system frameworks have been proposed by China [9], the European Union [10], and Japan [11] in accordance with their respective national conditions. However, the ITS framework currently needs more detail to improve. Big data, autonomous driving, vehicle-road cooperation, 5G, and other technologies are used in transportation. The role of human beings as decision makers in transportation is being replaced by artificial intelligence. The ATS framework is proposed to reduce human intervention. In order to liberate human beings from driving, researchers deconstruct ATS. They propose a method in which needs, services, functions, technologies, and components evolve in a self-organizing manner, which is the ATS element framework. Adaptive adjustment is used in this framework to offer theoretical support. ATS is still in the exploratory stage; therefore, most ATS research is theoretical. Wei et al. [12] study the ATS needs and propose a user need system construction method. Zhang et al. [13] use Petri nets to build a relationship network between elements and model the evolution of ATS. The relationship between the ATS elements is shown in Figure 1, which is an important theoretical basis for ATS architecture. Deng et al. [1] build an architecture mapping relation using text analysis. Xu et al. [14] analyze the ATS logical architecture and prove its reliability. For the ATS physical architecture, Deng et al. [15] define the physical architecture of ATS and form its theoretical basis. Zhou et al. [16] build the physical architecture of ATS and lay the groundwork for application.

Figure 1: The five elements of ATS. Needs and technologies act as external drivers, functions and services as internal influences, and components as physical carriers; the system is organized as a model. (Figure labels: service layer, function layer, component layer, needs, technologies.)

2.2. Topic Model. In 2003, Blei et al. [17] proposed the Latent Dirichlet Allocation (LDA) topic model, which derives topics from a text set through a probability distribution. This technique is able to summarize multiple latent topics from a text collection into a small number of words. After that, numerous LDA-based natural language processing (NLP) studies were conducted. Weng et al. [18] use LDA to implement opinion monitoring analysis on Twitter. Bian et al. [19] use LDA for multidocument summarization. Since LDA is a topic model based on bag-of-words, it can effectively summarize contextual macro information. However, LDA does not work well for sentence-level micro information. In order to address this issue, Moody et al. [7] combine word vectors and LDA to create LDA2vec, which can integrate context information and sentence information to get topics. Using LDA2vec, Luo and Shi [20] collect information from aviation safety reports and successfully complete a latent topic mining task. In [21–23], the topic model is applied to different languages with equally satisfactory results.

2.3. Intergeneration Division. Intergeneration division is a sociological concept. Many works define social group differences as generations and perform division analysis on that basis. Research is broadly divided into two types. The first divides by predetermined years to examine intergeneration variations within the same time period, for example, treating every decade as a generation to study changes in human behaviour [24, 25]. The alternative method does not rely on time. Instead, it categorizes individuals who satisfy certain criteria and then divides generations based on the temporal aggregation that corresponds to the features [26]. In medical research, a generation is defined by biological activity, such as the relationship between children and parents, or several offspring are treated as a generation with differences in biological traits [27–29]. Unlike in sociology, intergeneration divisions in medicine are not reflected in the same time interval but rather in different features. Nevertheless, both types of studies share the view that evolution produces generations. In the field of transportation, transportation systems also exhibit evolution characteristics. Guan [30] proposes that ITS can be divided into three generations based on the evolutionary law of transportation systems and shows the future application features of ITS 3.0. Zhang et al. [13] propose to output element replacements through the evolution relationship between nodes and edges in a complex network of the ATS evolution model and then select a replacement as the criterion for intergeneration division. Yu et al. [2] proposed an intergeneration division criterion of ATS based on key components and gave several indicators as references. Intergeneration division has mostly been studied qualitatively, especially in the field of sociology and less in the field of transportation. Based on the ideas proposed by predecessors, we propose a quantitative perspective for studying the intergenerational division of transportation systems, whose features can be found from the relationship between elements. The topic model can be used to find the connections between elements, as understood by humans, in the recorded texts, which leads to a method based on massive texts.

3. Proposed Method

The process of our method is shown in Figure 2. Our intention is to divide the development process of the transportation system into several time periods, each with generation features. The goal of the model is to extract the relationships between the other elements and the components from the massive text. The collected data are organized by subelement items. The component word set extraction module with LDA2vec is proposed to calculate the component word set, which consists of a word distribution and a year distribution. After K-means [31] is used to cluster the word set, the generations and their features are generated.

Figure 2: Overview of our method. For each item, a component word set extraction module collects the component word set, and the components are then clustered with the year by K-means to obtain the generation features. LDA2vec is used in the component word set extraction module to obtain the relationship between the component and the other features. (In the figure, rounded rectangles are operations and right-angled rectangles are intermediate data.)

3.1. Data Collection. The authors in [2] demonstrate that the component is the key element of ATS intergeneration division because the remaining four elements (technology, need, service, and function) rely on the component as a carrier. The objective in this subsection is to collect the texts for these four aspects and determine how they relate to each other and to components. In order to obtain a complete intergeneration division, the granularity of data collection needs to be accurate to the year, and the relationships between the component and the other elements need to be calculated for each year [2]. Reference [2] also breaks down the ATS generation evaluation methodology into six indicators. Combined with cognition, the ATS generation texts are divided into three types: papers, reports, and policies. The keywords of each item in the four elements are filtered manually. The texts that make up our training dataset are retrieved for the three types based on these keywords.

3.2. LDA2vec Topic Model. The collected texts of all kinds contain a lot of redundant and repeated information. It is impractical to use all of this information as generation-expressive features. The LDA2vec [7] topic model can effectively summarize information from both context and sentence semantics and provide a group of extracted keywords as appearance features.

LDA2vec is a combination of LDA and word2vec and learns word vectors using the skip-gram model along with topic vectors. It can mix global and local keyword features, which meets our need for text condensation. Therefore, using LDA2vec as a summarization and deredundancy technique is viable.

From word2vec, the word vector w is encoded by the skip-gram architecture [32]. From the LDA model, the topic vector t is generated by a Dirichlet distribution, as shown in Figure 3. To express sentences, a document vector d is formed as a mixture of the n topic vectors:

$$\vec{d}_j = \sum_{k=1}^{n} p_{jk}\,\vec{t}_k, \tag{1}$$

where 0 ≤ p_{jk} ≤ 1 is a fraction that denotes the membership of document j in topic k. The context vector c carries the information from the context and is defined as follows:

$$\vec{c}_j = \vec{w}_j + \vec{d}_j. \tag{2}$$

Figure 3: LDA2vec is a combination of LDA and word2vec. The topic vector is generated from the inverse result of the text generation method based on the Dirichlet distribution parameters α and β. With a membership parameter p, the topic vectors generate the document vector. The pivot word vector is obtained from skip-gram in word2vec. The two vectors are combined to get the context vector, which is compared with the target word vector to get an unsupervised learning result. (Figure labels: LDA, skip-gram, pivot word vector, topic vector, document vector, context vector, target word vector.)
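The document and context vectors of equations (1) and (2) can be illustrated with a minimal sketch. It is not the authors' implementation: the softmax normalization of the document weights follows the original LDA2vec formulation [7] and is an assumption here, and the function names and toy vectors are hypothetical.

```python
import math


def softmax(weights):
    """Map unconstrained document weights to proportions p_jk in [0, 1] that sum to 1.

    Assumption: the normalization used in the original LDA2vec formulation.
    """
    peak = max(weights)
    exps = [math.exp(w - peak) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def document_vector(proportions, topic_vectors):
    """Equation (1): d_j = sum over k of p_jk * t_k."""
    dim = len(topic_vectors[0])
    return [
        sum(p * t[i] for p, t in zip(proportions, topic_vectors))
        for i in range(dim)
    ]


def context_vector(word_vector, doc_vector):
    """Equation (2): c_j = w_j + d_j."""
    return [w + d for w, d in zip(word_vector, doc_vector)]


# Toy example: two topics in a 3-dimensional embedding space (hypothetical values).
topic_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
p_j = softmax([0.0, 0.0])                  # equal membership in both topics
d_j = document_vector(p_j, topic_vectors)
c_j = context_vector([0.25, 0.25, 0.25], d_j)
print(d_j)  # [0.5, 0.5, 0.0]
print(c_j)  # [0.75, 0.75, 0.25]
```

In the paper the vectors come from training; the sketch only shows how a document's topic memberships and a pivot word vector combine into a single context vector.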
The loss function L is defined as follows:

$$L = L^{d} + \sum_{ij} L^{neg}_{ij}, \qquad L^{neg}_{ij} = \log \sigma\left(\vec{c}_j \cdot \vec{w}_i\right) + \sum_{l=0} \log \sigma\left(-\vec{c}_j \cdot \vec{w}_l\right), \tag{3}$$

$$L^{d} = \lambda \sum_{jk} (\alpha - 1) \log p_{jk}, \tag{4}$$

where L^{neg}_{ij} is the skip-gram negative sampling loss (SGNS) [33] and L^{d} is the addition of a Dirichlet-likelihood term over document weights. w_i is the target word vector, and w_l is the negatively sampled word vector. σ(x) is the sigmoid function. The strength of the Dirichlet term is modulated by the tuning parameter λ. α is a constant parameter, which is 1 less than the number of topics.

The total loss L is backpropagated in the training process of LDA2vec, and training is iterated until the loss converges. After the model is trained, the topic vectors can be calculated from the inverse result. The topic vectors have been shown to contain sentence semantic information and context information. The topic vectors can be converted into topic descriptions with words:

$$topic = [w_1, w_2, \ldots, w_m]. \tag{5}$$

3.3. Component Word Set Extraction Module. The topic description of one subelement item, which relates to all topics in its directory, can be defined as follows:

$$topic_{item,year} = [topic_1, topic_2, \ldots, topic_n] = [w_1, w_2, \ldots, w_q], \tag{6}$$

where q = m * n, meaning that topic_{item} includes q words. For each subelement item, there is a topic description for each year. The component description can be defined as c_i:

$$c_i = [v_1, v_2, \ldots, v_p], \tag{7}$$

where i is the component index and p is the number of words in one component. Different components have different p. The Baidu stop word package [34] is used to split the sentence into words v_1 to v_p. In experiments, the stop word package was found to be limited. Therefore, some words that are often used in papers and patents are added to the stop word package to make the stop word impact more noticeable.

The similarity between topics and component descriptions can be used to illustrate the similarities between components and other elements. The closeness of each component's word description to the topics of the four other elements can be calculated as the following value:

$$s = \sum_{l=1}^{q} \sum_{k=1}^{p} compare(w_l, v_k), \tag{8}$$

where compare is the function defined by the Python package Synonyms [35]. Synonyms encodes each word into a distinct [1, 100] vector, namely, a word vector, and then calculates the similarity between two word vectors. The similarity value given by Synonyms lies between 0 and 1. The closer it is to 1, the more similar the words are. For each subelement item, we sort all the components according to the relationship values and select the top 5 as winners. After counting all the words in the text set using the word bag and sorting the words by word frequency, the top five words are sampled as the word distribution awarded to the winners. Their weights, which come from word frequency normalization, are also awarded to the winners. The purpose of this is to eliminate irrelevant connections and make the model more relevant. The years of all texts in the text set with one subelement item are counted as a year distribution, which is also awarded to the winners. When all sets are traversed, each component has a word set with different word weights and a year statistics table from the obtained year distribution. In order to lessen the impact of repeated words on the results, the weights of identical words for each year under each component are added together. After the integration is finished, a component word set is built.

3.4. Clustering. In data collection, we introduced the component as the key element of ATS intergeneration division. Therefore, generations can be divided by clustering components. To integrate the components' information in ATS, a component word set is built. Different from the component description, a component word set is a word set drawn from the topics of the four other subelement items, which means that components are described by the ATS element relations. However, there is still a missing link, which is a unified, numerical representation of all component word sets.

To meet the requirement of clustering, an improved K-means [31] clustering model is used to cluster the component word set. The original K-means distance equation is as follows:

$$D = \sqrt{\sum_{i=1}^{n} \left(\hat{x}_i - x_i\right)^2}, \tag{9}$$

where \hat{x}_i is the clustering center of each class, x_i is the input data, and n is the number of input dimensions. Considering the dimensions of the K-means input and the data obtained by the previous method, we use a group of words and their weights from the same element in one year as one single piece of information for K-means; Synonyms [35] integrates the information into a [1, 100] vector. On this basis, considering the word weight and year information, the K-means distance equation is as follows:

$$D = k \sqrt{a_{word} \sum_{i=1}^{100} \left(\hat{x}_i - x_i\right)^2 + a_{year} \left(\hat{x}_{101} - x_{year}\right)^2}, \tag{10}$$

where k is the weight of the word, x_i is one single entry of the [1, 100] vector, and x_{year} is the year in the word year list.

4. Experiments

4.1. Data Collection. The policies and reports in the text dataset are collected from the Internet with crawlers. The collection of papers is based on the CNKI paper database. More details about the collected data are given in Table 1. Needs describe a vision for the future; therefore, no papers on needs could be collected for this work. To ensure that the amount of data is sufficient to meet the conditions for massive text computation, we collected a total of 200,032 reports, 72,311 policies, and 1,223 papers from 2010 to 2022. To ensure the quality of the collected papers, we counted their source information: 15.0% are doctoral dissertations, 37.0% are master's dissertations, 29.1% are papers published in Chinese core journals, and 18.9% are others. The percentage of paper sources is shown in Figure 4, and the year distribution of the collected data is shown in Figure 5.

Table 1: Available data and sources.
Type | Source | Technology | Service | Function | Need
Papers | CNKI | ✓ | ✓ | ✓ | -
Reports | Chinese government website | ✓ | ✓ | ✓ | ✓
Policies | Baidu information | ✓ | ✓ | ✓ | ✓

Figure 4: In order to ensure the quality of papers, doctoral dissertations, master's dissertations, and papers from Chinese core journals need to account for the majority of the collected data. (Shares: master's dissertations 37.0%, doctoral dissertations 15.0%, papers from Chinese core journals 29.1%, others 18.9%.)

Figure 5: The data are arranged by year, showing the number of texts consisting of reports, policies, and papers for each year. (Horizontal axis: year, 2010 to 2022; vertical axes: number of reports (policies) and number of papers.)

4.2. Results. In equation (4), the author in [7] suggests λ = 300. In equation (10), in order for the proposed model to learn more about year feature information, a_{word} = 1 and a_{year} = 350. The number of topics is set to 20. In order to allow the model to fully summarize the text set, the number of training epochs of the topic model is set to 60. As shown in Figure 6, the total loss reaches a low value and the model converges when the skip-gram negative sampling (SGNS) loss and the Dirichlet-likelihood loss reach a reasonable level of overfitting. For the K-means clustering result, the result vector must be [1, 101]. A visualization parameter y is used to show the first hundred parameters of the result vector, that is, the word vector, on the y axis:

$$y = \sum_{i=1}^{100} \left[\log_{r}\left(x_i\right)\right], \tag{11}$$

where r = 50 is a constant parameter. The x axis is the year. The result is shown in Figure 7.

Figure 6: The average total loss in every epoch. The LDA2vec model converges at 60 epochs. (Horizontal axis: epochs; vertical axis: loss.)

Figure 7: The clustering result shows that, with 3 classes, K-means divides an accurate timeline. (Horizontal axis: year; vertical axis: visualization parameter y.)

In Figure 7, with 3 clustering classes, the three generations are 2010–2015, 2016–2019, and 2020–2022. The first intergeneration change occurred in 2015 and the second in 2019.

4.3. Discussion

4.3.1. Time and Generation Features. Guan [30] believes that the transportation system evolved from ITS1.0 to ITS2.0 in 2015 and that, with the development of automation and intelligence, it will evolve into the next generation. Our results support the conclusion that 2015 is a time node for the transformation of the transportation system. The other intergeneration division node, 2019, is the time when autonomous driving systems are widely installed in vehicles and V2X starts to be applied with 5G.

According to the top ten words of each cluster in Table 2, in 2010–2015 the traffic system is concerned with system control and traffic planning. Electronic information technology develops rapidly, and electronic control products are mass produced during this period. The main applications are vehicle electronic automatic control and signal light adaptive control, which exhibit automated features. In 2016–2019, communication, data, and information are mentioned in the clusters. In this period, the 4G network shines in the transportation system. The Internet, cloud computing, and big data are all booming, and the transportation system is affected by them. The faster communication speed increases the amount of data; therefore, traffic big data is widely used. The Internet makes the traffic system exhibit the feature of network connection. From 2020 to 2022, coordination is mentioned and takes an important position, which means that V2X enters the traffic system. More advanced sensing devices enable drivers and the transportation system to access more data and more information; correspondingly, better edge computing devices are needed. The transportation system exhibits the coordination feature.

4.3.2. Key Components. Two types of key components are significant in the intergeneration division: one accounts for the largest portion of the clustering results, and the other shows the largest connotation change between two generations.

The key components are found in the K-means clustering result. In each class of the K-means clustering results, all components are associated with a set of words, which are the outcomes of the aforementioned classification of the component word set into generations. Each word also includes a word weight k, which indicates how much that word contributed to the clustering outcome. To obtain how much each component contributed to the clustering outcome, the weights of all the words in that component are added. Table 3 shows the three components with the highest contribution in each generation. They rank highest because the development of transportation technology in the last decade has been most concentrated on the transformation of transport vehicles, the innovation of station infrastructure, and the improvement of driver comfort. These three elements are also present in varying degrees in transportation research. Among them, current research has focused heavily on automobile technology. The scientific community and the private sector are very much interested in automatic driving, vehicle control, vehicle-side edge computing, and energy structure change. Since the transport vehicle is where the change in the transportation system is most visible, it has played a significant role in shaping that change. With the improvement of information technology, station infrastructure is, besides the transport vehicle, the most significant change in ITS influenced by technology. In recent years, the technology of bus stations, high-speed railway stations, airports, and other infrastructures for the interaction of incoming and outgoing carriers has made tremendous progress. Additionally, there are new stations that reflect the most cutting-edge infrastructure technologies, such as smart parking lots and charging stations. The driver also plays a significant part in intergeneration division, as driving jobs are changing at this time due to the development of autonomous driving. As shown in Table 3 and Figure 8, the percentage of the driver is gradually rising, which is related to the improvement of automation and driving comfort for driver operation in recent years. The current stage of vigorous development of vehicle-road collaboration and autonomous driving aims to liberate drivers, whose role in the transportation system has been changing. In addition, the driver's experience has been significantly improved, and the level of autonomous driving is based on the tasks that the driver needs to perform. The generation 2020–2022 is the period of rapid development of autonomous driving, which is the reason for the increase in the percentage of the driver.

On the other hand, the component with the greatest change between two adjacent generations also plays a key role in the intergeneration transition. Generations may change with these components. Table 4 shows the two components with the greatest change. This result is in line with the generation features mentioned previously. From 2010–2015 to 2016–2019, traffic managers have changed from manual to automatic, and passenger vehicles begin to connect to the Internet. From 2016–2019 to 2020–2022, V2X enables line devices and edge devices to be updated. Both auxiliary operating devices and roadside smart devices support the generation change.
Table 2: Sorted by the weight of words, the top ten words of each cluster are shown.
Clusters | Top ten words
2010–2015 | Vehicle, driving, technology, system, transportation, development, control, automatic, construction, and planning
2016–2019 | Technology, vehicle, traffic, control, system, information, management, communication, city, and data
2020–2022 | Development, vehicle, construction, driving, coordination, transportation, autonomous, testing, information, and device

Table 3: Sorted by the weight of words, the top three components of each generation are shown.
Cluster | Top three greatest components
2010–2015 | Transport vehicle, station infrastructure, and driver
2016–2019 | Transport vehicle, station infrastructure, and driver
2020–2022 | Transport vehicle, driver, and station infrastructure

4.3.3. Clustering Parameters

Clustering into 3 classes gives the best result. In order to verify this, the result of clustering into 4 classes is shown in Figure 9. The year division also shows a clear line, but a generation is defined in 2018–2020, which is difficult to explain in practical cognition. Therefore, it is not suitable to cluster into 4 classes, which also proves the effectiveness of clustering into 3 classes. In addition, if we cluster into more than 4 classes or fewer than 3 classes, a single point becomes a class by itself, so these settings are also discarded.

Figure 8: Transport vehicle, station infrastructure, and drivers are all in the top three components. The driver becomes the second component in 2020–2022. (Panels: 2010–2015, 2016–2019, and 2020–2022; categories: transport vehicle, station infrastructure, driver, and others.)

Table 4: Sorted by the weight of words, the top two components of connotation change between the two generations.
Generation change | Top two greatest components
From 2010–2015 to 2016–2019 | Traffic manager and passenger vehicle
From 2016–2019 to 2020–2022 | Auxiliary operating device and roadside smart device

Figure 9: The clustering result with 4 classes (x-axis: year, 2010 to 2022; y-axis: visualize parameter (y)).

4.3.4. Disadvantage

Our method actually searches for the connections between elements from textual big data. In fact, it learns from what has been documented before, rather than from an actual connection. The real connection should be found in industrial projects. Although this method can learn the implicit relationship between elements, it is hard to know whether the implicit relationship exists in the real world.

Moreover, generations are actually fuzzy because engineering applications take time, usually several years. The clear boundary obtained by our method should actually be a fuzzy value. The time nodes of the intergeneration division should be around 2015 and 2019.

5. Conclusion

In this work, a novel intergeneration division method based on text big data is proposed, and it successfully finds the time nodes and key components of ATS intergeneration division in massive text data. Our method searches the text records of massive ATS elements by keywords and uses LDA2vec to extract the text set topics of ATS subelements, which help to calculate the relationship between other elements and components. Then, the word distribution and the year distribution are fused through the element relationship to obtain the word set of the components. Finally, we improve K-means, cluster the word set into the generations we need, and calculate the key components of each generation. The experimental results show that both the time nodes of intergeneration division and the key components are in line with realistic expectations.

The Discussion section covers the limitations of this work. What our method learns is a feature relationship based on textual features, and there is a gap between it and reality. In addition, the generational line should be fuzzy because engineering projects take time, and the clear year line we get should actually be a fuzzy border.

In future work, we will consider the actual intergeneration relationship and complete the task of intergeneration division. Based on the evolution model [13], we will try to learn the features of future generations, divide the possible future generations, and give planning advice for future transportation development.

Data Availability

The data used to support the findings of this study can be obtained from the corresponding author upon request at xiongch8@mail.sysu.edu.cn.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Key Research and Development Program of China under grant no. 2020YFB1600400, the National Natural Science Foundation of China under grant no. 12002403, and the Shenzhen Postdoctoral Research Fund under grant no. SZBH202101.

References

[1] Z. Deng, C. Xiong, and M. Cai, “An autonomous transportation system architecture mapping relation generation method based on text analysis,” IEEE Transactions on Computational Social Systems, vol. 9, no. 6, pp. 1768–1776, 2022.
[2] Y. Yu, C. Gou, C. Xiong, and M. Cai, “Research on generation definition method of autonomous transportation system based on key traffic components,” in Proceedings of the 22nd COTA International Conference of Transportation, pp. 550–557, Changsha, China, July 2022.
[3] Ertrac Task Force, Automated Driving Roadmap, European Road Transport Research Advisory Council, Brussels, Belgium, 2015.
[4] State Administration for Market Regulation and Standardization Administration of the People’s Republic of China, “Taxonomy of driving automation for vehicles,” 2021, https://www.researchgate.net/figure/Taxonomy-of-driving-automation_tbl1_332944356.
[5] X. Yang, Y. Zou, and L. Chen, “Operation analysis of freeway mixed traffic flow based on catch-up coordination platoon,” Accident Analysis & Prevention, vol. 175, Article ID 106780, 2022, https://www.sciencedirect.com/science/article/pii/S0001457522002159.
[6] Y. Zou, L. Ding, H. Zhang, T. Zhu, and L. Wu, “Vehicle acceleration prediction based on machine learning models and driving behavior analysis,” Applied Sciences, vol. 12, no. 10, p. 5259, 2022, https://www.mdpi.com/2076-3417/12/10/5259.
[7] C. E. Moody, “Mixing dirichlet topic models and word embeddings to make lda2vec,” 2016, https://arxiv.org/abs/1605.02019.
[8] T. Lusco, “Arc-it–the architecture reference for cooperative and intelligent transportation,” US Department of Transportation, Washington, DC, USA, 2018.
[9] H. Y. Yuan, F. H. Liu, and K. Zhang, “The key skill of development and research on intelligent transportation system (its) architecture in China,” Technology and Economy in Areas of Communications, vol. 4, pp. 88–90, 2008.
[10] N. Gabriel, “Development and standardization of intelligent transport systems,” International Journal on Marine Navigation and Safety of Sea Transportation, vol. 6, no. 3, pp. 403–411, 2012.
[11] M. Mochizuki, A. Suzuki, and T. Tajima, “Utms system architecture,” in Proceedings of the 6th World Congress On Intelligent Transport Systems (Its), Held Toronto, Canada, Washington, DC, USA, November 1999.
[12] W. Wei, L. Zheng, and M. Cai, “Research on user of intelligent transportation system for autonomous transportation,” Technology and Economy in Areas of Communications, vol. 2, no. 24, 2022.
[13] L. Zhang, Y. Xiao, and S. Jiang, “A study on the evolution model of autonomous transportation system based on complex network,” in Proceedings of the 16th The Annual Conference of ITS China (ITSAC2021), pp. 141–148, Tianjin, China, August 2021.
[14] G. Xu, X. Liu, L. Zhong, Y. Xiao, and H. Huang, “Reliability analysis and evaluation of logical architecture for autonomous transportation system,” Journal of Railway Science and Engineering, vol. 19, no. 10, 2022.
[15] Z. Deng, C. Xiong, and M. Cai, “Research on theoretical model and construction method of the physical object for autonomous transportation system,” in Proceedings of the 22nd COTA International Conference of Transportation, pp. 647–655, Reston, VA, USA, November 2022.
[16] Z.-S. Zhou, M. Cai, C. Xiong, Z. Deng, and Y. Yu, “Construction of autonomous transportation system architecture based on system engineering methodology,” in Proceedings of the 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), pp. 3348–3353, Macau, China, November 2022.
[17] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation,” Journal of Machine Learning Research, vol. 3, no. 1, pp. 993–1022, 2003.
[18] J. Weng, E.-P. Lim, J. Jiang, and H. Qi, “Twitterrank: finding topic-sensitive influential twitterers,” in Proceedings of the Third ACM International Conference on Web Search and Data Mining, pp. 261–270, New York, NY, USA, February 2010.
[19] J. Bian, Z. Jiang, and Q. Chen, “Research on multi-document summarization based on lda topic model,” in Proceedings of the 2014 Sixth International Conference on Intelligent Human-Machine Systems and Cybernetics, vol. 2, pp. 113–116, Hangzhou, China, August 2014.
[20] Y. Luo and H. Shi, “Using lda2vec topic modeling to identify latent topics in aviation safety reports,” in Proceedings of the 2019 IEEE/ACIS 18th International Conference on Computer and Information Science (ICIS), pp. 518–523, Beijing, China, June 2019.
[21] D. Wahhab Alkhafaji and S. Al-Rashid, “A topic modeling for clustering Arabic documents,” in Proceedings of the 2021 2nd Information Technology To Enhance e-learning and Other Application (IT-ELA), pp. 76–81, IEEE, Baghdad, Iraq, May 2021.
[22] W. Feng, X. Nie, Y. Zhang, Z.-Q. Liu, and J. Dang, “Story co-segmentation of Chinese broadcast news using weakly-supervised semantic similarity,” Neurocomputing, vol. 355, pp. 121–133, 2019.
[23] H. Xia, W. An, J. Li, and Z. J. Zhang, “Outlier knowledge management for extreme public health events: understanding public opinions about covid-19 based on microblog data,” Socio-Economic Planning Sciences, vol. 80, Article ID 100941, 2022.
[24] P. P. Widagdo, T. D. Susanto, and H. J. Setyadi, “The influence of user generation differences on individual performance in using information technology,” in Journal of Physics: Conference Series, vol. 1803, Bristol, England, IOP Publishing, Article ID 012031, 2021.
[25] G. Cézard, N. Finney, H. Kulu, and A. Marshall, “Ethnic differences in self-assessed health in scotland: the role of socio-economic status and migrant generation,” Population, Space and Place, vol. 28, no. 3, p. 2403, 2022.
[26] C. Li, “Intergeneration division and employment choices of migrant workers in the age-laborskill perspective,” China Youth Study, vol. 7, no. 9, 2022.
[27] S. W. Park, K. Sun, S. Abbott et al., “Inferring the differences in incubation-period and generation-interval distributions of the delta and omicron variants of sars-cov-2,” 2022, https://www.researchgate.net/publication/361825703_Inferring_the_differences_in_incubation-period_and_generation-interval_distributions_of_the_Delta_and_Omicron_variants_of_SARS-CoV-2.
[28] M. Coll Macià, L. Skov, B. M. Peter, and M. H. Schierup, “Different historical generation intervals in human populations inferred from neanderthal fragment lengths and mutation signatures,” Nature Communications, vol. 12, no. 1, p. 5317, 2021.
[29] L. Gao, F. Du, J. Wang et al., “Examination of the differences between sulforaphane and sulforaphene in colon cancer: a study based on next-generation sequencing,” Oncology Letters, vol. 22, no. 4, p. 690, 2021.
[30] J. Guan, “Intelligent transportation system development evolution and its generation feature,” AI-View, vol. 4, pp. 40–49, 2022.
[31] K. Krishna and M. Narasimha Murty, “Genetic kmeans algorithm,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 29, no. 3, pp. 433–439, 1999.
[32] C. McCormick, “Word2vec tutorial-the skipgram model,” 2016, http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/.
[33] S. Stergiou, Z. Straznickas, R. Wu, and K. Tsioutsiouliklis, “Distributed negative sampling for word embeddings,” in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, Palo Alto, CA, USA, February 2017.
[34] Q. Guan, S. Deng, and H. Wang, “Chinese stopwords for text clustering: a comparative study,” Data Analysis and Knowledge Discovery, vol. 1, no. 3, pp. 72–80, 2006.
[35] H. Ying, X. Hai, and L. Wang, “Chinese synonym toolkit: Synonyms,” 2017, https://github.com/chatopera/Synonyms.

Journal of Advanced Transportation, Hindawi Publishing Corporation

Intergeneration Division Based on Key Component Analysis in an Autonomous Transportation System Using the Natural Language Processing Method

Publisher
Hindawi Publishing Corporation
ISSN
0197-6729
eISSN
2042-3195
DOI
10.1155/2023/5850876

Abstract

Hindawi, Journal of Advanced Transportation, Volume 2023, Article ID 5850876, 11 pages, https://doi.org/10.1155/2023/5850876

Research Article

Intergeneration Division Based on Key Component Analysis in an Autonomous Transportation System Using the Natural Language Processing Method

Yuezhao Yu (1,2), Chao Gou (1,2), and Chen Xiong (1,2)

(1) School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
(2) Guangdong Provincial Key Laboratory of Intelligent Transportation System, School of Intelligent Systems Engineering, Sun Yat-sen University, Guangzhou 510275, China

Correspondence should be addressed to Chen Xiong; xiongch8@mail.sysu.edu.cn

Received 14 October 2022; Revised 10 February 2023; Accepted 23 February 2023; Published 16 March 2023

Academic Editor: Yajie Zou

Copyright © 2023 Yuezhao Yu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Advancement of emerging technologies and increasing transport demands accelerate the evolution of the autonomous transportation system (ATS). The framework and architecture of ATS are becoming a research hotspot; however, so far, few studies have addressed transportation intergeneration division. Previous works indicate that key components are a critical representation for distinguishing long-term eras. Besides, massive text material accumulates as research goes on, and natural language processing techniques keep developing, which makes quantitative research on key components in intergeneration division possible. In this work, a method based on massive text analysis is proposed. First, LDA2vec is used to get the relationship between components and other elements. Then, a component word set is obtained from the component word set extraction module based on component items.
Finally, the component word set is clustered to get ATS generations and to generate key components. Based on an analysis of large-scale important traffic texts, our method divides the traffic system into three generations for Chinese traffic from 2010 to 2022. The key components given by our method are consistent with human cognition of ATS. Successful application indicates that this work can be extended to other intergeneration division fields.

1. Introduction

The autonomous transportation system (ATS) is a complex traffic system that focuses on the future development direction within the intelligent traffic system (ITS) framework. According to the ATS theory, the traffic system consists of five basic elements, namely, technology, need, service, function, and component [1, 2]. As a stable framework, ATS unifies the interior and exterior of the transportation system. The external effect is driven by technologies and needs, the internal effect is activated by services and functions, and components carry the whole ATS as physical entities [3, 4]. The ATS framework has periods in its development process. Due to the external driving effect, the internal activation effect, and the carrying effect, the composition and performance of the five elements change between different periods. When the effect is big enough to revolutionize ATS knowledge, ATS will transition to a new generation. Intergeneration division is proposed to divide the framework features into intergenerational groups. It clusters the features along the timeline to aggregate different stages. This work provides a clearer understanding of framework knowledge and promotes research on the framework. Furthermore, the transportation system has been rapidly affected by high technology in recent years. Artificial intelligence and connected vehicles are continuously integrated into the transportation field, making modern transportation more intelligent compared to traditional transportation [5, 6]. The way industries produce and the way people travel are changing at an accelerated rate. If we are aware of the changes in each period, we can learn how the ATS elements evolve. This helps set development goals for the system framework, adjust the traffic industrial structure, and generate huge economic benefits.

Since the 1990s, in theory and technology research on ITS, scholars have not clearly defined ITS intergeneration division. Instead, they use continuous design to advance ITS theory and technology. In the research of different scholars, intergeneration division is considered a work based on experience. This work is relatively subjective and vague, and it has advantages when the transportation system is simple and there has not been much advancement in technology. With the explosive emergence of modern transportation technologies, the transportation system is more difficult to level in a subjective way. A method of intergenerational division that divides the transportation system's development process is becoming more and more necessary. The existing intergeneration division methods for systems are difficult to apply to divide the ATS systematically. The specific manifestations are as follows:

(1) There are different element relationships between ATS and other systems. Other intergeneration division methods cannot adapt to ATS.
(2) There is no established standard for the intergeneration division method, and the ways to perform intergeneration division are ambiguous and inconsistent.
(3) The key of intergeneration division is not specific. The key of division is not based on physical entities. The generation features cannot be clearly described.

In order to solve the above problems, we proposed in [2] an ATS intergeneration division standard based on key components, which uses physical entities as a bridge to reflect changes between generations. All elements depend on components for expression. This concept is used in this work. In this work, an ATS intergeneration division method based on big data is proposed. Based on the LDA2vec topic model [7] in natural language processing (NLP), topics are extracted from the divided text set as condensed text content. Text similarity comparisons between topics and components quantify the relationship between other elements and components. The word distribution and the year distribution from the text set, together with the relationship values, constitute the component word set, which is then clustered to get the desired generations. Finally, the key components are selected according to the clustering results. In this process, components are used as physical entities to connect the entire system framework, and the information used to establish intergeneration division also depends on components. Besides, the selected components include feature differences between two generations. Therefore, the selected components are representative. The main contributions are as follows:

(1) A machine learning method based on the LDA2vec model is proposed for ATS intergeneration division from an objective view with a group of massive texts.
(2) Physical entities are used to perform ATS intergeneration division from a more specific view, and a method for determining key components of intergeneration division is proposed.

2. Related Work

2.1. Autonomous Transportation System (ATS)

The development of the ATS framework is based on ITS. In 1992, the ITS system framework was first established in the United States by ARC-IT [8], which defined user needs and services, logical architecture, physical architecture, and policies as its work content. Since then, ITS system frameworks have been proposed by China [9], the European Union [10], and Japan [11] in accordance with their respective nations. However, the ITS framework still requires more detail at present. Big data, autonomous driving, vehicle-road cooperation, 5G, and other technologies are used in transportation. The role of human beings as decision makers in transportation is being replaced by artificial intelligence. The ATS framework for reducing human intervention is proposed. In order to liberate human beings from driving, researchers deconstruct ATS. They propose a method to evolve needs, services, functions, technologies, and components in a self-organizing way, which is the ATS element framework. Adaptive adjustment is used in this framework to offer theoretical support. ATS is still in the exploratory stage; therefore, most ATS research is theoretical. Wei et al. [12] study the ATS needs and propose a user need system construction method. Zhang et al. [13] use Petri nets to build a relationship network between elements and model the evolution of ATS. The relationship between the ATS elements is shown in Figure 1, which is an important theoretical basis for ATS architecture. Deng et al. [1] build an architecture mapping relation using text analysis. Xu et al. [14] develop an analysis of the ATS logical architecture and prove its reliability. For the ATS physical architecture, Deng et al. [15] define the physical architecture of ATS and form the theoretical basis. Zhou et al. [16] build the physical architecture of ATS and create the groundwork for application.

Figure 1: The five elements of ATS work as shown in the figure. With needs and technologies as external drivers, functions and services as internal influences, and components as physical carriers, the system is organized as a model. (Figure labels: service layer, function layer, component layer, needs, and technologies.)

2.2. Topic Model

In 2003, Blei et al. [17] proposed the Latent Dirichlet Allocation (LDA) topic model, which provides the topics of a text set with a probability distribution. This technique is able to summarize multiple latent topics from a text collection into a small number of words. After that, numerous LDA-based natural language processing (NLP) studies were conducted. Weng et al. [18] use LDA to implement opinion monitoring analysis on Twitter. Bian et al. [19] use LDA for multidocument summarization. Since LDA is a topic model based on bag-of-words, it can effectively summarize contextual macro information. However, for microsentential information, LDA does not work well. In order to address this issue, Moody [7] combines word vectors and LDA to create LDA2vec, which can integrate context information and sentence information to get topics. Using LDA2vec, Luo and Shi [20] collect information in aviation safety reports and successfully complete the task of mining latent topics. In [21–23], the topic model is used in different languages and gets the same satisfactory results.

2.3. Intergeneration Division

Intergeneration division is a sociological concept. Many works define social group differences as generations and perform division analysis from them. Research is broadly divided into two types. The first is the division of predetermined years to examine intergeneration variations within the same time period, for example, treating every decade as a generation to study changes in human behaviour [24, 25]. The alternative method does not rely on time. Instead, it categorizes individuals who satisfy certain criteria and then divides generations based on the temporal aggregation that corresponds to the features [26]. In medical research, generation is defined by biological activity, such as the relationship between children and parents, or several offspring as a generation with differences in biological traits [27–29]. Unlike sociology, intergeneration divisions in medicine are not reflected in the same time interval but rather in different features. Nevertheless, both types of studies are characterized by the fact that evolution produces intergeneration. In the field of transportation, transportation systems also exhibit evolution characteristics. Guan [30] proposes that ITS can be divided into three generations based on the evolutionary law of transportation systems and shows the future application features in ITS 3.0. Zhang et al. [13] propose to output element replacement through the evolution relationship between nodes and edges in a complex network of the ATS evolution model and then select a replacement as the criterion for intergeneration division. Yu et al. [2] proposed an intergeneration division criterion of ATS based on key components and gave several indicators as references.

Intergeneration division has mostly been studied qualitatively, especially in the field of sociology and less in the field of transportation. Based on the ideas proposed by predecessors, we would like to propose a quantitative perspective to study the intergenerational division of transportation systems, whose features can be found from the relationship between elements. The topic model can be used to find the connection between elements, as understood by human cognition, from the recorded texts, and we propose a method based on massive texts.

3. Proposed Method

The process of our method is shown in Figure 2. Our intention is to divide the transportation system development process into several time periods with generation features. The goal of the model is to extract relationships between other elements and components from the massive text. The collected data are organized by subelement items. The component word set extraction module with LDA2vec is proposed to calculate the component word set, which consists of a word distribution and a year distribution. Finally, after K-means [31] is used to cluster the word set, the generations and their features are generated.

3.1. Data Collection

The authors in [2] demonstrate that the component is the key element of ATS intergeneration division because the remaining four elements (technology, need, service, and function) are based on the component as a carrier. The objective in this subsection is to collect the texts for these four aspects and determine how they relate to each other and to components. In order to obtain a complete intergeneration division, the granularity of data collection needs to be accurate to the year, and the relationships between the component and other elements are calculated for each year [2]. Additionally, it breaks down the ATS generation evaluation methodology into six indicators. Combined with cognition, the ATS generation texts are divided into three types: papers, reports, and policies. The keywords of each item in the four elements are manually filtered out. The texts that make up our training dataset are retrieved in these three types based on the keywords.

3.2. LDA2vec Topic Model

Judging from the collected texts, all kinds of texts contain a lot of redundant and repeated information. It is impractical and unrealistic to use all information as generation expressive features. The LDA2vec [7] topic model can effectively summarize information from both context and sentence semantics and extract a group of keywords as appearance features.

LDA2vec is a combination of LDA and word2vec and learns word vectors using the skip-gram model along with topic vectors. This mixes the global and local keyword features, which meets our needs for text concentration. Therefore, using LDA2vec as a summarization and deredundancy technique is viable.

From word2vec, the word vector $\vec{w}$ is encoded by the skip-gram architecture [32]. From the LDA model, the topic vector $\vec{t}$ is generated by a Dirichlet distribution, as shown in Figure 3. To express sentences, a document vector $\vec{d}$ is set from a mixture of topic vectors:

$$\vec{d}_j = \sum_{k=1}^{n} p_{jk}\,\vec{t}_k, \tag{1}$$

where $0 \le p_{jk} \le 1$ is a fraction that denotes the membership of document $j$ in the topic $k$. The context vector $\vec{c}$ shows the
information from context which is defined as follows:

$$\vec{c}_j = \vec{w}_j + \vec{d}_j. \tag{2}$$

The loss function $L$ is defined as follows:

$$L = L^{d} + \sum_{ij} L^{neg}_{ij}, \qquad L^{neg}_{ij} = \log \sigma\left(\vec{c}_j \cdot \vec{w}_i\right) + \sum_{l=0}^{n} \log \sigma\left(-\vec{c}_j \cdot \vec{w}_l\right), \tag{3}$$

$$L^{d} = \lambda \sum_{jk} (\alpha - 1) \log p_{jk}, \tag{4}$$

where $L^{neg}_{ij}$ is the skip-gram negative sampling loss (SGNS) [33]. $L^{d}$ is the addition of a Dirichlet-likelihood term over document weights. $\vec{w}_i$ is the target word vector, and $\vec{w}_l$ is the negatively sampled word vector. $\sigma(x)$ is the sigmoid function. The strength of the term $L^{d}$ is modulated by the tuning parameter $\lambda$. $\alpha$ is a constant parameter, which is 1 less than the number of topics.

The total loss $L$ is passed backwards to the training process of LDA2vec and iterated until the loss converges. After the above model is trained, the topic vectors can be calculated from the inverse result. The topic vectors have been shown to contain sentence semantic information and context information. The topic vectors can be converted into topic descriptions with words:

$$topic = [w_1, w_2, \ldots, w_m]. \tag{5}$$

Figure 2: Overview of our method. In each item, a component word set extraction module is proposed to collect the component word set, and the components are then clustered with year by K-means to get the generation features. An LDA2vec work is used in the component word set extraction module to get the relationship between the component and other features. (In the figure, rounded rectangles are works and right-angled rectangles are mediums.)

Figure 3: LDA2vec is a combination of LDA and word2vec. The topic vector is generated from the inverse result of the text generation method based on the Dirichlet distribution parameters α and β. With a membership parameter p, the topic vector generates the document vector. The pivot word vector is obtained from skip-gram in word2vec. We combine the two vectors to get the context vector and compare it with the target word vector to get an unsupervised learning result.

3.3. Component Word Set Extraction Module

The topic description of one subelement item, which is related to all topics in its directory, can be defined as follows:

$$topic_{item,year} = [topic_1, topic_2, \ldots, topic_n] = [w_1, w_2, \ldots, w_q], \tag{6}$$

where $q = m \times n$ means that $topic_{item}$ includes $q$ words. For each subelement item, there is a topic description for each year. The component description can be defined as $c_i$:

$$c_i = [v_1, v_2, \ldots, v_p], \tag{7}$$

where $i$ is the component index and $p$ is the number of words in one component. Different components have different $p$. The Baidu stop word package [34] is used to split the sentence into words $v_1$ to $v_p$. In experiments, it is found that the stop word package is limited. Therefore, some words that are often used in papers and patents are added to the stop word package to make the stop word impact more noticeable.

The similarity between topics and component descriptions can be used to illustrate the similarities between components and other elements. The closeness of each component's word description to the topics of the four other elements can be used to calculate the following value:

$$s = \sum_{l=1}^{q} \sum_{k=1}^{p} compare(w_l, v_k), \tag{8}$$

where $compare$ is the function defined by the Python package Synonyms [35]. Synonyms encodes all words into different 1 × 100 vectors, namely, word vectors, and then calculates the similarity between two different word vectors. The value of similarity according to Synonyms is between 0 and 1. The closer it is to 1, the more similar the words are. For each subelement item, we sort all the components according to the relationship values and select the top 5 as winners. After counting all the words in the text set using the word bag and sorting the words by word frequency, the top five words are sampled as the word distribution awarded to the winners. Their weights, which come from word frequency normalization, are also awarded to the winners. The purpose of this is to eliminate irrelevant connections and make the model more relevant. The years of all texts in the text set with one subelement item are counted as a year distribution and also awarded to the winners. When all sets are traversed, each component has a word set with different word weights and a year statistics table from the obtained year distribution. In order to lessen the impact of repeated words on the results, the weights of identical words for each year under each component are added together. After the integration is finished, a component word set is built.

3.4. Clustering

In data collection, we introduce that the component is the key element of ATS intergeneration division. Therefore, generations can be divided by clustering components. To integrate the components' information in ATS, a component word set is built. Different from the component description, a component word set is a word set from the topics of the four other subelement items, which means that components are described by the ATS element relation. However, there is still a missing link, which is the unified and digital representation of all component word sets.

To achieve the requirement of clustering, an improved K-means [31] clustering model is used to cluster the component word set. The original K-means distance equation is as follows:

$$D = \sqrt{\sum_{i=1}^{n} \left(\hat{x}_i - x_i\right)^2}, \tag{9}$$

where $\hat{x}_i$ is the clustering center of each class, $x_i$ is the input data, and $n$ is the number of input dimensions. We consider the dimensions of the input in K-means and the data obtained by the previous method, using a group of words and their weights from the same element in one year as one single piece of information for K-means; Synonyms [35] integrates the information into a 1 × 100 vector. On this basis, considering the word weight and year information, the K-means distance equation is as follows:

$$D = \sqrt{k\, a_{word} \sum_{i=1}^{100} \left(\hat{x}_i - x_i\right)^2 + a_{year} \left(\hat{x}_{101} - x_{year}\right)^2}, \tag{10}$$

where $k$ is the weight of the word, $x_i$ is one single value from the 1 × 100 vector, and $x_{year}$ is the year in the word year list.

4. Experiments

4.1. Data Collection

The data of policies and reports in the text dataset are collected on the Internet with crawlers. The collection of papers is based on the CNKI paper database. More details about the collected data are in Table 1. Needs describe a vision for the future; therefore, there is no collection of papers that can be referenced in this work. To ensure that the amount of data is sufficient to meet the conditions for massive text computation, we collected a total of 200,032 reports, 72,311 policies, and 1,223 papers from 2010 to 2022. To ensure the quality of the collected papers, we counted their source information: 15.0% are doctoral dissertations, 37.0% are master's dissertations, 29.1% are papers published in journals that are part of the Chinese core, and 18.9% are others. The percentage of paper sources is shown in Figure 4, and the year distribution of the collected data is shown in Figure 5.

4.2. Results

In equation (4), the author in [7] suggests $\lambda = 300$. In equation (10), in order for the proposed model to learn more about the year feature information, $a_{word} = 1$ and $a_{year} = 350$. The number of topics is set as 20. In order to allow the model to fully summarize the text set, the number of training epochs of the topic model is set as 60. As shown in Figure 6, the total loss reaches a low number for the model

system. The other intergeneration division node, 2019, is the time when autonomous driving systems are widely installed in vehicles and V2X starts to be applied with 5G.

From the top ten words in each cluster in Table 2, in 2010–2015, the traffic system is about system control and traffic planning. Electronic information technology has developed rapidly. Electronic control products are mass produced during this period. The main applications are vehicle electronic automatic control and signal light adaptive control, which exhibit automated features. In 2016–2019, communication, data, and information are mentioned in the clusters. In this period, the 4G network shines in the transportation system. The Internet, cloud computing, and big data are all booming, and the transportation system is affected by them. The faster communication speed increases the amount of data; therefore, traffic big data is widely used. The Internet makes the traffic system exhibit the feature of network connection. From 2020 to 2022, coordination is mentioned and comes to an important position, which means that V2X enters the traffic system. More advanced sensing devices enable drivers and the transportation system to access more data and more information; correspondingly, better edge computing devices are needed. The transportation system exhibits the coordination feature.

4.3.2. Key Components

Two types of key components are significant in the intergeneration division: one accounts for the largest portion of the clustering results, and the other shows the largest connotation change between two adjacent generations.

The key components are found in the K-means clustering result.
In each class of the K-means cluster- to converge, when the skip-gram negative sampling loss ing results, all components are associated with a set of (SGNS) and Dirichlet-likelihood loss reach a reasonable words, which are the outcomes of the aforementioned level of overftting. With the result in K-means clustering, classifcation of the component word set into generations. the result vector must be [1, 101]. A visualize parameter y is Each word also includes a word weight k, which indicates used to show the frst hundred parameters of the result how much that word contributed to the clustering out- vector as y label, which means the word vector: come. To obtain how much each component contributed to the clustering outcome, the weights of all the words in that component are added. Table 3 shows the three y � 􏽘􏼂 log x􏼁􏼃 , (11) r i i�1 components with the highest contribution in each gen- eration because the development of transportation where r � 50 is a constant parameter. X label is the number technology in the last decade has been most concentrated of years. Te result is shown in Figure 7. on the transformation of transport vehicle, the in- In Figure 7, among 3 types of clustering results, the three novation of station infrastructure, and the improvement generations are in 2010–2015, 2016–2019, and 2020–2022. of driver comfort. Tese three elements are also present in Te frst intergeneration change occurred in 2015 and the varying degrees in transportation research. Among them, second in 2019. current research has focused heavily on automobile technology. Te scientifc community and the private sector are very much interested in automatic driving, 4.3. Discussion vehicle control, vehicle-side edge computing, and energy 4.3.1. Time and Generation Features. In Guan’s research structure change. 
Since transport vehicle is where the [30], they believe that the transportation system evolved transportation system change is most visibly present, from ITS1.0 to ITS2.0 in 2015, and with the development of transport vehicle has played a signifcant role in shaping automation and intelligence, it will evolve into the next that change. With the improvement of information technology, besides transport vehicle, station in- generation. Our results support this conclusion that 2015 is a time node for the transformation of the transportation frastructure is the most signifcant change in ITS Journal of Advanced Transportation 7 Table 1: Available data and sources are as follows. Source Technology Service Function Need Papers CNKI ✓ ✓ ✓ Reports Chinese government website ✓ ✓ ✓ ✓ Policies Baidu information ✓ ✓ ✓ ✓ Master's Dissertation (37.0%) Doctoral Dissertation (15.0%) Others (18.9%) Papers from Chinese Core Journals (29.1%) Figure 4: In order to ensure the quality of papers, when collecting data, doctoral dissertations, Master’s dissertations, and papers from Chinese core journals need to account for the majority. 0 0 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Year reports policies papers Figure 5: Te data are arranged by year, showing the number of texts consisting of reports, policies, and papers for each year. infuenced by technology. In recent years, the technology has also been signifcantly improved, and the level of of bus stations, high-speed railway stations, airports, and autonomous driving is based on the tasks that the driver other infrastructures for the interaction of incoming and needs to perform. Te generation 2020–2022 is the period outgoing carriers has made tremendous progress. Ad- of high development of autonomous driving, which is the ditionally, there are new stations that refect the most reason for the increase of the percentage of driver. 
cutting-edge infrastructure technologies, such as smart On the other hand, the component with the greatest parking lots and charging stations. In addition, a driver change in the two adjacent generations also plays a key role also plays a signifcant part in intergeneration division, as in the intergeneration transition. Generations may change driving jobs are changing at this time due to the devel- with these components. Te result shows the top two opment of autonomous driving. As shown in Table 3 and greatest components in Table 4. Tis result is in line with the Figure 8, we note that the percentage of driver is gradually generation features mentioned previously. From 2010–2015 rising, which is related to the improvement of automation to 2016–2019, trafc managers have changed from manual to and driving comfort for driver operation in recent years. automatic, and passenger vehicles begin to connect to the Te current stage of vigorous development of vehicle- Internet. From 2016–2019 to 2020–2022, V2X enables line road collaboration and autonomous driving aims to devices and edge devices to be updated. Both auxiliary liberate drivers, whose role in the transportation system operating devices and roadside smart devices support generation change. has been changing. In addition to the driver, experience Number of Reports (policies) Number of Papers 8 Journal of Advanced Transportation 10 20 30 40 50 60 Epochs Figure 6: Te data in this fgure show the average total loss in every epoch. Te LDA2vec model can converge at 60 epochs. 1.4 1.2 1.0 0.8 0.6 0.4 0.2 0.0 –0.2 2010 2012 2014 2016 2018 2020 2022 Year Figure 7: Te clustering result shows that among 3-class, K-means can divide an accurate timeline. Table 2: Sorted by the weight of words, the top ten words of each Table 3: Sorted by the weight of words, the top three components cluster are shown. of each generation are shown. 
Clusters Top ten words Cluster Top three greatest components Vehicle, driving, technology, system, transportation, Transport vehicle, 2010–2015 2010–2015 development, control, automatic, construction, and station infrastructure, and driver planning Transport vehicle, 2016–2019 Technology, vehicle, trafc, control, system, station infrastructure, and driver 2016–2019 information, management, communication, city, and Transport vehicle, 2020–2022 data driver, and station infrastructure Development, vehicle, construction, driving, coordination, Te year division also shows a clear line, but in 2020–2022 transportation, autonomous, testing, information, 2018–2020, a generation is defned, which is difcult and device to explain in practical cognition. Terefore, it is not suitable to cluster into 4 classes, and it also proves the efectiveness of clustering into 3 classes. In addition, if we 4.3.3. Clustering Parameters. 3-class is the best clustering cluster more than 4 classes or less than 3 classes, a single point will become a class by itself, so it is also discarded result. In order to verify this, the clustering results of class 4 are shown in Figure 9. by us. Visualize parameter (y) Loss Journal of Advanced Transportation 9 0.1515 0.1905 0.2125 0.1242 0.5541 0.5772 0.5195 0.1313 0.1456 0.1471 0.1241 0.1224 2010-2015 2016-2019 2020-2022 Transport Vehicle Transport Vehicle Transport Vehicle Station infrastructure Station infrastructure Station infrastructure Driver Driver Driver Others Others Others Figure 8: Transport vehicle, station infrastructure, and drivers are all in the top three components. Te driver becomes the second component in 2020–2022. Table 4: Sorted by the weight of words, the top two components of 5. Conclusion connotation change between the two generations. 
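The clustering comparison above can be illustrated with a minimal Python sketch of the weighted distance in equation (10), together with a check for classes that contain a single point. This is a sketch under stated assumptions and is not the authors' implementation: the function names, the plain-list layout (100 word entries followed by one year entry), and the use of unnormalized calendar years are hypothetical choices made for illustration. The defaults a_word = 1 and a_year = 350 are the values reported in Section 4.2.

```python
import math
from collections import Counter

WORD_DIMS = 100  # assumed length of the word part of each [1, 101] vector


def weighted_distance(center, sample, k, a_word=1.0, a_year=350.0):
    """Distance of equation (10) between a cluster center and one sample.

    Both inputs are sequences of length 101: 100 word entries, then the year.
    k is the word weight attached to the sample.
    """
    if len(center) != WORD_DIMS + 1 or len(sample) != WORD_DIMS + 1:
        raise ValueError("expected vectors of length 101")
    # Squared differences over the 100 word dimensions.
    word_term = sum(
        (c - x) ** 2 for c, x in zip(center[:WORD_DIMS], sample[:WORD_DIMS])
    )
    # Squared difference of the year entry (index 100, i.e. the 101st value).
    year_term = (center[WORD_DIMS] - sample[WORD_DIMS]) ** 2
    return math.sqrt(k * a_word * word_term + a_year * year_term)


def has_singleton_class(labels):
    """Return True if any cluster label is assigned to exactly one sample."""
    return any(count == 1 for count in Counter(labels).values())


# Hypothetical example: one word entry differs by 3 and the year by 1.
center = [0.0] * WORD_DIMS + [2015.0]
sample = [3.0] + [0.0] * (WORD_DIMS - 1) + [2016.0]
d = weighted_distance(center, sample, k=0.5)
```

With a_year much larger than a_word, a one-year gap contributes far more to the distance than a moderate difference in the word entries, which is consistent with the stated aim of letting the model learn more from year information. A check such as has_singleton_class could be used to screen out class counts that leave a single point in its own class.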
Figure 9: The clustering result with 4 classes. [Plot: x-axis Year (2010–2022); y-axis Visualize parameter (y).]

Table 4: Sorted by the weight of words, the top two components of connotation change between the two generations.

| Generation change              | Top two greatest components |
|--------------------------------|-----------------------------|
| From 2010–2015 to 2016–2019    | Traffic manager and passenger vehicle |
| From 2016–2019 to 2020–2022    | Auxiliary operating device and roadside smart device |

4.3.4. Disadvantage. Our method searches for the connections between elements in textual big data. In fact, it learns from what has been documented before, rather than from connections that actually exist. The real connections should be found in industrial projects. Although this method can learn the implicit relationship between elements, it is hard to know whether the implicit relationship exists in the real world.

In addition, generations are actually fuzzy because engineering applications take time, usually several years. The clear boundary obtained by our method should actually be a fuzzy value. The time nodes of the intergeneration division should be around 2015 and 2019.

5. Conclusion

In this work, a novel intergeneration division method based on text big data is proposed, and it successfully finds the time nodes and key components of ATS intergeneration division in massive text data. Our method searches the text records of massive ATS elements by keywords and uses LDA2vec to extract the text set topics of ATS subelements, which help to calculate the relationship between other elements and components. Then, the word distribution and the year distribution are fused through the element relationship to obtain the word set of the components. Finally, we improve K-means, cluster the component word sets into generations, and calculate the key components of each generation. The experimental results show that both the time nodes of intergeneration division and the results of key components are in line with realistic expectations.

In the Discussion section, this study discusses the limitations of this work. What our method learns is a feature relationship based on textual features, and there is a gap between it and reality. In addition, the generational line should be fuzzy because engineering projects take time, and the clear year line we get should actually be a fuzzy border.

In future work, we will consider the actual intergeneration relationship and complete the task of intergeneration division. Based on the evolution model [13], we will try to learn the features of future generations, divide possible future generations, and give planning advice for future transportation development.

Data Availability

The data used to support the findings of this study can be obtained from the corresponding author upon request at xiongch8@mail.sysu.edu.cn.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Key Research and Development Program of China under grant no. 2020YFB1600400, the National Natural Science Foundation of China under grant no. 12002403, and the Shenzhen Postdoctoral Research Fund under grant no. SZBH202101.

References

[1] Z. Deng, C. Xiong, and M. Cai, “An autonomous transportation system architecture mapping relation generation method based on text analysis,” IEEE Transactions on Computational Social Systems, vol. 9, no. 6, pp. 1768–1776, 2022.
[2] Y. Yu, C. Gou, C. Xiong, and M. Cai, “Research on generation definition method of autonomous transportation system based on key traffic components,” in Proceedings of the 22nd COTA International Conference of Transportation, pp. 550–557, Changsha, China, July 2022.
[3] Ertrac Task Force, Automated Driving Roadmap, European Road Transport Research Advisory Council, Brussels, Belgium, 2015.
[4] State Administration for Market Regulation and Standardization Administration of the People’s Republic of China, “Taxonomy of driving automation for vehicles,” 2021, https://www.researchgate.net/figure/Taxonomy-of-driving-automation_tbl1_332944356.
[5] X. Yang, Y. Zou, and L. Chen, “Operation analysis of freeway mixed traffic flow based on catch-up coordination platoon,” Accident Analysis & Prevention, vol. 175, Article ID 106780, 2022, https://www.sciencedirect.com/science/article/pii/S0001457522002159.
[6] Y. Zou, L. Ding, H. Zhang, T. Zhu, and L. Wu, “Vehicle acceleration prediction based on machine learning models and driving behavior analysis,” Applied Sciences, vol. 12, no. 10, p. 5259, 2022, https://www.mdpi.com/2076-3417/12/10/5259.
[7] C. E. Moody, “Mixing dirichlet topic models and word embeddings to make lda2vec,” 2016, https://arxiv.org/abs/1605.02019.
[8] T. Lusco, “Arc-it–the architecture reference for cooperative and intelligent transportation,” US Department of Transportation, Washington, DC, USA, 2018.
[9] H. Y. Yuan, F. H. Liu, and K. Zhang, “The key skill of development and research on intelligent transportation system (its) architecture in China,” Technology and Economy in Areas of Communications, vol. 4, pp. 88–90, 2008.
[10] N. Gabriel, “Development and standardization of intelligent transport systems,” International Journal on Marine Navigation and Safety of Sea Transportation, vol. 6, no. 3, pp. 403–411, 2012.
[11] M. Mochizuki, A. Suzuki, and T. Tajima, “Utms system architecture,” in Proceedings of the 6th World Congress On Intelligent Transport Systems (Its), Held Toronto, Canada, Washington, DC, USA, November 1999.
[12] W. Wei, L. Zheng, and M. Cai, “Research on user of intelligent transportation system for autonomous transportation,” Technology and Economy in Areas of Communications, vol. 2, no. 24, 2022.
[13] L. Zhang, Y. Xiao, and S. Jiang, “A study on the evolution model of autonomous transportation system based on complex network,” in Proceedings of the 16th The Annual Conference of ITS China (ITSAC2021), pp. 141–148, Tianjin, China, August 2021.
[14] G. Xu, X. Liu, L. Zhong, Y. Xiao, and H. Huang, “Reliability analysis and evaluation of logical architecture for autonomous transportation system,” Journal of Railway Science and Engineering, vol. 19, no. 10, 2022.
[15] Z. Deng, C. Xiong, and M. Cai, “Research on theoretical model and construction method of the physical object for autonomous transportation system,” in Proceedings of the 22nd COTA International Conference of Transportation, pp. 647–655, Reston, VA USA, November 2022.
[16] Z.-S. Zhou, M. Cai, C. Xiong, Z. Deng, and Y. Yu, “Construction of autonomous transportation system architecture based on system engineering methodology,” in Proceedings of the 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), pp. 3348–3353, Macau, China, November 2022.
[17] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation,” Journal of Machine Learning Research, vol. 3, no. 1, pp. 993–1022, 2003.
[18] J. Weng, E.-P. Lim, J. Jiang, and H. Qi, “Twitterrank: finding topic-sensitive influential twitterers,” in Proceedings of the third ACM international conference on Web search and data mining, pp. 261–270, New York, NY, USA, February 2010.
[19] J. Bian, Z. Jiang, and Q. Chen, “Research on multi-document summarization based on lda topic model,” in Proceedings of the 2014 Sixth International Conference on Intelligent Human-Machine Systems and Cybernetics, vol. 2, pp. 113–116, Hangzhou, China, August 2014.
[20] Y. Luo and H. Shi, “Using lda2vec topic modeling to identify latent topics in aviation safety reports,” in Proceedings of the 2019 IEEE/ACIS 18th International Conference on Computer and Information Science (ICIS), pp. 518–523, Beijing, China, June 2019.
[21] D. Wahhab Alkhafaji and S. Al-Rashid, “A topic modeling for clustering Arabic documents,” in Proceedings of the 2021 2nd Information Technology To Enhance e-learning and Other Application (IT-ELA), pp. 76–81, IEEE, Baghdad, Iraq, May 2021.
[22] W. Feng, X. Nie, Y. Zhang, Z.-Q. Liu, and J. Dang, “Story co-segmentation of Chinese broadcast news using weakly-supervised semantic similarity,” Neurocomputing, vol. 355, pp. 121–133, 2019.
[23] H. Xia, W. An, J. Li, and Z. J. Zhang, “Outlier knowledge management for extreme public health events: understanding public opinions about covid-19 based on microblog data,” Socio-Economic Planning Sciences, vol. 80, Article ID 100941, 2022.
[24] P. P. Widagdo, T. D. Susanto, and H. J. Setyadi, “The influence of user generation differences on individual performance in using information technology,” in Journal of Physics: Conference Series, vol. 1803, Article ID 012031, IOP Publishing, Bristol, England, 2021.
[25] G. Cézard, N. Finney, H. Kulu, and A. Marshall, “Ethnic differences in self-assessed health in scotland: the role of socio-economic status and migrant generation,” Population, Space and Place, vol. 28, no. 3, p. 2403, 2022.
[26] C. Li, “Intergeneration division and employment choices of migrant workers in the age-laborskill perspective,” China Youth Study, vol. 7, no. 9, 2022.
[27] S. W. Park, K. Sun, S. Abbott et al., “Inferring the differences in incubation-period and generation-interval distributions of the delta and omicron variants of sars-cov-2,” 2022, https://www.researchgate.net/publication/361825703_Inferring_the_differences_in_incubation-period_and_generation-interval_distributions_of_the_Delta_and_Omicron_variants_of_SARS-CoV-2.
[28] M. Coll Macià, L. Skov, B. M. Peter, and M. H. Schierup, “Different historical generation intervals in human populations inferred from neanderthal fragment lengths and mutation signatures,” Nature Communications, vol. 12, no. 1, p. 5317, 2021.
[29] L. Gao, F. Du, J. Wang et al., “Examination of the differences between sulforaphane and sulforaphene in colon cancer: a study based on next-generation sequencing,” Oncology Letters, vol. 22, no. 4, p. 690, 2021.
[30] J. Guan, “Intelligent transportation system development evolution and its generation feature,” AI-View, vol. 4, pp. 40–49, 2022.
[31] K. Krishna and M. Narasimha Murty, “Genetic kmeans algorithm,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 29, no. 3, pp. 433–439, 1999.
[32] C. McCormick, “Word2vec tutorial-the skipgram model,” 2016, http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/.
[33] S. Stergiou, Z. Straznickas, R. Wu, and K. Tsioutsiouliklis, “Distributed negative sampling for word embeddings,” in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, Palo Alto, CA USA, February 2017.
[34] Q. Guan, S. Deng, and H. Wang, “Chinese stopwords for text clustering: a comparative study,” Data Analysis and Knowledge Discovery, vol. 1, no. 3, pp. 72–80, 2006.
[35] H. Ying, X. Hai, and L. Wang, “Chinese synonym toolkit: Synonyms,” 2017, https://github.com/chatopera/Synonyms.

Journal

Journal of Advanced Transportation, Hindawi Publishing Corporation

Published: Mar 16, 2023
