Hindawi Journal of Robotics, Volume 2022, Article ID 4163992, 12 pages. https://doi.org/10.1155/2022/4163992

Research Article

Support Vector Machine and Granular Computing Based Time Series Volatility Prediction

Yuan Yang and Xu Ma

School of Mathematics and Computer Science, Ningxia Normal University, Guyuan 756000, Ningxia, China

Correspondence should be addressed to Yuan Yang; sjyangyuan@nxnu.edu.cn

Received 17 December 2021; Revised 13 February 2022; Accepted 14 February 2022; Published 16 April 2022

Academic Editor: Shan Zhong

Copyright © 2022 Yuan Yang and Xu Ma. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

With the development of information technology, a large amount of time-series data is generated and stored in the field of economic management, and the potential and valuable knowledge and information in these data can be mined to support management and decision-making activities by using data mining algorithms. In this paper, three different time-series information granulation methods are proposed, covering both the time axis and the theoretical domain: a time-axis information granulation method based on fluctuation points, a time-axis information granulation method based on the cloud model, and a fuzzy time-series prediction method based on theoretical domain information granulation. At the same time, the granulation idea of granular computing is introduced into time-series analysis: the original high-dimensional time series is granulated into a low-dimensional grain time series by information granulation, and the constructed information grains can portray and reflect the structural characteristics of the original time-series data, realizing efficient dimensionality reduction and laying the foundation for the subsequent data mining work.
Finally, the grains of the decision tree are analyzed, and different support vector machine classifiers corresponding to each grain are designed to construct a global multiclassification model.

1. Introduction

With the rapid development of internet technology and the improved performance of data storage devices in recent years, a large amount of data is generated and stored in various industries. A large portion of these data are time-tagged, that is, series of observations recorded in chronological order, called time series. How to effectively analyze and process such time-series data to uncover potential and valuable knowledge and information, so as to support more efficient production, operation, management, and decision-making activities of enterprises, is one of the important tasks in today's big data era [1]. Granular computing is a new approach to simulating human problem-solving thinking and to solving complex tasks with big data, and it is an emerging research direction in artificial intelligence in recent years. The main idea of the theory is to abstract and divide complex problems into several simpler problems (i.e., granulation), thus contributing to better analysis and problem-solving. The existing research on time-series information granulation is mainly divided into two aspects, the time axis and the theoretical domain, that is, solving the problems of effectively dividing and representing the time window and the theoretical domain. Support vector machines show many unique advantages in solving small-sample, nonlinear, and high-dimensional pattern recognition problems; the technique avoids local minima and realizes capacity control, and it can be extended to other machine learning problems such as function fitting.

Research on time-axis information granulation of time series usually uses a fixed time interval to divide the series, that is, hard division, and then represents the information grains obtained after the division. This ignores the changing characteristics of the time series along the time axis and does not conform to the essential meaning of information grains, so it is necessary to design the granulation method according to those changing characteristics, such that the obtained information grains have internal structures that are similar within themselves while the grains remain distinct from each other [2]. Studies of time-series domain information granulation usually cannot combine the requirements of both interpretability and prediction accuracy of the domain partition intervals, so there is a need to design a time-series domain information granulation method with both strong interpretability and high prediction accuracy [3].

This paper introduces the granulation idea of granular computing into time-series analysis. By information granulation, the original high-dimensional time series is granulated into a low-dimensional grain time series, and the constructed information grains can portray and reflect the structural characteristics of the original time-series data, thus realizing efficient dimensionality reduction and laying the foundation for subsequent data mining work. Addressing the shortcomings of existing research methods, the study of time-series information granulation oriented to clustering and prediction proposes three different granulation methods covering both the time axis and the theoretical domain and applies them to stock time-series data for clustering and prediction analysis. To address the long training time and low efficiency of existing support vector machines on multiclassification problems, the idea of granular computing is introduced to construct support vector machine multiclassification models, and the learning algorithm for constructing decision trees is improved so as to raise training efficiency and classification accuracy.

2. Related Work

Combining other, more mature theories and methods with SVM has become a research topic with great potential for development; at the same time, it faces problems such as difficult classification and inaccurate prediction. Current research on granular support vector machines mainly focuses on combinations with specific models: SVM with rough sets, decision trees, clustering, quotient spaces, association rules, and so on. These results only preprocess the data, but the resulting models are important for the theoretical study of machine learning and support vector machines, as well as for the exploration of problems such as intelligent information processing.

Egrioglu E [4] studies rough lower and upper approximations on the space of grain approximations from the perspective of rough set theory. Subsequently, the concept of grain logic (G-logic) is given in [5], where a similar inference system is built based on rough logic, and instance verification and analysis are carried out on medical diagnosis problems. Many results have also been achieved in terms of practical applications. The importance of attributes, as elaborated in [6], was added to the granular computation of knowledge and used in solving the minimal attribute approximation, among other problems. In subsequent research, fuzzy quotient space theory was created in [7], improved in [8], perfected in the context of data mining, and so on. He Y [9] dealt with word computation and language dynamics and proposed a language dynamics system. The subsequent literature [10] elaborates a grain computation model based on tolerance relations, giving a grain operation criterion for incomplete information systems, a grain representation, and a grain decomposition method. At the same time, in connection with the attribute simplification of rough sets, determination conditions are given, and problems such as the acquisition of attribute necessity rules for incomplete information systems are addressed. Luo C [11] applies the compatible granularity space model in the field of image segmentation. Kim S T [12] combines granularity with neural networks, applied to efficient knowledge discovery. Dong G [13] elaborates on the connection between concept description and concept hierarchy transformation based on the similarity of the concept lattice and granularity partitioning in the process of concept clustering. Su W H [14] combines grain vector space with artificial neural networks, which improves the timeliness and comprehensibility of the knowledge representation of the artificial neural network.

Literature [15] decomposed copper and wheat prices with the EMD and EEMD methods, respectively, based on a multiscale perspective; BP neural network, SVM, and ARIMA models were then used for prediction and integration, and the prediction results showed that the combined model predicts better. Although decomposition-based ensemble models perform better, there are some defects: the wavelet decomposition method suffers from weak adaptability and poor robustness of network training during data decomposition, while the EMD method suffers from modal overlap and a lack of theoretical grounding in the decomposition process. Moreover, for price series with multiscale structure and high noise, these methods produce many components after decomposition, which is not conducive to subsequent forecasting. Later, literature [16] constructed a new sequence decomposition method, the empirical wavelet transform (EWT), based on the wavelet transform and combining the advantages of EMD. Literature [17] and others used EWD and EWT to decompose wind power sequences and then combined them with neural network methods for cross-combination prediction; after comparison, it was found that the sequences decomposed by EWT had a better prediction effect. The basic idea of a rough set is to form concepts and rules through analytical induction and to study target equivalence relations as well as categorical approximation for knowledge discovery. Zhao Y [18] combines multilevel and multiperspective granularity methods by defining the division sequence product space and using nested division sequences to define different granular layers over the theoretical domain; finally, a granulation model based on the division order is given using the division order product space.
Chen W [19] proposes a neighborhood granulation method, introducing inter- and intraclass thresholds to construct a supervised neighborhood-based rough set model, and gives the rough approximation quality and conditional entropy monotonicity theorems for this model by analyzing the neighborhood particle change under double thresholds. Literature [20] studies the operation mechanism of data information particles: nonstandard analysis is used as the operation rule of information particles, the accompanying binary relation is proposed, and the division of coarse and fine particle layers under the binary relation is analyzed in depth; the algorithm can realize the merging and decomposition of particle layer space, which effectively reduces the computational intensity and simplifies the data analysis process.

3. Support Vector Machine-Based Algorithms for Granular Computing

Set theory is the foundation of modern mathematics, and fuzzy set theory is one of its newer mathematical tools and theories. Once the concept of fuzzy sets and the problem of the granularity of fuzzy information were introduced, the scope of their use rapidly expanded and extended the theory of fuzzy logic, followed by the "theory of word computation", which aims to use language for fuzzy computation and reasoning to achieve fuzzy intelligent control. At the same time, the integration of fuzzy set theory and quotient space, using fuzzy equivalence relations, completed the expansion of the quotient space model to grain computation and was able to accurately map and solve uncertainty problems. Therefore, a proper hierarchical, progressive granularity structure can solve such problems effectively. However, the theory lacks the means and technical algorithms to complete the transformations involved: between a granularity and a granularity world, between granularities, and between granularity worlds. If this problem can be solved, it will improve the theory and extend the scope of use of the quotient space [21].

Machine learning solves for an estimate of a system's behavior based on known training samples: given a dependency between the inputs and outputs of a system, it makes as accurate an estimate of unknown data as possible. The problem of machine learning can then be modeled as the existence of some unknown law of dependency between input variables and output variables [22]. The basic idea of the support vector machine is to map the input nonlinearly into a high-dimensional space, solve the optimal linear classification there, and finally define an appropriate inner product (kernel) function to complete this nonlinear transformation. The triadic theory of granular computing comprises multiperspective and multilevel granular structures and the granular computing triangle: the methodology of granular computing is structured problem-solving, and its computational model is structured information processing. The triad emphasizes the mutual support of the philosophical, methodological, and computational models of granular computation. The study of granular computing attempts to organize, abstract, and combine granular processing ideas from various disciplines to obtain a higher-level, systematic, discipline-independent principle of granular computing.

The traditional algorithm steps are as follows:

(1) Select the number of grains to be divided, p:

f_φ(p, x) = d_p(x − λ1θ1/p)/d_p(x) + d_p(x − λ2θ2/p)/d_p(x), (1)

where f_φ(p, x) is the overall feature function of the data set.

(2) Determine the objective optimization function:

∇_{b_ij} J(w, p; x^l, y) = ∂J(w, p)/∂b_ij = δ^{l+1} + λα_ij. (2)

In the objective optimization function, w is the penalty parameter; nonseparable samples fall in overlap regions and may belong to one class of samples or to multiple classes.

(3) Generate k decision functions as follows:

X_k = Σ_{i=1, j=1}^{2} (T_k + b_ij X). (3)

(4) The radial basis kernel function is then obtained as follows:

G(J) = ∂c/∂j + (1/n) Σ_{i=1}^{n} X_i Y_i. (4)

(5) The final if-then form and the main fuzzy constraint propagation rules are proposed.

The fusion of the three models in turn produces fuzzy rough sets, fuzzy quotient spaces, and so on, so that the three models are both distinct and related. Between rough sets and fuzzy sets, the former process grains afterwards while the latter are given in advance; both describe and generalize the incompleteness and inaccuracy of information grains, yet there are significant differences in how the grains are processed. Rough sets focus on the coarseness of information grains, describe grains by upper and lower approximation operators, and emphasize indistinguishability and the classification of different equivalence classes. Fuzzy sets focus on fuzziness, describe and emphasize the indistinguishability of boundaries using membership and membership functions, and study only the degree of membership within the same equivalence class. Figure 1 shows the framework of the algorithm flow of support vector machine-based granular computing.

Figure 1: Support vector machine-based algorithm flow framework for granular computing.
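The per-grain classifier construction described above can be illustrated with a small sketch. This is not the authors' exact algorithm: the k-means granulator, the toy data, and the routing rule are all placeholder assumptions; the sketch only shows the general pattern of dividing the sample space into p grains, training one SVM per grain, and routing queries through the granulator to form a global multiclassification model.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Toy 2-D data with three classes (a stand-in for a real data set).
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int) + (X[:, 0] - X[:, 1] > 1).astype(int)

# Step 1: granulate the sample space into p grains (here via k-means).
p = 4
granulator = KMeans(n_clusters=p, n_init=10, random_state=0).fit(X)

# Step 2: train one RBF-kernel SVM per grain on the samples it contains.
experts = {}
for g in range(p):
    mask = granulator.labels_ == g
    if np.unique(y[mask]).size < 2:
        # Degenerate grain with a single class: remember that class directly.
        experts[g] = int(y[mask][0])
    else:
        experts[g] = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X[mask], y[mask])

# Step 3: route each query to its grain's expert to form the global model.
def predict(Xq):
    grains = granulator.predict(Xq)
    out = np.empty(len(Xq), dtype=int)
    for i, (x, g) in enumerate(zip(Xq, grains)):
        m = experts[g]
        out[i] = m if isinstance(m, int) else m.predict(x.reshape(1, -1))[0]
    return out

acc = (predict(X) == y).mean()
```

Because each expert only sees one grain, its quadratic-programming problem is small, which is the training-efficiency argument made in the introduction.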
The main fuzzy constraint propagation rule takes the form

T(i, j) = (g(i, j) + g_max)/(g_max − g_min) + |X − Y|. (5)

It follows that cluster analysis can be considered a concrete implementation of the idea of granulation, which is another layer of abstraction on the idea of cluster analysis. Granular computing is a concrete implementation of the ideas of granularity and hierarchy in the solution of machine problems. The core concept of granular computing is the multilevel and multiview granular structure [23]. The fundamental framework of granular computing consists of particles, granular layers, and granular structures. The most fundamental element of granular computation is called a particle, which is composed of a collection of individuals described by internal properties and a whole described by external properties. An abstracted description of the space of problems or samples of operations is called a granule layer, and the whole set of particles obtained under some granulation criterion constitutes a granule layer whose internal particles share some identical (or similar) granularity or properties. Granularity comes from the way people perceive the world: the observation, definition, and transformation of practical problems are granular calculations for different problems measured from different perspectives or levels, and in different applications granularity can be interpreted as size, abstraction, complexity, and so on. Different grains can be ordered by their granularity. Each grain provides a local, partial description, and all grains in a layer combine to provide a global, complete description. Grain calculations are often solved on different grain layers. A multilevel grain structure is a description of the relationships and connections between grains and grains, grains and layers, and layers and layers; the grain structure is a relational structure consisting of interconnections between grain layers. There are three modes, top-down, bottom-up, and center-out, which are three common modes of information processing for humans and computers.

4. Support Vector Machine and Granular Computing Based Time Series Volatility Prediction

4.1. Empirical Modal Decomposition of Time-Series Fluctuation Algorithms. Empirical modal decomposition (EMD) is a signal decomposition processing method whose principle is to decompose an originally complex price signal sequence into a finite number of simpler intrinsic mode functions (IMFs). Each IMF represents the information contained in the original price series at a different scale level and can effectively reflect the embedded characteristics of the original price series at a low level. EMD is a data processing method that can smooth a complex signal series: it decomposes the original signal into several component series ordered by frequency level; the first component has the highest frequency, the following ones decrease in order, and the last has the lowest frequency. Many component sequences at different feature scale levels are thus obtained, each corresponding to an IMF eigenmode component of the relevant frequency. The superimposed waves in the original signal can be removed, and symmetric modal waveforms can be obtained. In the EMD algorithm, IMF1 contains the component of the original sequence with the smallest period. The residual term after subtracting IMF1 from the original sequence contains the part of the vibration signal whose period is larger than that of IMF1, so the average period of IMF2 is generally larger than that of IMF1. By analogy, along the IMF sequence filtered by the EMD algorithm the signal frequency decreases, the fluctuation intensity decreases, and the average period increases; the final residual term is a constant or monotonic function, which reflects the long-term trend of the sequence.

The empirical modal decomposition algorithm can be understood as a set of adaptive filters that sieve the data layer by layer according to its essential scale characteristics, separating the characteristic time scales in order from small to large. After such decomposition, the frequency of each component indicates the degree of price-series fluctuation: the higher the frequency of a component, the more violent the corresponding fluctuations; the lower the frequency, the more moderate. The most moderate component produced last in the decomposition can therefore represent the overall direction of the price series and is generally referred to as the residual component [24].

EMD decomposition is simple, and its results are comparatively accurate, because the mathematical processing is adaptive and the decomposition results are generated automatically without human interference. EMD automatically generates different basis functions and the most appropriate number of components according to the fluctuation of each price series. In contrast, the wavelet decomposition method requires the basis function to be selected in advance when processing the original sequence and then requires several training trials to determine the most appropriate number of components. Usually, the total number of components obtained by EMD decomposition is log N, where N is the number of data samples in the original series. The problem of unbalanced data classification has also become an important research direction in data mining and machine learning; a better solution to the classification of unbalanced data distributions is required to handle data classification more comprehensively. Time-series information granulation based on the support vector machine introduces the idea of the granular computing support vector machine into time-series analysis and is a new research direction of time-series analysis [25]. The idea of information granulation is to decompose a whole into small parts and then study the decomposed parts, each of which is a particle; in other words, information granules are elements that are similar and indistinguishable or that have a certain function. Information granulation of time series is the basis for compressing the scale of time-series data and using it for subsequent time-series analysis, interpretation, and modeling [26]. Therefore, compared with the wavelet decomposition method, the EMD decomposition method has obvious advantages in the operation of the decomposition process. However, EMD has some drawbacks in application, such as the easy occurrence of component stacking and endpoint contamination.

Since actual time series are composed of both real signal and noise, empirical modal decomposition processes data containing noise, and for some time-series data with signal jump changes, the jump signals may cause scale loss, so the decomposed results can suffer from modal confounding. When there is a jump in the scale of the original time-series signal, the EMD decomposition result may exhibit modal mixing. The so-called modal overlap manifests in the decomposition results as follows: where there should be only one scale feature, the subsequence of that scale feature is not unique, and signals of multiple scale features are mixed in one sequence. In particular, influenced by the signal's collection frequency, frequency components, and amplitude, modal blending can easily occur when empirical modal decomposition is performed directly, and modal blending mainly refers to the following two aspects:

(1) A single IMF contains components of entirely heterogeneous scales.

(2) Signal components of the same scale appear in different IMFs.

The endpoint effect arises because EMD needs to construct the upper and lower envelopes of the sequence with the cubic spline method during decomposition, but the cubic spline diverges near the boundary points of the original sequence, and as the EMD decomposition proceeds, the endpoint effect gradually spreads inward and pollutes the whole sequence, interfering with the final decomposition. The simplest way to cope with this problem is to keep discarding the nonextreme part at the endpoints during decomposition, but this causes data waste and thus harms the later prediction. If the data at the boundary points of the sequence are not deleted, it is generally only possible to extend each end by various methods, and this extension process is disturbed by human factors, which eventually affects the decomposition. Figure 2 shows the process of variational modal decomposition.

Figure 2: Variational modal decomposition flow.

To obtain a relatively stable speed of the vehicles within the cluster, we use the average speed of the vehicles within the cluster to characterize the stability of the cluster and filter the vehicle nodes within the above set of pairs of neighboring nodes by motion consistency, removing vehicle nodes whose speed differs greatly from the average, so that the cluster can travel on the road in a relatively stable manner. Specifically, the average speed of the vehicles within the cluster at time t can be expressed as follows:

v_c = (δx/δt) · (n!/(r!(n − r)!)) + μ, (6)

where N_i(t) denotes the number of elements in the set of neighboring nodes of V_i at time t and V_it represents the n-th element within the set N_vi of neighboring nodes of V_i at time t. If the velocity of V_jn satisfies the following equation, it will be removed:

N_vi(t) = Σ_{i=1} N · V_in + A. (7)

The set of neighbor nodes of vehicle V_i at moment t can be expressed as follows:

N_vi = V_i t + θ + η. (8)

The average end-to-end delay reflects the effectiveness of the protocol and can be determined using the following formula:

ẍ = (δy/δx) √(c²y² + x³ + y) + c. (9)

Another approach works in the direction of the classification algorithms: based on the flaws and deficiencies found in previous algorithms for solving imbalance problems, the algorithms are appropriately improved and extended to strengthen the ability to handle imbalanced classification. The use of a pseudosignal has also been proposed to solve the modal aliasing problem, introducing a pseudosignal to prevent an IMF from covering too wide a band; however, this method also requires human subjective judgment to intervene and suffers from the same problem of weakened adaptivity.

4.2. Time-Series Volatility Prediction Model Based on Support Vector Machine Grain Calculation. Support vector machine-based time-series information granulation is a new research direction of time-series analysis that introduces the idea of the granular computing support vector machine into time-series analysis.
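As a concrete illustration of the sifting step at the heart of the EMD procedure described in Section 4.1 (cubic-spline envelopes over the local extrema, subtraction of the envelope mean), the following numpy/scipy sketch extracts one candidate IMF. It is a didactic simplification, not a full EMD implementation: the fixed iteration count, the test signal, and the naive pinning of envelope endpoints to the signal are assumptions, and the latter is exactly the source of the endpoint effect discussed above.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def sift_imf(x, n_sift=10):
    """Extract one candidate IMF by repeated envelope-mean subtraction."""
    h = x.copy()
    t = np.arange(len(x))
    for _ in range(n_sift):
        # Locate interior local maxima and minima.
        maxima = np.where((h[1:-1] > h[:-2]) & (h[1:-1] > h[2:]))[0] + 1
        minima = np.where((h[1:-1] < h[:-2]) & (h[1:-1] < h[2:]))[0] + 1
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema to build envelopes
        # Cubic-spline upper/lower envelopes; endpoints are naively pinned
        # to the signal, which reproduces the endpoint effect near the borders.
        up = CubicSpline(np.r_[0, maxima, len(h) - 1],
                         np.r_[h[0], h[maxima], h[-1]])(t)
        lo = CubicSpline(np.r_[0, minima, len(h) - 1],
                         np.r_[h[0], h[minima], h[-1]])(t)
        h = h - (up + lo) / 2.0  # remove the local mean (the slower scale)
    return h

t = np.linspace(0, 1, 500)
# Fast 25 Hz oscillation riding on a slow 3 Hz wave.
signal = np.sin(2 * np.pi * 25 * t) + 0.5 * np.sin(2 * np.pi * 3 * t)
imf1 = sift_imf(signal)          # ~ the fast component (smallest period)
residue = signal - imf1          # ~ the slower component plus trend
```

Repeating the same sifting on the residue would yield IMF2, IMF3, and so on, with increasing average period, as the text describes.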
The concept of information granulation is to decompose a whole into small parts and then study the parts obtained from the decomposition; each part so divided is a grain. Put another way, an information grain is a set of elements that are similar, close, and indistinguishable, or that are combined by some function. Information granulation of time series is the basis for compressing the size of time-series data and using it for subsequent time-series analysis, interpretation, and modeling; its research framework is shown in Figure 3.

When the input signal of the network is the k-th training sample X_k, the output value of the j-th neuron after the nonlinear transformation of the i-th hidden-layer neuron is as follows:

X_k = Σ_{i=1, j=1} (T_k + b_ij X). (10)

To address the phenomenon of modal confusion in the empirical modal decomposition algorithm, Huang proposed the method of interruption detection. The specific idea is that, after each decomposition, the final decomposition result is analyzed and judged; if modal confusion is found, the result is filtered by selecting an appropriate interruption scale and then decomposed again. However, interruption detection is an a posteriori means of judgment, which may lead to the following situation: scales genuinely contained in the signal are incorrectly filtered out, and the adaptiveness of empirical modal decomposition itself is thus greatly weakened. In some specific cases, therefore, this interruption detection method has shortcomings that affect its inspection effect.

Granulating information on a time series mainly includes two steps: information grain division and information grain description. Information grain division divides the time series into several small subsequences, each of which is called an information grain; information grain description constructs a description method to effectively characterize the information grains obtained from the division. Through the information granulation operation on a time series, the research object can be abstracted from the low-level, fine-grained original high-dimensional time series to a high-level, coarse-grained, low-dimensional grain time series, and the constructed information grains can portray and reflect the local features of the original time-series data, which achieves efficient dimensionality reduction and lays the foundation for the subsequent data mining work. For the information granulation of time series, some scholars have conducted relevant research and achieved certain results.

Figure 3: A research framework for granulation of time-series information.

Figure 4: Interval information granulation of time series.

Combing the published research results, the existing research on time-series information granulation can be divided into two aspects: (1) time-axis information granulation of time series, that is, solving the problem of effectively dividing the time series into time windows and representing them; and (2) research on the domain information granulation of time series, that is, solving the problem of effectively dividing and representing the theoretical domain of the time series.
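The two-step procedure (grain division, then grain description) can be sketched as follows. The fixed-length window and the (min, mean, max) triple are illustrative assumptions corresponding to the simplest hard division and a triangular-style descriptor, not to the fluctuation-point or cloud-model methods proposed in this paper.

```python
import numpy as np

def granulate(series, window):
    """Information-grain division: split the series into fixed-length windows
    (the 'hard division' described above). Information-grain description:
    summarize each window by a (low, typical, high) triple."""
    n = len(series) // window
    grains = []
    for k in range(n):
        seg = series[k * window:(k + 1) * window]
        grains.append((float(np.min(seg)), float(np.mean(seg)), float(np.max(seg))))
    return grains

# Toy random-walk series standing in for a stock price series.
prices = np.cumsum(np.random.default_rng(1).normal(size=120))
grain_series = granulate(prices, window=12)
# 120 raw points become 10 grains: a 12-fold dimensionality reduction,
# while each grain still records the window's range and typical level.
```

Subsequent clustering or prediction then operates on `grain_series` instead of the raw points, which is the dimensionality-reduction step the text describes.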
,e problem of granulation, that is, to solve the problem of an eﬀective unbalanced data classiﬁcation is a key problem in the ﬁeld of division of time series’ domain representation. machine learning and data mining. From a realistic point of Time-axis information granulation of time series is to view, the distribution of data sets in a large number of divide the time series into some time windows according to classiﬁcation problems is unbalanced, and the importance of its change characteristics on the time axis according to some each category is diﬀerent; usually, sparse categories of data method, and the subsequence on each time window is are more worthy of study in a particular context. ,erefore, regarded as an information grain, and then the subsequence it is necessary to design the information granulation method on the divided time window is characterized eﬀectively. ,e of time series according to the changing characteristics of resulting interval information grain can achieve full cov- time series on the time axis so that the obtained information erage for the data samples on the time windows, as shown in grains have similar internal structures among themselves Figure 4. and distinguishable information grains from each other. ,e theoretical domain information granulation of time Real-life time-series data are usually characterized by high series is to divide the time series into several theoretical dimensionality and high noise, so it is crucial to eﬀectively domain intervals according to its variation characteristics on perform information granulation operations on time series the theoretical domain according to some method, and each to reduce the data size of time series and reduce the impact of theoretical domain interval is regarded as an information noise. grain, and then the divided theoretical domain intervals are ,e information granulation operation on time series in characterized eﬀectively. 
,e research on the theoretical terms of the time axis is essentially the same as the traditional domain information granulation of time series is mainly time-series dimensionality reduction representation divided into four types: the ﬁrst is the equal interval theo- method; both are to reasonably compress the data size while retical domain division method; the second is the equal keeping the important features of the original time series as frequency theoretical domain division method; the third is much as possible. However, the traditional time-series di- the clustering-based theoretical domain division method; mensionality reduction representation method does not and the fourth is the optimization theory-based theoretical compress the time series to a high degree and does not reﬂect domain division method. ,e main research methods of the structural characteristics of the time series well, thus time-series information granulation in terms of time axis are aﬀecting the eﬀectiveness of the subsequent analysis work. interval-based time-axis information granulation, cluster- ,e granular time series obtained after the information ing-based time-axis information granulation, and fuzzy set- granulation operation cannot directly participate in the data based time-axis information granulation. ,ese methods mining work of time series yet and need to combine the usually use a ﬁxed time interval to divide the time series, that characteristics of the information granular to propose the is, hard division, and then represent the subsequence (in- corresponding similarity measure before the subsequent formation grain) obtained after the division, ignoring the analysis calculation. 
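As an illustrative sketch (not the paper's implementation), the interval-based time-axis granulation described above can be realized by splitting a series into fixed-length windows and representing each window by a [min, max] interval grain that fully covers its samples; the function name and toy data are assumptions:

```python
import numpy as np

def interval_granulate(series, window):
    """Split a series into fixed-length time windows (hard division) and
    represent each window by an interval grain (min, max) that fully
    covers the samples in that window."""
    grains = []
    for start in range(0, len(series) - window + 1, window):
        seg = series[start:start + window]
        grains.append((float(np.min(seg)), float(np.max(seg))))
    return grains

# toy series: 12 points granulated into 3 interval grains of width 4
series = np.array([1.0, 2.0, 1.5, 3.0, 2.5, 2.0, 4.0, 3.5, 5.0, 4.5, 6.0, 5.5])
print(interval_granulate(series, 4))  # [(1.0, 3.0), (2.0, 4.0), (4.5, 6.0)]
```

Each grain is a low-dimensional stand-in for its window, which is exactly the compression the hard-division methods above perform.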
The most important decision tree in the multiclassification problem is established by combining granular computing with the Huffman tree. Exploiting the Huffman tree's property of minimizing the weighted path length, a sample can be attributed to its category in the shortest time: the Huffman model of multiclassification is constructed, the grains of the decision tree are analyzed, and granularity and the decision tree are used to construct a different multiclassifier for each grain. Finally, the global model is constructed. In solving the multiclassification problem, granular computing, time series, and support vector machines are combined, which not only inherits the advantages of each but also compensates for the disadvantages of each, yielding a synergistic enhancement effect.

5. Experimental Verification and Conclusions

5.1. Time Series Domain Division. In the multiclassification of textual problems, the huge amount of information is a major obstacle. First, the problem is granularized: using the text background knowledge, the text content is categorized as environment, computer, transportation, education, economy, military, sports, medicine, art, politics, and so on. The different disciplines are then considered as particles in a multilevel granular structure, and all these categories belong to the same granular layer. The particles in the layer above are combinations of particles with similar granularity characteristics; they are coarse particles relative to the lower layer, and the lower layer consists of fine particles relative to the upper layer. The processed weights sum to 1; for easier computation, the weights are multiplied by 1,000 to become integers. VC++ 6.0 is used to program the decision tree: the function CrtHuffTree(Huffnode ht[], int n) implements the tree construction, and the function code(Huffnode ht[], Huffcode hcd[], Huffcode ss, int n) implements the encoding.

Figure 5: Time-series domain partitioning based on the support vector machine class approach.

From Figure 5, it can be seen that the subintervals of the theoretical domain obtained by the support vector machine class method are consistent with the distribution characteristics of the data; that is, the divided subintervals are smaller in regions with dense data and larger in regions with sparse data.
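The paper builds its tree with VC++ routines (CrtHuffTree and code); as a hedged sketch of the same idea in Python, the following fragment builds a Huffman tree over integer class weights (weights normalized to 1 and scaled by 1,000, as in the text), so that heavier classes sit closer to the root and are reached with fewer classifier decisions. The function name and example weights are illustrative assumptions:

```python
import heapq
from itertools import count

def huffman_codes(class_weights):
    """Build a Huffman tree over integer class weights and return the
    binary code (root-to-leaf path) for each class; heavier classes get
    shorter codes, i.e. fewer decisions to reach their category."""
    tiebreak = count()  # unique counter keeps heap comparisons off the dicts
    heap = [(w, next(tiebreak), {label: ""}) for label, w in class_weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {k: "0" + v for k, v in left.items()}
        merged.update({k: "1" + v for k, v in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]

# weights already normalized to sum to 1, then multiplied by 1,000 (as in the text)
weights = {"economy": 400, "sports": 300, "military": 200, "art": 100}
print(huffman_codes(weights))
```

The dominant class ("economy" here) receives a one-bit code, so in a one-classifier-per-tree-node scheme it is separated first, which is the "shortest weighted path" property the text relies on.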
However, there is a great difference in the amount of data contained in the subintervals of the theoretical domain obtained by this method: subintervals in regions with dense data contain a large amount of data, while subintervals in sparse regions contain little, so the subintervals are optimized using the information granulation method below.

From the decomposition results, the multiscale characteristics of the original time series can be initially observed. From IMF1 to IMF10, the IMF components gradually change from high-frequency to low-frequency vibrations. The series with higher vibration frequencies keep the same fluctuation frequency as the original series but differ somewhat in amplitude; they represent the short-term effects triggered by the normal fluctuations of the securities market and the occurrence of irregular events. The series with lower vibration frequencies vary relatively flatly, and each change represents a long-term impact triggered by a major event. The original series always fluctuates up and down around the residual term and shows an increasing trend in the long run.

5.2. Accuracy Comparison between Different Models. According to the results of the volume-price relationship model, the regression with the volume-price relationship as input is better, the prediction effect is better for the more volatile segments, and the volatility of the volume-price relationship is closely related to the volatility of the closing price of the stock. For different step sizes, the loss function of the training set decreases and finally converges; the loss function of the test set behaves well with step sizes 10 and 30, while with step size 50 it decreases and converges poorly. In terms of the final mean absolute error (MAE), the MAE of the training set is close to 0.04 for all step sizes, and the best test performance, 0.011784855, is achieved by the model with step size 30.

Refining the classification decision for the local neighborhood data distribution, adjusting the posterior probability estimates, and using rough set approximation theory to handle extreme distribution cases eliminate the uncertainty caused by the lack of rare-class data. After the reclassification decision based on the refined instance distribution, the dynamic mean-neighbor classification algorithm based on neighborhood rough sets can classify query instances more accurately.

In terms of training time, the longer the step-size window, the longer the training time. Based on the above conclusions, a window with step size 30 was chosen for the subsequent model tests. According to the loss-function curves of the technical indicator model, the loss functions of both the training and test sets decrease and finally converge.
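The step-size experiments above can be pictured with a small sliding-window sketch; this is an assumed reconstruction for illustration, not the authors' code. Larger steps advance the window further each time and therefore produce fewer (window, target) training pairs:

```python
import numpy as np

def make_windows(series, window, step):
    """Slice a series into (input window, next-value target) pairs,
    advancing the window start by `step` points each time; a larger
    step yields fewer, more coarsely spaced training samples."""
    X, y = [], []
    for start in range(0, len(series) - window, step):
        X.append(series[start:start + window])
        y.append(series[start + window])
    return np.array(X), np.array(y)

series = np.sin(np.linspace(0, 20, 300))  # stand-in for a price series
for step in (10, 30, 50):
    X, y = make_windows(series, window=30, step=step)
    print(step, X.shape, y.shape)  # step 10 -> 27 samples, 30 -> 9, 50 -> 6
```

Any regression model (SVM, neural network) can then be trained on X, y, and the per-step MAE compared as in the experiment above.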
According to the results of the technical indicator model, the test-set loss (MAE) with technical indicators as input is 0.01071587, slightly smaller than that of the volume-price relationship model, and the overall regression result is slightly better than that of the volume-price relationship model. However, the fitted curves show that the technical indicator model predicts the more volatile segments better and the less volatile segments more weakly.

Figure 6: Variation of loss-function values for each model.

For the interval prediction results, we compared the SVQR method and the ARIMA-SVM-GRC method. The coverage probabilities of both methods were above 95% and the bandwidths were below 30%, indicating that both methods reflect the interval prediction results well. Compared with the SVQR method, the FIG-SVQR method improves the coverage probability by 0.59% and reduces the bandwidth by 0.99%, which indicates that the method proposed in this section is significantly better than the SVQR method in terms of interval prediction.
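The coverage probability and bandwidth quoted above are standard interval-prediction metrics (often called PICP and PINAW); a minimal sketch with made-up data, assuming coverage is the fraction of observations falling inside the predicted interval and bandwidth is the mean interval width normalized by the observed range:

```python
import numpy as np

def interval_metrics(y_true, lower, upper):
    """Coverage probability (PICP): fraction of observations inside the
    predicted interval. Normalized bandwidth (PINAW): mean interval
    width divided by the range of the observations."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    picp = float(np.mean((y_true >= lower) & (y_true <= upper)))
    pinaw = float(np.mean(upper - lower) / (np.max(y_true) - np.min(y_true)))
    return picp, pinaw

y = np.array([10.0, 12.0, 11.0, 15.0, 14.0])
lo, hi = y - 1.0, y + 1.0
hi[3] = 14.5  # shrink one upper bound so that y[3] = 15.0 falls outside
picp, pinaw = interval_metrics(y, lo, hi)
print(picp, pinaw)  # 0.8 coverage (4 of 5 points inside)
```

A good interval predictor pushes PICP toward 1 while keeping PINAW small, which is the trade-off the 95%/30% figures above describe.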
The ARIMA-SVM-GRC method is applied to the normalized volume-price relationship and technical indicator data to extract their features, reducing the six-dimensional volume-price relationship data to three dimensions and the technical indicator data to three dimensions. According to the principal component ratios, the first principal component of the volume-price relationship accounts for the largest share and incorporates all the information of the sample, and the first and second principal components of the technical indicators account for about 50%. The change in the loss-function value of each model is shown in Figure 6.

5.3. Comparison of Model Error Scenarios. In terms of model error, the prediction error of the decomposition integration model is lower than that of the model under multiple factors, but the overall deviation is not significant.
Moreover, the decomposition integration model can improve the prediction accuracy of the neural network model as much as possible. Therefore, when the difference between the prediction results of the multifactor model and the decomposition integration model is small, the prediction results of the two models should be considered together; when the difference between the two forecasts is large, the model with the smaller error can be used for forecasting.

As shown in Figure 7, the ARIMA-SVM-GRC method increases the validity of the prediction results by performing deterministic prediction of runoff while also obtaining uncertainty information from the experimental data. The prediction interval can usually reflect the fluctuation of runoff, and the ARIMA-SVM-GRC method can handle discrete and nonlinear relationships. To further demonstrate the predictive power of the proposed ARIMA-SVM-GRC method, we compare it with other traditional data-driven methods in a cross-sectional manner.

Figure 7: Comparison of runoff prediction accuracy.

In this section, we conduct comparison experiments with the BP, RBF, and SVQR methods in terms of both point prediction and interval prediction. The BP and RBF algorithms give poor results for runoff prediction, with mean absolute percentage errors above 6%, while SVQR and ARIMA-SVM-GRC give smaller mean absolute percentage errors, both below 4%, the best being 2.7% for the SVQR method. For the relative mean squared error, both BP and RBF are above 3%, while both SVQR and ARIMA-SVM-GRC remain stable below 2%. The mean absolute error is greater than 30 for the BP and RBF methods and below 30 for the SVQR and ARIMA-SVM-GRC methods.

Figure 8: Comparison of model error scenarios.

From the comparison in Figure 8, it can be seen that CMS takes the longest time, mainly because CMS must consider both intra-attribute and inter-attribute similarity when computing the similarity of two objects; as introduced in Section 3, the algorithm has a high time complexity and therefore takes the longest time. The HM, OF, IOF, Eskin, and k-modes algorithms are the more classical algorithms.

After algorithm optimization, the running time of the ARIMA-SVM-GRC algorithm depends mainly on the baseline-scale clustering results and the scale conversion method, and it is affected by the number of clusters in the baseline-scale clustering results: the more clusters obtained, the longer the time required for scale up-projection, and the fewer clusters, the shorter the time, independent of the size of the original data set. Since the number of clusters obtained from the benchmark-scale clustering is much smaller than the sample size of the original data set, the ARIMA-SVM-GRC algorithm requires much less running time than the other comparison algorithms. Since the running time is affected by the experimental environment and the degree of algorithm optimization, the CMS and Eskin methods in the experiments are implemented directly from the formulas in the literature without optimization, so their running times are relatively long.

According to the dynamic mean query neighborhood, a more scientific and rigorous method is needed to calculate the local and global confidence intervals in the query neighborhood in order to determine the actual distribution of the minority-class samples in the neighborhood.
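The point-prediction comparison above relies on mean absolute percentage error and mean absolute error; a small self-contained sketch of these metrics follows (the model outputs are made-up stand-ins, not the paper's data):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

def mae(y_true, y_pred):
    """Mean absolute error, in the original units of the series."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

y_true = np.array([100.0, 200.0, 400.0])
good = np.array([102.0, 196.0, 408.0])  # stand-in for an SVQR-like model
poor = np.array([90.0, 220.0, 360.0])   # stand-in for a BP/RBF-like model
print(mape(y_true, good), mape(y_true, poor))  # 2.0 vs 10.0 (percent)
```

MAPE is scale-free, so it supports cross-model comparisons like "above 6% versus below 4%", while MAE (the "greater than 30 / below 30" figures) stays in the units of the predicted quantity.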
6. Conclusions

With the development of technology and networks, multiclassification problems occupy an increasingly important position in people's lives, and they are becoming more and more difficult to solve as disturbing factors and the amount of data grow.

By studying the multiclassification problem, a support vector machine multiclassification model based on granular computing is proposed and combined with a time-series fluctuation prediction model to analyze and handle the multiclassification problem. In this paper, granular computation is incorporated into the data preprocessing model. The problems of large-scale data processing and low training speed can be effectively solved by using ideas such as the granulation and hierarchy of the granular computing triad, analyzed and transformed from the perspective of granular computing. In the multiclassification problem, whether the decision tree is constructed reasonably is crucial; using the granularity of granular computing combined with the Huffman tree to construct the optimal binary decision tree makes it possible to obtain the classification in the shortest time and solves the problem of uneven samples within a class, which provides a practical method for the analysis of multiclassification problems.

In terms of the time axis, a method of granulating time-series information based on fluctuation points is proposed for the structural characteristics of low-frequency time series. The key to this method is the definition and identification of fluctuation points in the time series: fluctuation points are identified by operating on the original series, the information grains are divided at the fluctuation points, and each divided grain is then described by a linear function, completing the information granulation operation and transforming the original time series into a granular time series.

Since the number of information grains and the corresponding time-window sizes differ between granular time series, a new similarity measure based on linear information granulation is proposed to facilitate subsequent data mining on granular time series. First, to ensure a one-to-one correspondence between the linear information grains of different time series, a segmentation matching algorithm for linear information grains is proposed; second, for the matched linear information grains, a corresponding similarity metric algorithm is proposed.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest in this article.

Acknowledgments

This work was supported by the projects of the Ningxia Natural Science Foundation (No. 2022AAC03315, time-series analysis and method research based on granular computing; No. 2022AAC03328, research on vegetable supply chain traceability based on combined RFID technology and service-oriented architecture (SOA); No. 2021AAC03235, fusion method of interest target detection in low-illumination visible and infrared images; No. 2022AAC03314, high-precision numerical simulation for incompressible magnetohydrodynamic problems; and No. 2022AAC03301, high-order difference schemes on an adaptive algorithm for convection-diffusion-reaction equations).
Journal of Robotics – Hindawi Publishing Corporation
Published: Apr 16, 2022