Traffic Classification in Underwater Networks Using SDN and Data-Driven Hybrid Metaheuristics

B. PRADHAN, National Institute of Technology Jamshedpur, India
GAUTAM SRIVASTAVA, Brandon University, Canada
D. S. ROY, National Institute of Technology Meghalaya, India
K. H. K. REDDY, National Institute of Science and Technology, India
JERRY CHUN-WEI LIN, Western Norway University of Applied Sciences, Norway

Software-Defined Networks (SDNs), with their segregated data and control planes, have proved capable of managing massive amounts of data by leveraging distributed information available across the network for informed decision-making at the network controller. However, with the proliferation of next-generation, real-time Internet of Things (IoT) applications that vary greatly in terms of data frequency and volumes, data traffic classification can substantially assist SDN controllers toward efficient routing and traffic engineering decisions. Existing works on network classification are limited by their application-centric nature, thus overlooking the key criterion for real-time IoT applications, namely, Quality of Service (QoS). In this article, we focus on augmenting the decision-making capacity of SDN controllers and Underwater Sensor Networks with machine learning algorithms to achieve real-time, QoS-aware network traffic classification. Three classifiers, namely, Feed-forward Neural Network, Naïve Bayes, and Logistic Regression, are employed together with a novel hybridization of an Artificial Neural Network and Particle Swarm Optimization, supported by first- and second-order stability analysis, to improve the performance of these classifiers. In short, the proposed framework exploits optimization algorithms and semi-supervised machine learning (ML) for precise traffic classification while keeping communication overhead between controller and switches minimal.
Results obtained from real-life datasets demonstrate the efficacy of our proposed scheme.

CCS Concepts: • Information systems → Data cleaning; • Theory of computation → Evolutionary algorithms; • Security and privacy → Data anonymization and sanitization;

Additional Key Words and Phrases: Software defined network, network traffic classification, Feed-Forward Neural Network (FFNN), Naïve Bayes, logistic regression, Particle Swarm Optimization (PSO)

Authors’ addresses: B. Pradhan, Department of Computer Science and Engineering, National Institute of Technology Jamshedpur, Jamshedpur, Jharkhand, 831014, India; email: buddhadebpradhan@gmail.com; G. Srivastava (corresponding author), Department of Math and Computer Science, Brandon University, Brandon, Manitoba, CANADA, R7A 6A9 and Research Centre for Interneural Computing, China Medical University, Taichung 40402, Taiwan; email: srivastavag@brandonu.ca; D. S. Roy, Department of Computer Science and Engineering, National Institute of Technology Meghalaya, Shillong, Meghalaya, 793003, India; email: diptendu.sr@nitm.ac.in; K. H. K. Reddy, Department of Computer Science and Engineering, GITAM University, Visakhapatnam, Andhra Pradesh 530045, India; email: khemant.reddy@gmail.com; J. C.-W. Lin, Department of Computer Science, Electrical Engineering and Mathematical Sciences, Western Norway University of Applied Sciences, Inndalsveien 28, 5063 Bergen, Norway; email: jerrylin@ieee.org.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Request permissions from permissions@acm.org.
© 2022 Association for Computing Machinery. 1550-4859/2022/04-ART34 $15.00
https://doi.org/10.1145/3474556

ACM Transactions on Sensor Networks, Vol. 18, No. 3, Article 34. Publication date: April 2022.

ACM Reference format:
B. Pradhan, Gautam Srivastava, D. S. Roy, K. H. K. Reddy, and Jerry Chun-Wei Lin. 2022. Traffic Classification in Underwater Networks Using SDN and Data-Driven Hybrid Metaheuristics. ACM Trans. Sen. Netw. 18, 3, Article 34 (April 2022), 15 pages. https://doi.org/10.1145/3474556

1 INTRODUCTION

The rapid growth in the number of applications has led to a tremendous rise in data, and the data generated has stringent networking requirements. Traditional network devices have data and control planes that are strongly coupled through proprietary protocols and closed interfaces, which makes it difficult to handle issues such as policy enforcement and user-aware routing that can vary in complexity [17]. Software-defined networks (SDNs) are a recent network paradigm that sets apart data and control planes [22]. This separation of planes and centralization of controllers offers a great deal of flexibility and innovation in the network for policy enforcement based on network requirements, thus removing vendor lock-in [6]. The communication between the control plane and data plane is governed by a southbound Application Programming Interface (API) known as OpenFlow [8]. The data plane uses the OpenFlow protocol to dispatch network statistics to the control plane. The control plane then formulates policies for every flow in the network and thus imparts logic to the data plane, as depicted in Figure 1. SDN has paved the path for easy handling of big data flows in networks, be it data streams from Internet of Things (IoT) devices to Cloud datacenters or intra-datacenter network traffic.
This has been made possible by leveraging distributed information amassed across the network and available to the SDN controller for informed decision-making. However, with the proliferation of next-generation, real-time IoT applications that vary greatly in terms of data frequency and data stream volumes, data traffic classification can substantially assist SDN controllers toward efficient routing and traffic engineering decisions. Existing works on network classification are limited by their application-centric nature, thus overlooking the key criterion for real-time IoT applications, namely, Quality of Service (QoS). Machine learning (ML) helps to make effective decisions by prediction from real-time and historical data [8]. Network statistics amassed at every switch of an SDN network (the switches collectively make up the data plane of the SDN) can be easily monitored and gainfully leveraged by the controller to make intelligent decisions that are then implemented at the forwarding plane.

To cater to the demands of a large number of applications and to effectively handle conflicting resource requests, application-aware networks are needed. For instance, underwater wireless sensor networks comprise nodes that are deployable on the surface and under the water. All nodes need to communicate and exchange information with other nodes in the same network and with the base station. Communication in the sensor network involves the transmission of data using acoustic, electromagnetic, or optical waves. Among these media, acoustic communication is the most popular and widely used method because of its attenuation characteristics in water. Transmission loss arises from the absorption of energy and its conversion into heat in the water. Meanwhile, acoustic signals operate at low frequencies, which enables them to be transmitted and received over long distances.
The key requirement for this kind of application-aware networking is network traffic classification, although traffic classification is not easy to implement when the sensor networks are underwater. Traffic classification at the controller helps to make informed decisions about the applications’ network requirements. Such traffic classification would pave the path for segregation of large and small flows, which affect resource requirement fulfillment and thus datacenter performance considerably [21]. This separation of large and small flows is necessary because large flows consume considerable bandwidth; separating them helps avoid performance deterioration of small flows, which are typically delay intolerant. Further, QoS-aware applications require network traffic classification to meet their resource allocation requirements, and fulfilling such network requirements is desirable for seamless functioning of the network [21]. Given the centralized view of the whole network, traffic classification at the controller in SDN helps to formulate application-specific rules, which are critical for the network to work efficiently and seamlessly. However, accurate traffic classification is still an open research problem.

Fig. 1. A broad overview of SDN planes.

In this article, an attempt has been made to address the software-defined traffic classification problem from a new perspective by jointly employing evolutionary-based ML algorithms for improved network traffic classification. The main contributions of this article include the incorporation of evolutionary algorithm-based classifiers for network traffic classification and the selection of proper network traffic databases obtained from real-life Internet data.
Three classification algorithms, namely, Feed-Forward Neural Network (FFNN), Naïve Bayes, and Logistic Regression (LR), have been employed together with a hybrid Neural Network using Particle Swarm Optimization (NN-PSO) for fine-tuning the performance, particularly because readily available datasets do not exist for this purpose. To the best of our knowledge, the SDN traffic classification attempted in this study and its performance improvement with hybrid approaches are still unexplored in the literature.

The remainder of this article is organized as follows: Section 2 presents a brief background to this research and discusses the state of the art. Section 3 describes underwater sensor networks in relation to ML techniques. Section 4 discusses the problem formulation and provides the solution methodology. Implementation details and results are discussed in Section 5. Lastly, Section 6 concludes and provides further avenues for this research.

2 RELATED WORKS

Underwater Wireless Sensor Networks (UWSNs) are developing quickly and receiving significant attention, becoming a main focus of both researchers and practitioners [23]. With technological advances in UWSN, sensors have become smarter, smaller, and more flexible, with lower power consumption, increased processing capacity, and the ability to operate in various underwater applications. Also, UWSN technology can be integrated with Internet Protocol-based systems to support the IoT and machine-to-machine (M2M) frameworks for real-time monitoring. The rapid growth of the UWSN domain and the availability of modern sensor node technologies have raised awareness of UWSNs every year, owing to their compatibility and broad applicability in various sectors.

There have been several attempts to classify network traffic into a set of varied categories.
These categories include QoS-aware, flow-aware, and application-aware traffic classification. Most of the research has focused on application-aware traffic classification. Very few works have focused on classification based on QoS and flow awareness.

QoS classification helps to detect the classes of a multitude of flows. Wang et al. [19] classified traffic into various classes based on QoS. Their work utilizes Deep Packet Inspection (DPI) and semi-supervised learning for the classification of traffic. DPI is used first to label a part of the flows that belong to predefined applications. Flows of new applications are then classified using models trained on a priori known datasets with the Laplacian Support Vector Machine (SVM). This methodology categorizes known and unknown applications into varied sets of QoS classes. The reported results show an accuracy of over 90% for the proposed system.

Flow-aware classification aims to segregate the network traffic into mice and elephant flows. Elephant flows transfer huge amounts of data into the network, while mice flows are short-lived and are usually delay intolerant. Glick et al. [5] focused on scheduling flows in a hybrid data center. To make the traffic classification elephant-flow aware, ML techniques are employed at the edge of the network. This classification is used by the SDN controller to implement an efficient traffic flow optimization algorithm. Xiao et al. [20] used a two-stage cost-effective strategy for the identification of elephant flows. Firstly, the head packet was used to identify candidate elephant flows. Secondly, a decision tree was employed to analyze whether each candidate flow is an elephant flow or not.

Amaral et al. [2] deployed an OpenFlow-based SDN system in an enterprise network to allow the collection of traffic data. After the collection of data, several classifier algorithms are used for the classification of traffic flows into varied applications.
Li and Li [9] used a MultiClassifier to classify applications by combining ML-based and DPI-based classifiers. The ML-based classifier is applied first to every newly arriving flow. Its result is accepted as the MultiClassifier output only if the reliability of the ML-based classifier is larger than a threshold value. Otherwise, a more accurate classifier such as DPI is used. If DPI does not return “unknown,” its result is selected. Rossi and Valenti [18] classify applications running over the User Datagram Protocol (UDP). Their work presents a behavioral classification engine that is application aware. Depending upon the count of received packets and bytes, UDP-based traffic is classified with the help of the SVM algorithm. The SVM-based classification has an accuracy of over 90%. Qazi et al. [16] propose a framework called Atlas, which was used to classify mobile applications. A crowdsourcing approach is used to obtain ground truth data. The data collected from the end devices are used for training a decision tree. The trained model helps in the identification of traffic flows belonging to mobile applications. The accuracy for the top 40 Google Play applications is over 94%. Nakao and Du [13] identified mobile applications using deep Neural Networks (NNs). The data collected belongs to an experimental network. In the eight-layer deep NN model, five flow features are selected (Packet size, TTL, Destination Port, Destination Address, and Protocol type). The results demonstrate an accuracy of 93.5% for about 200 mobile applications.

3 ML-BASED DATA ORGANIZATION FOR UWSN

UWSNs consist of sensor nodes and vehicles deployed underwater to monitor underwater conditions. These underwater conditions can be temperature and pressure.
A UWSN is also known as an underwater acoustic network or underwater communication network. UWSN deployment is challenging because of limited battery power and bandwidth and the requirement of dense sensor deployment. Some of the important applications of UWSN are oceanographic data collection, pollution and environment monitoring, disaster prevention, scientific exploration in the underwater environment, and so on. These applications depend on data collected and transmitted in the UWSN, and they help predict disasters like floods, hurricanes, earthquakes, tsunamis, tornadoes, and volcanic eruptions. Data agglomeration is a process that can be used to solve problems related to the collection and storage of data. This process may serve as a supplementary process to the routing process. UWSN poses various functional challenges that have been addressed so far by the usage of ML techniques. Functional challenges consist of Clustering and Data Agglomeration, Event Detection, Query Processing, Routing in UWSN, and Localization and Object Tracking.

Data Agglomeration is an iterative classification method. In this method, every data point first forms a cluster of its own; the two nearest clusters are then joined to form a single cluster; and the procedure is repeated recursively until the desired number of clusters is obtained. It is a bottom-up technique and it works from the differences between the objects to be grouped. Data agglomeration can be carried out with principal component analysis and the self-organizing map technique. The K-means algorithm is a popular ML clustering algorithm for collaborative data processing in Clustering and Data Agglomeration. The network property of utmost concern is clustering, so neural networks can be used for large-scale network clustering.

Event Detection and Query Processing are most important for the functional challenges of UWSN.
There are many Event Detection and Query Processing methods, such as event recognition, forest fire detection, query processing, distributed event detection, and query optimization. Typical pairings are a Bayesian algorithm for event recognition, K-Nearest Neighbour for query processing, a neural network for forest fire detection, and Principal Component Analysis for query optimization [11].

The design of routing protocols for UWSN is quite challenging because of multiple characteristics that differentiate them from wireless infrastructure-less networks. Some design challenges arise in UWSN from limits on bandwidth, energy, processing, and storage, so features such as energy efficiency, data transmission models, and sensor location are essential for UWSN. Self-organizing maps and reinforcement learning are used for data routing and routing enhancement in UWSN.

A known issue in object detection is that the algorithm may find multiple detections of the same object. In UWSN, the object is first localized using SVM and a decision tree. Object detection and localization in UWSN is also one of the popular applications of CNNs.

UWSN also poses various non-functional challenges that have been addressed so far by the usage of ML techniques. They are Security and Anomaly Intrusion Detection, QoS, Data Integrity and Fault Detection, and Varied Applications.

Anomaly-based network intrusion detection plays a role in protecting networks against malicious activities. Outliers are extreme values that deviate from other observations in the data, and three algorithms can detect them: Bayesian Belief Network, K-Nearest Neighbor, and SVM. In UWSN, random occurrences of faulty nodes degrade the QoS of the network. In this article, we propose an efficient fault detection scheme to manage a large-size UWSN. QoS, which can be estimated using a neural network, refers to the set of technologies that work on a network to guarantee its ability to dependably run high-priority applications. The accuracy and reliability of the sensor network can also be predicted. Nowadays, air quality observation and intelligent lighting control are popular non-functional challenges in UWSN that are addressed using neural networks [12].

3.1 Illustrative Example

Consider a cluster of four machines in which nodes 1–3 are the WorkerNodes and node 4 is the MasterNode, which is implicit. Some tasks in the system are shown with a different notation, such as the tasks scheduled at nodes 2 and 3 after time 3t. These are tasks that are yet to be completed: in the notation t++t, the first t indicates that the task has already executed for time t, and ++t gives the estimated time left for the completion of this task. Let us suppose that normal tasks take a time of t and 2t. There are a total of nine map tasks in the job. At 4t in the diagram, all the tasks are scheduled; at this instant, we profile the nodes by the number of map tasks each has completed in the job. Thus, node 1 has performed the majority of the computation with four tasks, node 2 has completed two tasks with another task currently being processed on it, and node 3 has completed one task with another task currently being processed on it. Since node 1 is free after 4t, it will notify the Jobtracker that it is free via heartbeat. Since all the tasks are already scheduled at 4t, node 1 becomes a good candidate for scheduling speculated tasks. Now we check the remaining time of the tasks that are yet to be completed, i.e., those at nodes 2 and 3. Based on the data already processed and the data left unprocessed in each task, let the remaining time to complete the tasks at nodes 2 and 3 be t and 3t, respectively. The task at node 3 becomes suitable to be executed speculatively because it has the largest remaining time to completion and its backup time is 2t.
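The selection rule applied in this example can be sketched as follows. This is a minimal illustration, assuming that each running task exposes an estimated remaining time and that the backup (re-execution) time on the free node is known. The names `RunningTask` and `pick_speculative_task` are hypothetical and do not come from the article or from any scheduler API; times are expressed in units of t.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunningTask:
    node: int          # node currently executing the task
    remaining: float   # estimated remaining time, in units of t


def pick_speculative_task(tasks: List[RunningTask],
                          backup_time: float) -> Optional[RunningTask]:
    """Return the running task with the largest remaining time,
    but only if re-running it on the free node would finish sooner."""
    if not tasks:
        return None
    slowest = max(tasks, key=lambda task: task.remaining)
    return slowest if backup_time < slowest.remaining else None


# Values from the example: nodes 2 and 3 have t and 3t left; the backup takes 2t.
running = [RunningTask(node=2, remaining=1.0), RunningTask(node=3, remaining=3.0)]
choice = pick_speculative_task(running, backup_time=2.0)
saved = choice.remaining - 2.0 if choice else 0.0
print(choice, saved)
```

The rule is deliberately conservative: a task is speculated only when the backup copy is expected to finish strictly earlier than the original, so no node time is spent on a copy that cannot win.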
Since the backup time of 2t is less than the estimated remaining time of 3t, this task is speculatively executed at node 1 and hence completes within a time of 2t. After the task is completed at node 1, node 1 will let the Jobtracker know about its completion and the copy still running at node 3 will be killed automatically. Hence in this scenario, with the help of speculation, there is a savings of time t in the completion time of the job. Thus, the performance improves with this controlled speculation.

4 NETWORK TRAFFIC CLASSIFICATION USING HYBRID EVOLUTIONARY ALGORITHM

In this section, the network traffic classification problem and its proposed solution are discussed. Starting from data collection to obtaining optimized network traffic, all the methodologies used are briefly described.

4.1 Dataset Preparation

The main problem encountered in the classification of Internet traffic using ML is the requirement of dataset validation and training. Normally, dataset characteristics are supposed to be similar to the real network environment. This section analyzes the related issues, each of which highlights factors that can affect the relationship between ML training and testing datasets. It is important to note that all these issues have a significant effect on online classification but matter relatively less for offline classification.

4.1.1 Training and Testing ML Dataset Collected from the Same/Different Networks Environment.

Networks include different configurations, such as using real IP, using NATed IP [7], and so on. A pertinent question arises here: are the statistical features of Internet applications the same in different network scenarios? In other words, what is the effect of collecting training and testing datasets from the same or from different networks? Many sub-questions can arise from this main question, such as the following:

— If the training and testing dataset is collected from networks having different network requirements, what is the effect?
— If the training and testing dataset is collected from networks having the same network requirements, what is the benefit?
— If the network characteristics are changed, what traffic features will be affected by it?
— During the training and testing of datasets in ML research papers, what information needs to be added?

For the same class in ML Internet traffic classification, the training dataset is assumed to represent the testing dataset. Different network scenarios can generate different traffic patterns. This variation means that the values of the traffic features (like packet length and inter-arrival time) can be different when the network segments are different. According to Nguyen and Armitage [14], many statistical properties of Internet applications vary over time. Alshammari and Zincir-Heywood [1] provided a comparison between classification accuracies of Skype traffic. The datasets employed for training and testing were obtained from varied networks over multiple years.

4.2 Implementation of ML Algorithms

The big challenge is to employ ML algorithms that suit the problem. Many ML algorithms are available, but selecting the best-fit algorithm for a specific SDN application is itself a demanding modeling problem. Thus, two ML algorithms have been purposefully selected: Logistic Regression and Naïve Bayes. In this article, the selected problem was closely evaluated with other classifiers, and these two were found to be the best fit for this particular problem. Figure 2 depicts the step-by-step working procedure of the ML approach.

4.2.1 Logistic Regression.

Logistic regression categorizes the data based on binary responses in Data Modeling.
It predicts probability values that are restricted to the interval between 0 and 1. These probabilities are well calibrated in comparison to those of other classifiers like Naïve Bayes, which can also predict probabilities. Logistic regression is used when the dependent variable is categorical. For instance, suppose a tumor must be classified as malignant (1) or not (0). In this scenario, one needs to set a threshold value at which the classification is made. Suppose the actual class is malignant, the predicted continuous value is 0.5, and the threshold value is 0.6; the data point will then be classified as not malignant, which can lead to serious ramifications in practice.

4.2.2 Naïve Bayes.

The Naïve Bayes method follows Bayes’ theorem and assumes that the predictors are independent; that is, the presence of a specific feature in a class is assumed to be uncorrelated with the presence of any other feature. Bayes’ theorem allows calculation of the posterior probability $P_p(x \mid y)$ from $P_p(x)$, $P_p(y)$, and $P_p(y \mid x)$. The mathematical expression is as follows:

\[
P_p(x \mid y) = \frac{P_p(y \mid x)\, P_p(x)}{P_p(y)}, \tag{1}
\]

where $P_p(x \mid y)$ is the posterior probability of the class (target) given the predictor (attribute), $P_p(x)$ is the prior probability of the class, $P_p(y \mid x)$ is the likelihood, which is the probability of the predictor given the class, and $P_p(y)$ is the prior probability of the predictor.

4.3 Brief Overview ANN

ANN is one of the most important and useful data-driven computational techniques and has been deployed in various network traffic problems. The input and output are related in a nonlinear way. The inputs for this experiment include a port number, topology-related information, and traffic.

Fig. 2. Flowchart of the proposed ML-based procedure.
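To make the nonlinear input-output relation concrete, here is a minimal sketch of a forward pass through a feed-forward network with one hidden layer. The feature values, layer sizes, weights, and the choice of tanh and sigmoid activations are illustrative assumptions only; the article does not specify them, and the function name `forward` is hypothetical.

```python
import math
from typing import List


def forward(x: List[float], w_hidden: List[List[float]], b_hidden: List[float],
            w_out: List[float], b_out: float) -> float:
    """One hidden tanh layer followed by a single sigmoid output unit."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical, already-normalized features:
# port number, a topology-related metric, and traffic volume.
features = [0.08, 0.5, 0.3]
w_hidden = [[0.4, -0.2, 0.1], [-0.3, 0.8, 0.5]]  # assumed weights, 2 hidden units
b_hidden = [0.0, 0.1]
w_out = [1.2, -0.7]
score = forward(features, w_hidden, b_hidden, w_out, b_out=0.05)
print(score)
```

Because of the tanh and sigmoid activations, scaling the inputs does not scale the output proportionally, which is the nonlinearity referred to above. In a trained network the weights would be fitted to data rather than fixed by hand.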
The output is to find the shortest traffic for this neural network. The details of the ANN formulation have been taken from the literature [15].

4.4 Basics of PSO

PSO is a stochastic population-based meta-heuristic algorithm first introduced by Kennedy and Eberhart in 1995 [4]. Suppose that the size of the swarm is $N$ (population size) and the search space is $D$ dimensional. The position of the $i$th particle is presented as $x_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$, where $x_{id} \in [lb_d, ub_d]$, $d \in [1, D]$, and $lb_d$ and $ub_d$ are the lower and upper bounds of the $d$th dimension of the search space. The $i$th particle velocity is presented as $v_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$. At each timestep, the position and velocity of the particles are updated according to the following equations:

\[
v_{ij}(t+1) = \omega \times v_{ij}(t) + c_1 \times r_1 \times \left( p^{lB}_{ij}(t) - x_{ij}(t) \right) + c_2 \times r_2 \times \left( p^{gB}(t) - x_{ij}(t) \right), \tag{2}
\]

\[
x_{ij}(t+1) = x_{ij}(t) + v_{ij}(t+1), \tag{3}
\]

where

— $\omega$: inertia weight, which balances the exploration and exploitation ability of PSO.
— $r_1, r_2$: two distinct random numbers, $r_1, r_2 \sim U(0, 1)$. They contribute to the stochastic nature of the algorithm.
— $c_1, c_2$: acceleration coefficients, which pull each particle toward the particle best and global best positions.
— $t$: current iteration.
— $p^{lB}$: best previous position found so far by the particle, called local best.
— $p^{gB}$: best position discovered so far by the whole swarm, called global best.
— $\omega \times v_{ij}(t)$: provides exploration ability for PSO.
— $c_1 \times r_1 \times (p^{lB}_{ij}(t) - x_{ij}(t))$: represents private thinking.
— $c_2 \times r_2 \times (p^{gB}(t) - x_{ij}(t))$: represents collaboration of particles.

4.5 Stability Analysis of PSO

4.5.1 Need for Stability Analysis.
A stability analysis of PSO is carried out for the aforesaid problem to establish that the particle positions converge to a fixed point in the desired search space. The two-stage (first-order and second-order) stability analysis shows that the expected position converges and that the variance of the particle positions converges to zero. Since the ANN is trained using PSO and the PSO-trained ANN is utilized for the traffic classification of a network, the stability of PSO supports the conclusion that the whole system proposed for software-defined network traffic classification is also stable.

4.5.2 The Stability Analysis.

In this subsection, the stability analysis is derived. For this purpose, $pbest(t_l)$ and $gbest(t_g)$ are kept fixed for an iteration, which is known as the stagnation assumption [3, 10]. The objective function of the problem plays a key role in specifying the dimensions of the problem space through $t_l$ and $t_g$, which are the best locations found so far. Hence, the description of the proposed algorithm can be reduced, without loss of generality, to a one-dimensional analysis in which $n_s$ denotes the velocity and $v_s$ the position at step $s$:

\[
n_{s+1} = \omega\, n_s + \gamma_1 (t_l - v_s) + \gamma_2 (t_g - v_s), \tag{4}
\]

\[
v_{s+1} = v_s + n_{s+1}. \tag{5}
\]

Let

\[
\gamma = \gamma_1 + \gamma_2, \qquad t = \frac{\gamma_1 t_l + \gamma_2 t_g}{\gamma_1 + \gamma_2}. \tag{6}
\]

Then Equation (4) can be simplified as

\[
n_{s+1} = \omega\, n_s + \gamma (t - v_s). \tag{7}
\]

Substituting Equation (7) in (5), the following can be obtained:

\[
v_{s+1} = v_s + \omega\, n_s + \gamma (t - v_s). \tag{8}
\]

Shifting the index in Equation (5) from $s$ to $s - 1$:

\[
v_s = v_{s-1} + n_s \;\Rightarrow\; n_s = v_s - v_{s-1}. \tag{9}
\]

Substituting Equation (9) in (8) gives

\[
v_{s+1} = v_s + \omega\, v_s - \omega\, v_{s-1} + \gamma (t - v_s). \tag{10}
\]

At the initial state the velocity is zero, $n_s = 0$; substituting this in Equation (9),

\[
v_s - v_{s-1} = 0 \;\Rightarrow\; v_{s-1} = v_s. \tag{11}
\]

Substituting Equation (11) in (10),

\[
v_{s+1} = v_s + \gamma (t - v_s) \;\Rightarrow\; v_{s+1} = (1 - \gamma)\, v_s + \gamma\, t. \tag{12}
\]

Equation (12) is rewritten to present the procedure:

\[
q_{s+1} = (1 - \gamma)\, q_s + \gamma\, t, \tag{13}
\]

where $q_{s+1} = v_{s+1}$, $q_s = v_s$, and $t = \frac{\gamma_1 t_l + \gamma_2 t_g}{\gamma_1 + \gamma_2}$ as in Equation (6). The first-order stability analysis considers the one-dimensional stochastic sequence $\{q_1, q_2, \ldots\}$ ($q_s$ fluctuates randomly with the values of $\gamma$). Here, $q_s$ is the position of the $i$th particle at step $s$, and $D(\cdot)$ denotes the expected value. Taking expectations of Equation (13) gives

\[
D(q_{s+1}) = (1 - D(\gamma))\, D(q_s) + D(\gamma)\, t, \tag{14}
\]

\[
D(q_{s+1}) = (1 - \tau_\gamma)\, D(q_s) + \tau_\gamma\, t, \tag{15}
\]

where $\tau_\gamma$ is the expected value of $\gamma$, and $\gamma$ is uniformly random in $[0, 1]$. So, $D(\gamma) = \frac{1}{2}$. Equation (15) can be rewritten as

\[
D(q_{s+1}) = \frac{1}{2}\, D(q_s) + \frac{1}{2}\, t. \tag{16}
\]

Solving this recurrence relation gives the general form

\[
D(q_s) = \left( \frac{1}{2} \right)^{s} (q_0 - t) + t, \tag{17}
\]

where $q_0$ denotes the initial position.

Lemma 1. The sequence $D(q_s)$ is convergent and converges to $t$.

Proof. $D(q_s)$ converging to $t$ means that for any $\nu > 0$ there exists a value $s(\nu)$ such that if $s > s(\nu)$,

\[
|D(q_s) - t| < \nu. \tag{18}
\]

Considering Equation (17),

\[
\left( \frac{1}{2} \right)^{s} |q_0 - t| < \nu. \tag{19}
\]

Therefore it can be concluded that

\[
2^{s} > \frac{|q_0 - t|}{\nu}. \tag{20}
\]

So,

\[
s > \log_2 \frac{|q_0 - t|}{\nu}. \tag{21}
\]

Such an $s(\nu)$ exists for every $\nu > 0$, so $D(q_s)$ converges to $t$.

4.6 Second-Order Stability Analysis

Second-order stability of the particle positions requires that the variance converges. With $N(\cdot)$ denoting the variance, the variance of the random variable $q$ is

\[
N(q) = D(q^2) - D^2(q). \tag{22}
\]

Thus, it is required to find $D(q^2)$ to calculate $N(q)$. From Equation (13), $q_{s+1}^2$ can be calculated as follows:

\[
q_{s+1}^2 = \left[ (1 - \gamma)\, q_s + \gamma\, t \right]^2, \tag{23}
\]

\[
q_{s+1}^2 = (1 - 2\gamma + \gamma^2)\, q_s^2 + 2\gamma (1 - \gamma)\, t\, q_s + \gamma^2 t^2. \tag{24}
\]

Hence, the expected value of $q_{s+1}^2$ is

\[
D(q_{s+1}^2) = D\!\left[ (1 - 2\gamma + \gamma^2)\, q_s^2 + 2\gamma (1 - \gamma)\, t\, q_s + \gamma^2 t^2 \right]. \tag{25}
\]
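As an informal numerical check of the recurrence in Equation (13), the sketch below simulates many independent runs of $q_{s+1} = (1 - \gamma) q_s + \gamma t$, drawing $\gamma$ uniformly from $[0, 1]$ at every step, and reports the sample mean and variance of the final positions. The starting point, the value of $t$, the number of steps, the number of runs, and the function name `simulate` are arbitrary choices for illustration, not values from the article.

```python
import random
import statistics
from typing import Tuple


def simulate(q0: float, t: float, steps: int, runs: int,
             seed: int = 0) -> Tuple[float, float]:
    """Iterate q <- (1 - g) * q + g * t with g ~ U(0, 1), independently per run.

    Returns the sample mean and population variance of the final positions.
    """
    rng = random.Random(seed)
    finals = []
    for _ in range(runs):
        q = q0
        for _ in range(steps):
            g = rng.random()
            q = (1.0 - g) * q + g * t
        finals.append(q)
    return statistics.mean(finals), statistics.pvariance(finals)


mean_q, var_q = simulate(q0=10.0, t=2.0, steps=40, runs=2000)
print(mean_q, var_q)
```

Under these assumptions the sample mean should sit very close to $t$ and the sample variance very close to zero, which is consistent with the first-order result in Lemma 1 and with the variance analysis developed in this section. A simulation of this kind only illustrates the behavior of the simplified one-dimensional recurrence; it is not a substitute for the derivation.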
Expanding Equation (25), and treating $\gamma$ as independent of $q_s$,

$$ D(q_{s+1}^2) = (1 - 2\tau_\gamma + D(\gamma^2))\, D(q_s^2) + 2t\,(\tau_\gamma - D(\gamma^2))\, D(q_s) + t^2 D(\gamma^2). \tag{26} $$

As $\gamma$ is a uniformly distributed random number that varies between 0 and 1,

$$ D(\gamma^2) = \frac{1}{3}, \qquad N(\gamma) = \frac{1}{12}. \tag{27} $$

Substituting Equation (27) in (26), we get

$$ D(q_{s+1}^2) = \frac{1}{3} D(q_s^2) + \frac{1}{3}\, t\, D(q_s) + \frac{1}{3} t^2. \tag{28} $$

The value of $D^2(q_{s+1})$ follows from squaring Equation (16):

$$ D^2(q_{s+1}) = \frac{1}{4} D^2(q_s) + \frac{1}{2}\, t\, D(q_s) + \frac{1}{4} t^2. \tag{29} $$

Now $N(q_{s+1})$ can be obtained by substituting Equations (28) and (29) in Equation (22):

$$ N(q_{s+1}) = \frac{1}{4} N(q_s) + \frac{1}{12} D\left((q_s - t)^2\right). \tag{30} $$

Since

$$ q_s - t = (1 - \gamma)(q_{s-1} - t), \tag{31} $$

$$ D\left((q_s - t)^2\right) = \frac{1}{3} D\left((q_{s-1} - t)^2\right). \tag{32} $$

The following recurrence relation is obtained:

$$ N(q_s) = \left(\frac{1}{4}\right)^{s} N(q_0) + D\left((q_0 - t)^2\right) \left[\left(\frac{1}{3}\right)^{s} - \left(\frac{1}{4}\right)^{s}\right]. \tag{33} $$

Lemma 2. The sequence $N(q_s)$ is convergent and converges to 0.

Proof. The first term of Equation (33) vanishes as $s \to \infty$ because $(1/4)^s \to 0$. For the second term,

$$ \lim_{s \to \infty} N(q_s) = \lim_{s \to \infty} D\left((q_0 - t)^2\right) \left[\left(\frac{1}{3}\right)^{s} - \left(\frac{1}{4}\right)^{s}\right] = D\left((q_0 - t)^2\right) \lim_{s \to \infty} \left[\left(\frac{1}{3}\right)^{s} - \left(\frac{1}{4}\right)^{s}\right] = 0. \tag{34} $$

Hence, it is evident from the above analysis that the proposed PSO satisfies both the first- and second-order stability tests.

Fig. 3. TP percentage and accuracy for ML algorithms.

4.7 Performance Metrics

A confusion matrix is used to analyze classification accuracy, that is, how well the classifier is able to identify the various classes. Three standard metrics are used for evaluating the classifier: Precision (Pr), Recall (Rc), and F-Measure (fm) [Equation (35)]:

$$ fm = 2 \cdot \frac{Pr \cdot Rc}{Pr + Rc}, \tag{35} $$

where $Pr = \frac{Tp}{Tp + Fp}$ and $Rc = \frac{Tp}{Tp + Fn}$.

- True positives (Tp): Actual class is positive, predicted class is positive.
- True negatives (Tn): Actual class is negative, predicted class is negative.
- False positives (Fp): Actual class is negative, predicted class is positive.
- False negatives (Fn): Actual class is positive, predicted class is negative.

Here, fm is the harmonic mean of Recall and Precision. Therefore, this score takes both false negatives and false positives into account. fm is more useful than accuracy, especially for uneven class distributions. It ranges from 0 to 1, where 0 indicates the worst classifier and 1 the best classifier.

5 RESULTS AND DISCUSSIONS

To examine the ML dataset validation issues discussed in Section 3, three experiments were conducted. The first experiment examined how the traffic features change when the network characteristics differ from those of the dataset. The second experiment highlighted the impact on online ML classification accuracy when the real majority and minority classes of the training datasets are not considered. The last experiment addressed the question: do the values of the traffic features (such as packet length and inter-arrival time) change if a hybridized optimization algorithm is applied or, in other words, can network traffic classification accuracy be improved via evolutionary algorithms?

Table 1. Traffic Distribution of Different Applications

Application Type     Number of Traffic   Proportion
HTTP                 5,521               82.34%
FTP                  195                 2.75%
Video Streaming      200                 3%
Instant Messaging    485                 7.35%
P2P                  103                 1.53%

Two case studies were carried out on two collected datasets: (1) the ANT dataset and (2) a Kaggle dataset.

5.1 Case Study 1: ANT Dataset

The network traffic dataset has been obtained from the ANT Dataset website. The data are available in different formats; the traffic data suitable for our application have been extracted from the database.

5.1.1 Scenarios Considered.
Traffic features are the traffic patterns used in ML classifier datasets, such as packet length and inter-arrival time. Skype has gained prominence and attention worldwide as one of the most popular forms of VoIP software, and it can also represent P2P applications. Therefore, the protocols considered in this case study are HTTP, FTP, video streaming, instant messaging, and peer-to-peer (P2P), and the TP percentage comparisons among the ML algorithms are presented in Figure 3(a). The traffic distribution for these protocols is given in Table 1. From Table 1, it is clear that HTTP carries the most traffic in comparison to the other applications, and this affects the accuracy when the classifiers are compared. Figure 3(b) shows that the classifiers achieve lower accuracy for HTTP than for the other applications, possibly because HTTP carries the largest amount of traffic. On the other hand, the accuracy for Instant Messaging was the highest, possibly because it carries a smaller traffic load. Based on the number of traffic, the corresponding proportion percentage has been extracted, and the accuracy comparison is presented for the three types of ML algorithms. The overall comparison is depicted in Figure 4(a). As stated previously, one question to be answered is whether the traffic features change if evolutionary algorithms are applied. Figure 4(a) answers it: by incorporating a hybrid evolutionary algorithm, the accuracy improvement is much better than with a single optimization algorithm. Thus, evolutionary algorithms are warranted for improving network traffic classification.

5.2 Case Study 2: Kaggle Dataset

The same methods have been applied to this dataset too. One key difference from the previous case is that evolutionary optimization algorithms were not used in this case.
Only classifier-based schemes have been incorporated. In UWSN, each node has to send its data to the sink directly or indirectly. In some cases, a node transmits its data directly to the sink, whereas in other cases data aggregation is applied, whereby the transmitted data are collected at a particular node and then forwarded to the sink. To analyze the application of data aggregation and its results, we have compared the simulation results of different techniques with and without PSO. In this scenario, the performance metrics are average delay, average packet drop, and average energy consumption. Figure 4(b) shows the average energy consumed by the ML algorithms with and without PSO. The results indicate that LR applied without PSO in UWSN involves more collisions among the data packets than the same technique applied with PSO. Thus, at particular intervals of time, the delay is 17% lower with PSO than without PSO. Figure 4(b) and (c) present the parameters obtained on the Kaggle dataset where the ML algorithms have been applied.

Fig. 4. Case study results.

Dataset URLs: https://ant.isi.edu/datasets/index.html and http://statweb.stanford.edu/~sabatti/data.html.

6 CONCLUSION

In this article, ML algorithms have been applied for traffic classification in SDN networks to make informed decisions about the underlying applications and their QoS requirements. Three ML classifiers, namely, FFNN, BN, and LR, were combined with a hybrid NN-PSO to normalize datasets for classification purposes as well as to improve the efficacy of the training and testing datasets collected from open source sites. The traffic classification has been carried out, and its accuracy evaluated, using ML algorithms.
Additionally, the implementation of NN-PSO enhances the accuracy of the traffic classification with the same classifiers. The proposed method is promising because it does not impose any processing overhead. Even though UWSNs have received a great number of improvements in the previous few years, there is still substantial room for improvement, especially in implementing systems on a large scale. In future work, researchers can offer better solutions for node mobility in scenarios with a large monitoring area (with a high neighborhood range) to investigate the effect on network connectivity, coverage, energy consumption, and network lifetime. To increase the efficiency of UWSNs and improve their performance, prospective research should focus on implementing cooperative control among a few underwater vehicles. In future work, the detection of flows of newer applications that have not yet been part of the trained classifier will be explored, with implementations on a varied set of platforms (Windows, iOS, Linux).

REFERENCES

[1] Riyad Alshammari and A. Nur Zincir-Heywood. 2009. Machine learning based encrypted traffic classification: Identifying SSH and Skype. In 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications. IEEE, 1–8.
[2] Pedro Amaral, Joao Dinis, Paulo Pinto, Luis Bernardo, Joao Tavares, and Henrique S. Mamede. 2016. Machine learning in software defined networks: Data collection and traffic classification. In 2016 IEEE 24th International Conference on Network Protocols (ICNP’16). IEEE, 1–5.
[3] Maurice Clerc and James Kennedy. 2002. The particle swarm-explosion, stability, and convergence in a multidimensional complex space. IEEE Transactions on Evolutionary Computation 6, 1 (2002), 58–73.
[4] Russell Eberhart and James Kennedy. 1995. Particle swarm optimization. In Proceedings of the IEEE International Conference on Neural Networks, Vol. 4. Citeseer, 1942–1948.
[5] Madeleine Glick and Houman Rastegarfar. 2017. Scheduling and control in hybrid data centers. In 2017 IEEE Photonics Society Summer Topical Meeting Series (SUM’17). IEEE, 115–116.
[6] M. W. Hussain, B. Pradhan, X. Z. Gao, K. H. K. Reddy, and D. S. Roy. 2020. Clonal selection algorithm for energy minimization in software defined networks. Applied Soft Computing 96 (2020), 106617.
[7] Vivek Jha. 2020. Communication system. (2020). U.S. Patent No. 10,728,213, Filed July 17th, 2012.
[8] Diego Kreutz, Fernando M. V. Ramos, Paulo Esteves Verissimo, Christian Esteve Rothenberg, Siamak Azodolmolky, and Steve Uhlig. 2015. Software-defined networking: A comprehensive survey. Proceedings of the IEEE 103, 1 (2015), 14–76. DOI:10.1109/JPROC.2014.2371999
[9] Yunchun Li and Jingxuan Li. 2014. MultiClassifier: A combination of DPI and ML for application-layer classification in SDN. In The 2014 2nd International Conference on Systems and Informatics (ICSAI’14). IEEE, 682–686.
[10] Zhao-Guang Liu, Xiu-Hua Ji, and Yun-Xia Liu. 2018. Hybrid non-parametric particle swarm optimization and its stability analysis. Expert Systems with Applications 92 (2018), 256–275.
[11] Robert Martin and Sanguthevar Rajasekaran. 2016. Data centric approach to analyzing security threats in underwater sensor networks. In OCEANS 2016 MTS/IEEE Monterey. IEEE, 1–6.
[12] José-Miguel Moreno-Roldán, Miguel-Ángel Luque-Nieto, Javier Poncela, and Pablo Otero. 2017. Objective video quality assessment based on machine learning for underwater scientific applications. Sensors 17, 4 (2017), 664.
[13] Akihiro Nakao and Ping Du. 2018. Toward in-network deep machine learning for identifying mobile applications and enabling application specific network slicing. IEICE Transactions on Communications 101-B (2018), 1536–1543.
[14] Thuy T. T. Nguyen and Grenville Armitage. 2008. A survey of techniques for internet traffic classification using machine learning. IEEE Communications Surveys & Tutorials 10, 4 (2008), 56–76.
[15] Buddhadeb Pradhan, Arijit Nandi, Nirmal Baran Hui, Diptendu Sinha Roy, and Joel J. P. C. Rodrigues. 2019. A novel hybrid neural network-based multirobot path planning with motion coordination. IEEE Transactions on Vehicular Technology 69, 2 (2019), 1319–1327.
[16] Zafar Ayyub Qazi, Jeongkeun Lee, Tao Jin, Gowtham Bellala, Manfred Arndt, and Guevara Noubir. 2013. Application-awareness in SDN. In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM. 487–488.
[17] Sandhya Rathee, Yash Sinha, and K. Haribabu. 2017. A survey: Hybrid SDN. Journal of Network and Computer Applications 100 (2017), 35–55.
[18] Dario Rossi and Silvio Valenti. 2010. Fine-grained traffic classification with netflow data. In Proceedings of the 6th International Wireless Communications and Mobile Computing Conference. 479–483.
[19] Pu Wang, Shih-Chun Lin, and Min Luo. 2016. A framework for QoS-aware traffic classification using semi-supervised machine learning in SDNs. In 2016 IEEE International Conference on Services Computing (SCC’16). IEEE, 760–765.
[20] Peng Xiao, Wenyu Qu, Heng Qi, Yujie Xu, and Zhiyang Li. 2015. An efficient elephant flow detection with cost-sensitive in SDN. In 2015 1st International Conference on Industrial Networks and Intelligent Systems (INISCom’15). IEEE, 24–28.
[21] Junfeng Xie, F. Richard Yu, Tao Huang, Renchao Xie, Jiang Liu, Chenmeng Wang, and Yunjie Liu. 2018. A survey of machine learning techniques applied to software defined networking (SDN): Research issues and challenges. IEEE Communications Surveys & Tutorials 21, 1 (2018), 393–430.
[22] Abbas Yazdinejad, Reza M. Parizi, Ali Dehghantanha, Gautam Srivastava, Senthilkumar Mohan, and Abedallah M. Rababah. 2020. Cost optimization of secure routing with untrusted devices in software defined networking. Journal of Parallel and Distributed Computing 143 (2020), 36–46.
[23] Abbas Yazdinejad, Reza M. Parizi, Gautam Srivastava, Ali Dehghantanha, and Kim-Kwang Raymond Choo. 2019. Energy efficient decentralized authentication in internet of underwater things using blockchain. In 2019 IEEE Globecom Workshops (GC Wkshps’19). IEEE, 1–6.

Received January 2021; revised May 2021; accepted July 2021

ACM Transactions on Sensor Networks (TOSN), Association for Computing Machinery

Traffic Classification in Underwater Networks Using SDN and Data-Driven Hybrid Metaheuristics

Publisher
Association for Computing Machinery
Copyright
Copyright © 2022 Association for Computing Machinery.
ISSN
1550-4859
eISSN
1550-4867
DOI
10.1145/3474556

Abstract

Traffic Classification in Underwater Networks Using SDN and Data-Driven Hybrid Metaheuristics

B. PRADHAN, National Institute of Technology Jamshedpur, India
GAUTAM SRIVASTAVA, Brandon University, Canada
D. S. ROY, National Institute of Technology Meghalaya, India
K. H. K. REDDY, National Institute of Science and Technology, India
JERRY CHUN-WEI LIN, Western Norway University of Applied Sciences, Norway

Software-Defined Networks (SDNs), with their segregated data and control planes, have proved to be capable of managing massive amounts of data by leveraging distributed information available across the network for informed decision-making at the network controller. However, with the proliferation of next-generation, real-time Internet of Things (IoT) applications that vary greatly in terms of data frequency and volumes, data traffic classification can substantially assist SDN controllers toward efficient routing and traffic engineering decisions. Existing works on network classification are limited by their application-centric nature, thus overlooking the key criterion for real-time IoT applications, namely, Quality of Service (QoS). In this article, we focus on augmenting SDN controllers’ decision-making capacity and Underwater Sensor Networks with machine learning algorithms to achieve real-time, QoS-aware network traffic classification. Three classifiers, namely, Feed-forward Neural Network, Naïve Bayes, and Logistic Regression, have been employed with a novel Artificial Neural Network and Particle Swarm Optimization hybridization scheme, supported by first- and second-order stability analysis, for performance improvement of these classifiers. In short, the proposed framework exploits optimization algorithms and semi-supervised machine learning (ML) for precise traffic classification while keeping communication overhead between controller and switches minimal. Results obtained from real-life datasets demonstrate the efficacy of our proposed scheme.
CCS Concepts: • Information systems → Data cleaning; • Theory of computation → Evolutionary algorithms; • Security and privacy → Data anonymization and sanitization;

Additional Key Words and Phrases: Software defined network, network traffic classification, Feed-Forward Neural Network (FFNN), Naïve Bayes, logistic regression, Particle Swarm Optimization (PSO)

Authors’ addresses: B. Pradhan, Department of Computer Science and Engineering, National Institute of Technology Jamshedpur, Jamshedpur, Jharkhand, 831014, India; email: buddhadebpradhan@gmail.com; G. Srivastava (corresponding author), Department of Math and Computer Science, Brandon University, Brandon, Manitoba, CANADA, R7A 6A9 and Research Centre for Interneural Computing, China Medical University, Taichung 40402, Taiwan; email: srivastavag@brandonu.ca; D. S. Roy, Department of Computer Science and Engineering, National Institute of Technology Meghalaya, Shillong, Meghalaya, 793003, India; email: diptendu.sr@nitm.ac.in; K. H. K. Reddy, Department of Computer Science and Engineering, GITAM University, Visakhapatnam, Andhra Pradesh 530045, India; email: khemant.reddy@gmail.com; J. C.-W. Lin, Department of Computer Science, Electrical Engineering and Mathematical Sciences, Western Norway University of Applied Sciences, Inndalsveien 28, 5063 Bergen, Norway; email: jerrylin@ieee.org.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. © 2022 Association for Computing Machinery.
1550-4859/2022/04-ART34 $15.00 https://doi.org/10.1145/3474556

ACM Transactions on Sensor Networks, Vol. 18, No. 3, Article 34. Publication date: April 2022.

ACM Reference format: B. Pradhan, Gautam Srivastava, D. S. Roy, K. H. K. Reddy, and Jerry Chun-Wei Lin. 2022. Traffic Classification in Underwater Networks Using SDN and Data-Driven Hybrid Metaheuristics. ACM Trans. Sen. Netw. 18, 3, Article 34 (April 2022), 15 pages. https://doi.org/10.1145/3474556

1 INTRODUCTION

The rapid growth of a large number of applications has led to a tremendous rise in data, and the data generated have stringent networking requirements. Traditional network devices have both data and control planes strongly coupled together with proprietary protocols and closed interfaces, which makes it difficult to handle issues of varying complexity such as policy enforcement and user-aware routing [17]. Software-defined networks (SDNs) are a recent network paradigm that separates the data and control planes [22]. This separation of planes and centralization of controllers offers a great deal of flexibility and innovation in the network for policy enforcement based on network requirements, thus removing vendor lock-in [6]. The communication between the control plane and data plane is governed by a southbound Application Programming Interface (API) known as OpenFlow [8]. The OpenFlow protocol in SDN is leveraged by the data plane to dispatch the network statistics to the control plane. The control plane then formulates policies for every flow in the network and thus imparts logic to the data plane, as depicted in Figure 1. SDN has paved the path for easy handling of big data flows in networks, be it data streams from Internet of Things (IoT) devices to Cloud datacenters or intra-datacenter network traffic. This has been made possible by leveraging distributed information amassed across the network and made available to the SDN controller for informed decision-making.
However, with the proliferation of next-generation, real-time IoT applications that vary greatly in terms of data frequency and data stream volumes, data traffic classification can substantially assist SDN controllers toward efficient routing and traffic engineering decisions. Existing works on network classification are limited by their application-centric nature, thus overlooking the key criterion for real-time IoT applications, namely, Quality of Service (QoS). Machine learning (ML) helps to make effective decisions from the prediction of real-time and historical data [8]. Network statistics amassed at every switch of an SDN network (which collectively make up the data plane of SDN) can be easily monitored and leveraged gainfully by the controller for making intelligent decisions to be implemented at the forwarding plane.

To cater to the demands of a large number of applications and to effectively handle conflicting resource requests, there is a need to design application-aware networks. For instance, underwater wireless sensor networks comprise nodes that are deployable on the surface and under the water. All nodes need to communicate and exchange information with other nodes in the same network and with the base station. Communication systems in the sensor network involve the transmission of data using acoustic, electromagnetic, or optical wave media. Among these types of media, acoustic communication is the most popular and widely used method due to its attenuation characteristics in water. Transmission loss arises from the absorption of energy and its conversion into heat in the water. Meanwhile, acoustic signals operate at low frequencies, which enables them to be transmitted and received over long distances. The key requirement for this kind of application-aware networking is network traffic classification, although traffic classification is not easy to implement when the sensor networks are underwater.
However, traffic classification at the controller helps to make informed decisions about the applications’ network requirements. Such traffic classification would pave the path for segregation of large and small flows that affect resource requirement fulfillment and thus datacenter performance considerably [21]. This separation of large and small flows is necessary because large flows consume considerable bandwidth; handling them separately avoids performance deterioration of small flows, which are typically delay intolerant. Further, for QoS-aware applications, network traffic classification is required to meet the resource allocation requirements, and fulfilling such network requirements is desirable for seamless functioning of the network [21]. Given the centralized view of the whole network, traffic classification at the controller in SDN helps to formulate application-specific rules, which are critical for the network to work efficiently and in a seamless manner. However, accurate traffic classification is still an open research problem. In this article, an attempt has been made to address the software-defined traffic classification problem from a new perspective by employing evolutionary-based ML algorithms jointly for improved network traffic classification. The main contributions of this article include the incorporation of evolutionary algorithm-based classifiers for network traffic classification and the selection of proper network traffic databases obtained from real-life Internet data.

Fig. 1. A broad overview of SDN planes.
Three classification algorithms, namely, Feed-Forward Neural Network (FFNN), Naïve Bayes, and Logistic Regression (LR), have been employed with a hybrid Neural Network using Particle Swarm Optimization (NN-PSO) for fine-tuning the performance, particularly because readily available datasets do not exist for this purpose. To the best of our knowledge, the SDN traffic classification attempted in this study and its performance improvement with hybrid approaches are still unexplored in the literature.

The remainder of this article is organized as follows: Section 2 presents a brief background to this research and discusses the state of the art. Section 3 describes the underwater sensor networks related to ML techniques. Section 4 discusses the problem formulation and provides the solution methodology. Implementation details and results are discussed in Section 5. Lastly, Section 6 concludes and provides further avenues for this research.

2 RELATED WORKS

Underwater Wireless Sensor Networks (UWSNs) are developing quickly and receiving significant attention, becoming the main focus of both researchers and practitioners [23]. With high technological advances in UWSN, sensors have become smarter, smaller, and more flexible, with lower power consumption, increased processing capacity, and the ability to operate in various underwater applications. Also, UWSN technology can be integrated with Internet Protocol-based systems in supporting the IoT and machine-to-machine (M2M) frameworks for real-time monitoring. The rapid growth of the UWSN domain and the availability of modern sensor node technologies have raised awareness of the field every year, owing to their compatibility and broad application in various sectors.

There have been several attempts to classify network traffic into a set of varied categories.
These categories include QoS-aware, flow-aware, and application-aware traffic classification. Most of the research has focused on application-aware traffic classification. Very few works have focused on classification based on QoS and flow awareness. QoS classification helps to detect the classes of a multitude of flows. Wang et al. [19] classified traffic into various classes based on the QoS. Their work utilizes Deep Packet Inspection (DPI) and semi-supervised learning for the classification of traffic. DPI is first used for labeling a part of the predefined applications. Flow classification of new applications is done using models trained on a priori known datasets using the Laplacian Support Vector Machine (SVM). This methodology is used for categorizing known and unknown applications into varied sets of QoS classes. The reported results show an accuracy of over 90% for the proposed system. Flow-aware classification aims to segregate the network traffic into a set of mice and elephant flows. Elephant flows transfer huge amounts of data into the network, while mice flows are short-lived and are usually delay intolerant. Glick et al. [5] focused on scheduling flows in a hybrid data center. To make elephant flow-aware traffic classification, ML techniques are employed at the edge of the network. This classification is used by the SDN controller to implement an efficient traffic flow optimization algorithm. Xiao et al. [20] used a two-stage cost-effective strategy for the identification of elephant flows. Firstly, the head packet was used for the identification of elephant flows. Secondly, a decision tree is employed to analyze whether the categorized flows are elephant flows or not. Amaral et al. [2] deployed an OpenFlow-based SDN system in an enterprise network to allow the collection of traffic data. After the collection of data, several classifier algorithms are used for the classification of traffic flows into varied applications.
Li and Li [9] used a MultiClassifier to classify applications by using a combination of ML and DPI-based classifiers. The ML-based classifier is used first for every new flow arrival. Its result is taken as the MultiClassifier output only if the reliability of the ML-based classifier is larger than a threshold value. Otherwise, a more accurate classifier like DPI is used. If DPI does not return “unknown,” its result will be selected. Rossi and Valenti [18] classify applications running over the User Datagram Protocol (UDP). They present a behavioral classification engine that is application aware. Depending upon the count of received packets and bytes, UDP-based traffic is classified with the help of the SVM algorithm. The SVM-based classification has an accuracy of over 90%. Qazi et al. [16] propose a framework called Atlas, which was used to classify mobile applications. For the collection of ground truth data, a crowdsourcing approach is used. The data collected from the end devices are used for training the decision tree. This trained model helps in the identification of traffic flows belonging to mobile applications. The accuracy for the top 40 Google Play applications is over 94%. Nakao and Du [13] identified mobile applications using deep Neural Networks (NNs). The data collected belong to an experimental network. In the eight-layer deep NN model, five flow features are selected (Packet size, TTL, Destination Port, Destination Address, and Protocol type). The results demonstrate an accuracy of 93.5% for about 200 mobile applications.

3 ML-BASED DATA ORGANIZATION FOR UWSN

UWSNs consist of sensor nodes and vehicles deployed underwater to monitor underwater conditions. These underwater conditions can be temperature and pressure.
A UWSN is also known as an underwater acoustic network or underwater communication network. UWSN is challenging because of limited battery power and bandwidth and the requirement of dense sensor deployment. Some of the important applications of UWSN are oceanographic data collection, pollution and environment monitoring, disaster prevention, scientific exploration of the underwater environment, and so on. These applications depend on the data collected and transmitted in UWSN, and they predict disasters like floods, hurricanes, earthquakes, tsunamis, tornadoes, and volcanic eruptions. Data agglomeration is a process that can be used to solve problems related to the collection and storage of data. This process may serve as a supplementary process to the routing process. UWSN poses various functional challenges that have been addressed so far by the usage of ML techniques. The functional challenges consist of Clustering and Data Agglomeration, Event Detection, Query Processing, Routing in UWSN, and Localization and Object Tracking.

Data Agglomeration is an iterative classification method. In this method, all the data points first form a cluster of their own; then the two nearest clusters are joined to form one single cluster; and this step is repeated recursively until the desired number of clusters is obtained. It is a bottom-up technique and it works from the differences between the objects to be grouped. The data agglomeration method can be used with Principal Component Analysis and the self-organizing map technique. The K-means algorithm is a popular ML clustering algorithm for collaborative data processing in Clustering and Data Agglomeration. The network property of utmost concern is that of clustering, so Neural Networks can be used for large-scale network clustering.

Event Detection and Query Processing are most important for the functional challenges of UWSN.
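The bottom-up data agglomeration procedure described earlier in this section can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: the article does not specify how the distance between two clusters is measured, so single linkage (the smallest pairwise Euclidean distance) is assumed here, and the function name `agglomerate` and the sample points are hypothetical.

```python
import math


def agglomerate(points, n_clusters):
    """Bottom-up clustering: start with singletons, repeatedly merge the two nearest.

    points is a sequence of equal-length coordinate tuples. Cluster distance
    is single linkage (an assumption, not taken from the article).
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")

    # Step 1: every data point is a cluster of its own.
    clusters = [[p] for p in points]

    def linkage(a, b):
        # Smallest Euclidean distance between any member of a and any member of b.
        return min(math.dist(p, q) for p in a for q in b)

    # Steps 2 and 3: merge the two nearest clusters until the target count is reached.
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    # Hypothetical 2-D sensor readings.
    readings = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
    for cluster in agglomerate(readings, 3):
        print(cluster)
```

The pairwise search makes this sketch quadratic in the number of clusters at every merge, which is acceptable for illustration but would need a more efficient data structure for dense sensor deployments.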
There are many Event Detection and Query Processing methods, such as event recognition, forest fire detection, query processing, distributed event detection, and query optimization. A Bayesian algorithm has been used for event recognition, K-Nearest Neighbour for query processing, a neural network for forest fire detection, and Principal Component Analysis for query optimization [11].

Designing routing protocols for UWSNs is quite challenging because of multiple characteristics that differentiate them from wireless infrastructure-less networks. Design challenges arise in UWSNs from limited bandwidth, energy, processing, and storage, so features such as energy efficiency, data transmission models, and sensor location are essential. Self-Organizing Maps and Reinforcement Learning have been used for data routing and routing enhancement in UWSNs.

A difficulty in object detection is that an algorithm may find multiple detections of the same object. In UWSNs, the object is first localized using SVM and a decision tree. Object detection/localization in UWSNs is also one of the popular applications of CNNs.

UWSNs also pose various non-functional challenges that have been addressed so far with ML techniques. They are Security and Anomaly Intrusion Detection, QoS, Data Integrity and Fault Detection, and Varied Applications.

Anomaly-based network intrusion detection protects networks against malicious activities. Outliers are extreme values that deviate from the other observations in the data, and three algorithms can detect them: Bayesian Belief Network, K-Nearest Neighbor, and SVM. In UWSNs, random occurrences of faulty nodes degrade the QoS of the network, so an efficient fault detection scheme is needed to manage a large UWSN; neural networks have been used for this purpose. QoS covers a set of technologies that work on a network to guarantee its ability to dependably run high-priority applications, and the accuracy and reliability of the sensor network can be predicted. Air quality observation and intelligent lighting control are popular non-functional challenges in UWSNs addressed using neural networks [12].

3.1 Illustrative Example

Consider a cluster of four machines in which nodes 1–3 are the WorkerNodes and node 4 is the MasterNode, which is implicit. Some tasks in the system use a different notation for task execution, such as the tasks scheduled at nodes 2 and 3 after time 3t. They represent tasks that are yet to be completed: in the notation t++t, the first t indicates that the task has already executed for time t, and ++t is the estimated time left for the completion of this task. Suppose that normal tasks take a time of t or 2t. There are a total of nine map tasks in the job. At 4t in the diagram, all the tasks are scheduled; at this instant, the nodes are profiled by the number of map tasks they have completed in the job. Node 1 has performed the majority of the computation, with four tasks completed; node 2 has completed two tasks with another task currently being processed on it; and node 3 has completed one task with another task currently being processed on it. Since node 1 is free after 4t, it notifies the Jobtracker via heartbeat that it is free. Because all the tasks are already scheduled at 4t, node 1 becomes a good candidate for scheduling speculated tasks. Next, the remaining time of the tasks that are yet to be completed, i.e., those at nodes 2 and 3, is checked. Based on the data processed and the data left unprocessed in each task, let the remaining time to complete the tasks at nodes 2 and 3 be t and 3t, respectively. The task at node 3 is suitable for speculative execution because it has the largest remaining time, and the backup time for this task is 2t.
Since the backup time of 2t is less than the estimated remaining time of 3t, this task is speculated at node 1 and hence completes within a time of 2t. After the task completes at node 1, node 1 informs the Jobtracker of its completion, and the copy still running at node 3 is killed automatically. Hence, in this scenario, speculation saves a time of t in the completion time of the job. Thus, performance improves with this controlled speculation.

4 NETWORK TRAFFIC CLASSIFICATION USING HYBRID EVOLUTIONARY ALGORITHM

In this section, the network traffic classification problem and its proposed solution are discussed. All the methodologies used, from data collection to obtaining optimized network traffic, are briefly described.

4.1 Dataset Preparation

The main problem encountered in classifying Internet traffic using ML is the requirement of dataset validation and training. Normally, dataset characteristics are supposed to be similar to the real network environment. This section analyzes issues that highlight the factors affecting the relationship between ML training and testing datasets. It is important to note that all these issues have a significant effect on online classification but matter relatively less for offline classification.

4.1.1 Training and Testing ML Dataset Collected from the Same/Different Networks Environment. Networks can have different configurations, such as using real IP, using NATed IP [7], and so on. The pertinent question here is: are the statistical features of Internet applications the same in different network scenarios? In other words, what is the effect of collecting training and testing datasets from the same or different networks? Many sub-questions arise from this main question, such as the following:
— If the training and testing datasets are collected from networks having different network requirements, what is the effect?
— If the training and testing datasets are collected from networks having the same network requirements, what is the benefit?
— If the network characteristics are changed, what traffic features will be affected by it?
— During the training and testing of datasets in ML research papers, what information needs to be added?

In ML Internet traffic classification, the training dataset is assumed to represent the testing dataset for the same class. Different network scenarios can generate different traffic patterns. This variation means that the values of the traffic features (such as packet length and inter-arrival time) can differ when the network segments differ. According to Nguyen and Armitage [14], many statistical properties of Internet applications vary over time. Alshammari and Zincir-Heywood [1] compared the classification accuracies of Skype traffic. The datasets employed for training and testing were obtained from varied networks over multiple years.

4.2 Implementation of ML Algorithms

The main challenge is to employ ML algorithms that suit the problem. Many ML algorithms are available, but selecting the best-fit algorithm for a specific SDN application is a nontrivial modeling problem. Thus, two ML algorithms have been purposefully selected: Logistic Regression and Naïve Bayes. In this article, the selected problem was also evaluated with other classifiers, and these two were found to be the best fit for this problem. Figure 2 shows the step-by-step working procedure of the ML approach.

4.2.1 Logistic Regression. Logistic regression categorizes the data based on binary responses in Data Modeling.
It predicts probability values that are restricted to the interval between 0 and 1. These probabilities are well calibrated in comparison to those of other classifiers, such as Naïve Bayes, which can also predict probabilities. Logistic regression is used when the dependent variable is categorical. For instance, suppose a tumor must be classified as malignant (1) or not (0). In this scenario, a threshold value must be set at which the classification is made. If the actual class is malignant, the predicted continuous value is 0.5, and the threshold value is 0.6, the data point will be classified as not malignant, which can ultimately lead to serious ramifications in practice.

4.2.2 Naïve Bayes. The Naïve Bayes method follows Bayes' theorem and assumes that the predictors are independent; that is, the presence of a specific feature in a class is assumed to be uncorrelated with the presence of any other feature. Bayes' theorem allows calculation of the posterior probability $P_p(x|y)$ from $P_p(x)$, $P_p(y)$, and $P_p(y|x)$. The mathematical expression is as follows:

\[ P_p(x|y) = \frac{P_p(y|x)\, P_p(x)}{P_p(y)}, \tag{1} \]

where $P_p(x|y)$ is the posterior probability of the class (target) given the predictor (attribute), $P_p(x)$ is the prior probability of the class, $P_p(y|x)$ is the likelihood, which is the probability of the predictor given the class, and $P_p(y)$ is the prior probability of the predictor.

Fig. 2. Flowchart of the proposed ML-based procedure.

4.3 Brief Overview of ANN

ANN is one of the most important and useful data-driven computational techniques and has been deployed in various network traffic problems. The input and output are related in a nonlinear way. The inputs for this experiment include a port number, topology-related information, and traffic.
The output is to find the shortest traffic for this neural network. The details of the ANN formulation are taken from the literature [15].

4.4 Basics of PSO

PSO is a stochastic population-based meta-heuristic algorithm first introduced by Kennedy and Eberhart in 1995 [4]. Suppose that the size of the swarm is $N$ (population size) and the search space is $D$ dimensional. The position of the $i$-th particle is represented as $x_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$, where $x_{id} \in [lb_d, ub_d]$, $d \in [1, D]$, and $lb_d$ and $ub_d$ are the lower and upper bounds of the $d$-th dimension of the search space. The velocity of the $i$-th particle is represented as $v_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$. At each timestep, the position and velocity of the particles are updated according to the following equations:

\[ v_{ij}(t+1) = \omega \times v_{ij}(t) + c_1 \times r_1 \times \left(p^{lB}_{ij}(t) - x_{ij}(t)\right) + c_2 \times r_2 \times \left(p^{gB}(t) - x_{ij}(t)\right), \tag{2} \]

\[ x_{ij}(t+1) = x_{ij}(t) + v_{ij}(t+1), \tag{3} \]

where
— $\omega$: inertia weight, which balances the exploration and exploitation ability of PSO.
— $r_1, r_2$: two distinct random numbers, $r_1, r_2 \sim U(0, 1)$. They contribute to the stochastic nature of the algorithm.
— $c_1, c_2$: acceleration coefficients, which pull each particle toward the particle best and global best positions.
— $t$: current iteration.
— $p^{lB}$: best previous position found so far by the particle, called local best.
— $p^{gB}$: best position discovered so far by the whole swarm, called global best.
— $\omega \times v_{ij}(t)$: provides exploration ability for PSO.
— $c_1 \times r_1 \times (p^{lB}_{ij}(t) - x_{ij}(t))$: represents private thinking.
— $c_2 \times r_2 \times (p^{gB}(t) - x_{ij}(t))$: represents collaboration of particles.

4.5 Stability Analysis of PSO

4.5.1 Need for Stability Analysis.
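To make the update rules of Equations (2) and (3) concrete before their stability is examined, the following is a minimal Python sketch of a PSO loop. It is not the authors' implementation: the objective (a simple sphere function standing in for the ANN training error), the parameter values `w=0.7`, `c1=c2=1.5`, the clamping of positions to the bounds, the per-dimension indexing of the global best, and all function names are assumptions made for illustration.

```python
import random


def pso(objective, dim, lb, ub, n_particles=20, iters=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` with the velocity/position updates of Eqs. (2)-(3)."""
    rng = random.Random(seed)
    x = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    p_local = [xi[:] for xi in x]                # local best positions
    f_local = [objective(xi) for xi in x]
    g = min(range(n_particles), key=f_local.__getitem__)
    p_global, f_global = p_local[g][:], f_local[g]  # global best
    history = [f_global]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Eq. (2): inertia + private thinking + collaboration.
                v[i][j] = (w * v[i][j]
                           + c1 * r1 * (p_local[i][j] - x[i][j])
                           + c2 * r2 * (p_global[j] - x[i][j]))
                # Eq. (3), followed by an assumed clamp to [lb, ub].
                x[i][j] = min(ub, max(lb, x[i][j] + v[i][j]))
            f = objective(x[i])
            if f < f_local[i]:
                p_local[i], f_local[i] = x[i][:], f
                if f < f_global:
                    p_global, f_global = x[i][:], f
        history.append(f_global)
    return p_global, f_global, history


def sphere(z):
    # Hypothetical stand-in objective; the article trains an ANN instead.
    return sum(c * c for c in z)


best_x, best_f, history = pso(sphere, dim=3, lb=-5.0, ub=5.0)
print(best_f)
```

The recorded `history` of global-best values never increases, but whether the particle positions themselves settle depends on the choice of `w`, `c1`, and `c2`, which is the question the stability analysis addresses.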
The stability analysis of PSO is carried out for the aforesaid problem to establish that the particles' positions converge to a fixed point in the desired search space. A two-stage (first-order and second-order) stability analysis is employed to show that the variance of the particle positions converges to zero. Here, the ANN is trained using PSO, and the PSO-trained ANN is used for the traffic classification of a network; hence, from the stability analysis of PSO, it can be concluded that the whole system proposed for software-defined network traffic classification is also stable.

4.5.2 The Stability Analysis. In this subsection, the derivation of the stability analysis is presented. For this purpose, pbest $t_l$ and gbest $t_g$ are kept fixed for an iteration, which is known as the stagnation assumption [3, 10]. The objective function of the problem plays a key role in specifying the dimensions of the problem space through $t_l$ and $t_g$, which are the best locations found so far. Hence, the description of the proposed algorithm can be reduced to one dimension for analysis purposes without loss of generality:

\[ n_{s+1} = \omega\, n_s + \gamma_1 (t_l - v_s) + \gamma_2 (t_g - v_s), \tag{4} \]

\[ v_{s+1} = v_s + n_{s+1}, \tag{5} \]

where $n_s$ denotes the velocity and $v_s$ the position of the particle at step $s$. Let

\[ \gamma = \gamma_1 + \gamma_2, \qquad t = \frac{\gamma_1 t_l + \gamma_2 t_g}{\gamma_1 + \gamma_2}. \tag{6} \]

Then Equation (4) can be simplified as

\[ n_{s+1} = \omega\, n_s + \gamma\, (t - v_s). \tag{7} \]

Substituting Equation (7) in (5), the following is obtained:

\[ v_{s+1} = v_s + \omega\, n_s + \gamma\, (t - v_s). \tag{8} \]

Replacing $s$ by $s - 1$ in Equation (5):

\[ v_s = v_{s-1} + n_s \;\Rightarrow\; n_s = v_s - v_{s-1}. \tag{9} \]

Substituting Equation (9) in (8) gives

\[ v_{s+1} = v_s + \omega\, v_s - \omega\, v_{s-1} + \gamma\, (t - v_s). \tag{10} \]

At the initial state $n_s = 0$; substituting this in Equation (9),

\[ v_s - v_{s-1} = 0 \;\Rightarrow\; v_{s-1} = v_s. \tag{11} \]

Substituting Equation (11) in (10),

\[ v_{s+1} = v_s + \gamma\, (t - v_s) \;\Rightarrow\; v_{s+1} = (1 - \gamma)\, v_s + \gamma\, t. \tag{12} \]

Equation (12) is rewritten to present the procedure:

\[ q_{s+1} = (1 - \gamma)\, q_s + \gamma\, t, \tag{13} \]

where $q_{s+1} = v_{s+1}$, $q_s = v_s$, and $t = \frac{\gamma_1 t_l + \gamma_2 t_g}{\gamma_1 + \gamma_2}$. The first-order stability analysis considers the one-dimensional stochastic sequence $\{q_1, q_2, \ldots\}$ ($q_s$ varies randomly with the values of $\gamma$). Here, $q$ is the position of the $i$-th particle, and its expected value, denoted $D(\cdot)$, can be expressed as

\[ D(q_{s+1}) = (1 - D(\gamma))\, D(q_s) + D(\gamma)\, t, \tag{14} \]

\[ D(q_{s+1}) = (1 - \tau_\gamma)\, D(q_s) + \tau_\gamma\, t, \tag{15} \]

where $\tau_\gamma$ is the expected value of $\gamma$, which is uniformly random in $[0, 1]$. So, $D(\gamma) = \frac{1}{2}$. Equation (15) can be rewritten as

\[ D(q_{s+1}) = \frac{1}{2}\, D(q_s) + \frac{1}{2}\, t. \tag{16} \]

Solving this recurrence relation gives the general formulation

\[ D(q_s) = \left(\frac{1}{2}\right)^{s} (q_0 - t) + t, \tag{17} \]

where $q_0$ denotes the initial position.

Lemma 1. The sequence $D(q_s)$ is convergent and converges to $t$.

Proof. $D(q_s)$ converging to $t$ means that for every $\nu > 0$ there exists a value $s(\nu)$ such that, if $s > s(\nu)$,

\[ |D(q_s) - t| < \nu. \tag{18} \]

Considering Equation (17),

\[ \left(\frac{1}{2}\right)^{s} |q_0 - t| < \nu. \tag{19} \]

Therefore it can be concluded that

\[ 2^{s} > \frac{|q_0 - t|}{\nu}. \tag{20} \]

So,

\[ s > \log_2 \frac{|q_0 - t|}{\nu}. \tag{21} \]

4.6 Second-Order Stability Analysis

To assure second-order stability of the particle positions, the variance must converge. The variance $N(q)$ of the random variable $q$ is derived as follows:

\[ N(q) = D(q^2) - D^2(q). \tag{22} \]

Thus, $D(q^2)$ must be found in order to calculate $N(q)$. $q_{s+1}^2$ can be calculated as follows:

\[ q_{s+1}^2 = \left[(1 - \gamma)\, q_s + \gamma\, t\right]^2, \tag{23} \]

\[ q_{s+1}^2 = (1 - 2\gamma + \gamma^2)\, q_s^2 + 2\gamma (1 - \gamma)\, t\, q_s + \gamma^2 t^2. \tag{24} \]

Hence, the expected value of $q_{s+1}^2$ is

\[ D(q_{s+1}^2) = D\!\left[(1 - 2\gamma + \gamma^2)\, q_s^2 + 2\gamma (1 - \gamma)\, t\, q_s + \gamma^2 t^2\right]. \tag{25} \]
Expanding Equation (25),

\[ D(q_{s+1}^2) = \left(1 - 2\tau_\gamma + D(\gamma^2)\right) D(q_s^2) + 2t \left(\tau_\gamma - D(\gamma^2)\right) D(q_s) + t^2 D(\gamma^2). \tag{26} \]

As $\gamma$ is a uniformly distributed random number that varies between 0 and 1,

\[ D(\gamma^2) = \frac{1}{3}, \qquad N(\gamma) = \frac{1}{12}. \tag{27} \]

Substituting Equation (27) in (26), we get

\[ D(q_{s+1}^2) = \frac{1}{3}\, D(q_s^2) + \frac{1}{3}\, t\, D(q_s) + \frac{1}{3}\, t^2. \tag{28} \]

The value of $D^2(q_{s+1})$ is calculated from Equation (16) as follows:

\[ D^2(q_{s+1}) = \frac{1}{4}\, D^2(q_s) + \frac{1}{2}\, t\, D(q_s) + \frac{1}{4}\, t^2. \tag{29} \]

Now $N(q_{s+1})$ can be obtained by substituting Equations (28) and (29) in Equation (22) as

\[ N(q_{s+1}) = \frac{1}{4}\, N(q_s) + \frac{1}{12}\, D\!\left((q_s - t)^2\right). \tag{30} \]

Since

\[ q_s - t = (1 - \gamma)(q_{s-1} - t), \tag{31} \]

\[ D\!\left((q_s - t)^2\right) = \frac{1}{3}\, D\!\left((q_{s-1} - t)^2\right). \tag{32} \]

The following recurrence relation is obtained:

\[ N(q_s) = \left(\frac{1}{4}\right)^{s} N(q_0) + D\!\left((q_0 - t)^2\right) \left[\left(\frac{1}{3}\right)^{s} - \left(\frac{1}{4}\right)^{s}\right]. \tag{33} \]

Lemma 2. The sequence $N(q_s)$ is convergent and converges to 0.

Proof. The limit of $N(q_s)$ as $s \to \infty$ is 0, since both terms of Equation (33) vanish:

\[ \lim_{s \to \infty} N(q_s) = \lim_{s \to \infty} \left(\frac{1}{4}\right)^{s} N(q_0) + D\!\left((q_0 - t)^2\right) \lim_{s \to \infty} \left[\left(\frac{1}{3}\right)^{s} - \left(\frac{1}{4}\right)^{s}\right] = 0. \tag{34} \]

Hence, it is evident from the above analysis that the proposed PSO satisfies the first- and second-order stability tests.

Fig. 3. TP percentage and accuracy for ML algorithms.

4.7 Performance Metrics

A confusion matrix is used to assess classification accuracy. It is an important tool for analyzing how well the classifier identifies the various classes. Three standard metrics are used for evaluating the classifier: Precision (Pr), Recall (Rc), and F-Measure (fm) [Equation (35)]:

\[ fm = 2 \cdot \frac{Pr \cdot Rc}{Pr + Rc}, \tag{35} \]

where $Pr = \frac{Tp}{Tp + Fp}$ and $Rc = \frac{Tp}{Tp + Fn}$.
— True positives (Tp): Actual class is positive, predicted class is positive.
— True negatives (Tn): Actual class is negative, predicted class is negative.
— False positives (Fp): Actual class is negative, predicted class is positive.
— False negatives (Fn): Actual class is positive, predicted class is negative.

Here, fm is the harmonic mean of Recall and Precision. Therefore, this score takes both false negatives and false positives into account. fm is more useful than accuracy, especially for uneven class distributions. It ranges from 0 to 1, where 0 indicates the worst classifier and 1 the best.

5 RESULTS AND DISCUSSIONS

To test the effect of the aforesaid ML dataset validation issues (discussed in Section 4.1), three experiments were conducted. The first experiment was to find the change in traffic features when the network characteristics of the dataset were changed. The second experiment was to highlight the impact on online ML classification accuracy when the real majority and minority classes in the training datasets were not considered. The last experiment was to answer the question: does the value of the traffic features (such as packet length and inter-arrival time) change if a hybridized optimization algorithm is applied, or, in other words, can the accuracy of network traffic classification be improved via evolutionary algorithms?

Table 1. Traffic Distribution of Different Applications
Application Type     Number of Traffic    Proportion
HTTP                 5,521                82.34%
FTP                  195                  2.75%
Video Streaming      200                  3%
Instant Messaging    485                  7.35%
P2P                  103                  1.53%

Two case studies were conducted on two collected datasets: (1) the ANT dataset and (2) Kaggle datasets.

5.1 Case Study 1: ANT Dataset

The network traffic dataset was obtained from the ANT Dataset website. Data are available in different formats; suitable traffic data were extracted from the database for our application.

5.1.1 Scenarios Considered.
Traffic features are the traffic patterns used in ML classifier datasets, such as packet length and inter-arrival time. Skype has gained prominence and attention worldwide as one of the most popular forms of VoIP software, and it can also represent P2P applications. Therefore, the protocols considered in this case study are HTTP, FTP, video streaming, instant messaging, and peer-to-peer (P2P), and TP percentage comparisons among the ML algorithms are presented in Figure 3(a). The traffic distribution for these protocols is given in Table 1. From Table 1, it is clear that HTTP carries the most traffic in comparison to the other applications, and this affects the accuracy when the classifiers are compared. Figure 3(b) indicates that the classifier achieved lower accuracy for HTTP than for the other applications, which may be because HTTP carries the largest amount of traffic. On the other hand, the accuracy for Instant Messaging was the highest, which may be due to its lower traffic load. Based on the amount of traffic, the corresponding proportion percentage has been extracted, and the accuracy comparison is presented for three types of ML algorithms. The overall comparison is depicted in Figure 4(a). As stated previously, one question to be answered is whether the traffic features change if evolutionary algorithms are applied. The answer is given by Figure 4(a), which shows that incorporating a hybrid evolutionary algorithm improves accuracy considerably more than a single optimization algorithm does. Thus, evolutionary algorithms are warranted for improving network traffic classification.

5.2 Case Study 2: Kaggle Dataset

The same methods have been applied to this dataset too. One key difference from the previous case is that evolutionary optimization algorithms were not used in this case.
Only classifier-based schemes have been incorporated. In a UWSN, each node has to send its data to the sink directly or indirectly. In some cases, a node transmits its data to the sink directly, whereas in other cases data aggregation is applied, whereby the transmitted data are collected at a particular node and then forwarded to the sink. To analyze the application of data aggregation and its results, we compared the simulation results of different techniques with and without PSO. In this scenario, the performance metrics are average delay, average packet drop, and average energy consumption. Figure 4(b) shows the average energy consumed by the ML algorithms with and without PSO. The results indicate that LR applied without PSO in a UWSN involves more collisions among the data packets than the same technique applied with PSO. Thus, at particular intervals of time, the delay is 17% lower with PSO than without it. Figure 4(b) and (c) present the parameters obtained when the ML algorithms were applied to the Kaggle dataset.

Fig. 4. Case study results.

Footnotes: https://ant.isi.edu/datasets/index.html. http://statweb.stanford.edu/~sabatti/data.html.

6 CONCLUSION

In this article, ML algorithms have been applied to traffic classification in SDN networks for making informed decisions about underlying applications and their QoS requirements. Three ML classifiers, namely, FFNN, NB, and LR, were juxtaposed with a hybrid NN-PSO to normalize datasets for classification purposes as well as to improve the efficacy of the training and testing datasets collected from open-source sites. The accuracy of the traffic classification has been evaluated using ML algorithms.
Additionally, the implementation of NN-PSO enhances the accuracy of the traffic classification with the same classifiers. The proposed method is promising because it does not impose any processing overhead. Even though UWSNs have seen a great number of improvements in the previous few years, there is still substantial room for improvement, especially in implementing systems on a large scale. In future work, researchers can offer better solutions for node mobility in scenarios with a large monitoring area (with a high neighborhood range) to investigate the effect on network connectivity, coverage, energy consumption, and network lifetime. To increase the efficiency of UWSNs and improve their performance, prospective research should focus on implementing cooperative control among a few underwater vehicles. In future work, the detection of flows from newer applications that are not yet part of the trained classifier will be explored, with implementations on a varied set of platforms (Windows, iOS, Linux).

REFERENCES

[1] Riyad Alshammari and A. Nur Zincir-Heywood. 2009. Machine learning based encrypted traffic classification: Identifying SSH and Skype. In 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications. IEEE, 1–8.
[2] Pedro Amaral, Joao Dinis, Paulo Pinto, Luis Bernardo, Joao Tavares, and Henrique S. Mamede. 2016. Machine learning in software defined networks: Data collection and traffic classification. In 2016 IEEE 24th International Conference on Network Protocols (ICNP’16). IEEE, 1–5.
[3] Maurice Clerc and James Kennedy. 2002. The particle swarm-explosion, stability, and convergence in a multidimensional complex space. IEEE Transactions on Evolutionary Computation 6, 1 (2002), 58–73.
[4] Russell Eberhart and James Kennedy. 1995. Particle swarm optimization. In Proceedings of the IEEE International Conference on Neural Networks, Vol. 4. Citeseer, 1942–1948.
[5] Madeleine Glick and Houman Rastegarfar. 2017. Scheduling and control in hybrid data centers. In 2017 IEEE Photonics Society Summer Topical Meeting Series (SUM’17). IEEE, 115–116.
[6] M. W. Hussain, B. Pradhan, X. Z. Gao, K. H. K. Reddy, and D. S. Roy. 2020. Clonal selection algorithm for energy minimization in software defined networks. Applied Soft Computing 96 (2020), 106617.
[7] Vivek Jha. 2020. Communication system. (2020). U.S. Patent No. 10,728,213, Filed July 17th, 2012.
[8] Diego Kreutz, Fernando M. V. Ramos, Paulo Esteves Verissimo, Christian Esteve Rothenberg, Siamak Azodolmolky, and Steve Uhlig. 2015. Software-defined networking: A comprehensive survey. Proceedings of the IEEE 103, 1 (2014), 14–76. DOI:10.1109/JPROC.2014.2371999
[9] Yunchun Li and Jingxuan Li. 2014. MultiClassifier: A combination of DPI and ML for application-layer classification in SDN. In The 2014 2nd International Conference on Systems and Informatics (ICSAI’14). IEEE, 682–686.
[10] Zhao-Guang Liu, Xiu-Hua Ji, and Yun-Xia Liu. 2018. Hybrid non-parametric particle swarm optimization and its stability analysis. Expert Systems with Applications 92 (2018), 256–275.
[11] Robert Martin and Sanguthevar Rajasekaran. 2016. Data centric approach to analyzing security threats in underwater sensor networks. In OCEANS 2016 MTS/IEEE Monterey. IEEE, 1–6.
[12] José-Miguel Moreno-Roldán, Miguel-Ángel Luque-Nieto, Javier Poncela, and Pablo Otero. 2017. Objective video quality assessment based on machine learning for underwater scientific applications. Sensors 17, 4 (2017), 664.
[13] Akihiro Nakao and Ping Du. 2018. Toward in-network deep machine learning for identifying mobile applications and enabling application specific network slicing. IEICE Transactions on Communications 101-B (2018), 1536–1543.
[14] Thuy T. T. Nguyen and Grenville Armitage. 2008. A survey of techniques for internet traffic classification using machine learning. IEEE Communications Surveys & Tutorials 10, 4 (2008), 56–76.
[15] Buddhadeb Pradhan, Arijit Nandi, Nirmal Baran Hui, Diptendu Sinha Roy, and Joel J. P. C. Rodrigues. 2019. A novel hybrid neural network-based multirobot path planning with motion coordination. IEEE Transactions on Vehicular Technology 69, 2 (2019), 1319–1327.
[16] Zafar Ayyub Qazi, Jeongkeun Lee, Tao Jin, Gowtham Bellala, Manfred Arndt, and Guevara Noubir. 2013. Application-awareness in SDN. In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM. 487–488.
[17] Sandhya Rathee, Yash Sinha, and K. Haribabu. 2017. A survey: Hybrid SDN. Journal of Network and Computer Applications 100 (2017), 35–55.
[18] Dario Rossi and Silvio Valenti. 2010. Fine-grained traffic classification with netflow data. In Proceedings of the 6th International Wireless Communications and Mobile Computing Conference. 479–483.
[19] Pu Wang, Shih-Chun Lin, and Min Luo. 2016. A framework for QoS-aware traffic classification using semi-supervised machine learning in SDNs. In 2016 IEEE International Conference on Services Computing (SCC’16). IEEE, 760–765.
[20] Peng Xiao, Wenyu Qu, Heng Qi, Yujie Xu, and Zhiyang Li. 2015. An efficient elephant flow detection with cost-sensitive in SDN. In 2015 1st International Conference on Industrial Networks and Intelligent Systems (INISCom’15). IEEE, 24–28.
[21] Junfeng Xie, F. Richard Yu, Tao Huang, Renchao Xie, Jiang Liu, Chenmeng Wang, and Yunjie Liu. 2018. A survey of machine learning techniques applied to software defined networking (SDN): Research issues and challenges. IEEE Communications Surveys & Tutorials 21, 1 (2018), 393–430.
[22] Abbas Yazdinejad, Reza M. Parizi, Ali Dehghantanha, Gautam Srivastava, Senthilkumar Mohan, and Abedallah M. Rababah. 2020. Cost optimization of secure routing with untrusted devices in software defined networking. Journal of Parallel and Distributed Computing 143 (2020), 36–46.
[23] Abbas Yazdinejad, Reza M. Parizi, Gautam Srivastava, Ali Dehghantanha, and Kim-Kwang Raymond Choo. 2019. Energy efficient decentralized authentication in internet of underwater things using blockchain. In 2019 IEEE Globecom Workshops (GC Wkshps’19). IEEE, 1–6.

Received January 2021; revised May 2021; accepted July 2021

ACM Transactions on Sensor Networks, Vol. 18, No. 3, Article 34. Publication date: April 2022.

Journal

ACM Transactions on Sensor Networks (TOSN), Association for Computing Machinery

Published: Apr 18, 2022

Keywords: Software defined network

References