Active Learning for Network Traffic Classification: A Technical Study

Amin Shahraki, Mahmoud Abbasi, Amir Taherkordi, Anca Delia Jurcut

TL;DR

This study investigates the applicability of Active Learning (AL), an active form of Machine Learning (ML), to Network Traffic Classification (NTC); the results show that AL can achieve high accuracy with a small amount of labeled data.

Abstract

Network Traffic Classification (NTC) has become an important feature in various network management operations, e.g., Quality of Service (QoS) provisioning and security services. Machine Learning (ML) algorithms, a popular approach for NTC, can offer reasonable classification accuracy and can handle encrypted traffic. However, ML-based NTC techniques suffer from a shortage of labeled traffic data, which is the case in many real-world applications. This study investigates the applicability of an active form of ML, called Active Learning (AL), in NTC. AL reduces the need for a large number of labeled examples by actively choosing the instances that should be labeled. The study first provides an overview of NTC and its fundamental challenges and surveys the literature on ML-based NTC methods. Then, it introduces the concepts of AL, discusses them in the context of NTC, and reviews the literature in this field. Further, challenges and open issues in AL-based classification of network traffic are discussed. Moreover, as a technical survey, the study conducts experiments to show the broad applicability of AL in NTC. The simulation results show that AL can achieve high accuracy with a small amount of data.
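The idea of actively choosing which instances to label can be illustrated with pool-based uncertainty sampling. The sketch below is a deliberately simplified, hypothetical example, not the method evaluated in this study: each flow is reduced to a single feature (its mean packet size in bytes), the classifier is a 1-D threshold separating two hypothetical classes, `"web"` and `"video"`, and the `oracle` function stands in for a human expert or labeling tool under the assumed rule that flows of 1,000 bytes or more are video. In each round, the learner queries the unlabeled flow closest to its current decision boundary, i.e., the one it is least confident about.

```python
from typing import Callable, List, Tuple


def oracle(size: int) -> str:
    # Hypothetical labeler (stands in for a human expert or DPI tool):
    # assumes flows with mean packet size >= 1000 bytes are "video".
    return "video" if size >= 1000 else "web"


def pool_based_uncertainty_sampling(
    pool: List[int], label: Callable[[int], str]
) -> Tuple[int, int]:
    """Learn a 1-D threshold classifier, labeling only the least confident flow each round.

    Returns (learned threshold, number of labels requested).
    """
    unlabeled = sorted(set(pool))
    # Seed set: label the two extreme flows, assumed to belong to different classes.
    lo, hi = unlabeled[0], unlabeled[-1]
    if label(lo) != "web" or label(hi) != "video":
        raise ValueError("seed flows must be one 'web' and one 'video'")
    queries = 2
    while True:
        # Flows strictly between the known "web" and "video" flows are still uncertain.
        candidates = [x for x in unlabeled if lo < x < hi]
        if not candidates:
            return hi, queries  # predict "video" iff size >= hi
        # Current decision boundary: halfway between the two classes.
        boundary = (lo + hi) / 2
        # Least confident flow = the one closest to the boundary.
        x = min(candidates, key=lambda v: abs(v - boundary))
        queries += 1
        if label(x) == "video":
            hi = x
        else:
            lo = x


pool = list(range(40, 1501))  # hypothetical mean packet sizes (bytes) of 1,461 flows
threshold, n_labels = pool_based_uncertainty_sampling(pool, oracle)
print(threshold, n_labels, len(pool))  # → 1000 12 1461
```

On this synthetic pool, the learner recovers the exact boundary after requesting 12 labels instead of labeling all 1,461 flows, because each query roughly halves the range in which the class is still uncertain. Real NTC features are high-dimensional and noisy, so savings of this size should not be expected in general.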

Paper Structure

This paper contains 19 sections, 2 equations, 5 figures, and 6 tables.

Figures (5)

  • Figure 1: The main steps in building a network traffic classifier.
  • Figure 2: Graphical description of active learning.
  • Figure 3: (a) Stream-based selective sampling, and (b) Pool-based sampling.
  • Figure 4: Classification accuracy (%) for the stream-based and pool-based scenarios on the Cambridge dataset.
  • Figure 5: Experimental results using different query strategies on the NTC datasets. Each point in the graphs shows the percentage of the dataset used for training, ranging from 0.5%, 1%, 2%, 4%, to 64%, as indicated in Tables \ref{tbl:trabid}, \ref{tbl:vpn}, and \ref{tbl:tornotor}.
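Figure 3 contrasts the two sampling scenarios. In stream-based selective sampling, flows arrive one at a time and the learner must decide immediately whether to request a label. The sketch below is a hypothetical illustration, not the study's setup: it assumes each flow is described only by its mean packet size, that flows of 1,000 bytes or more are `"video"` and the rest `"web"`, and it uses one common decision rule, querying only when a flow falls inside the region where the labels seen so far cannot determine its class. The names `stream_selective_sampling` and `oracle` and the 1,000-byte rule are illustrative assumptions.

```python
import random
from typing import Callable, Iterable, List, Tuple


def oracle(size: int) -> str:
    # Hypothetical labeler: assumes flows with mean packet size >= 1000 bytes are "video".
    return "video" if size >= 1000 else "web"


def stream_selective_sampling(
    stream: Iterable[int], label: Callable[[int], str]
) -> Tuple[List[str], int]:
    """Classify flows as they arrive, requesting a label only when the class is undetermined.

    Assumes a single size threshold separates the classes.
    """
    lo = float("-inf")  # largest flow size labeled "web" so far
    hi = float("inf")   # smallest flow size labeled "video" so far
    predictions: List[str] = []
    queries = 0
    for x in stream:
        if x <= lo:
            predictions.append("web")    # implied by an earlier "web" label
        elif x >= hi:
            predictions.append("video")  # implied by an earlier "video" label
        else:
            # Region of uncertainty: pay for a label and tighten the bounds.
            queries += 1
            y = label(x)
            predictions.append(y)
            if y == "video":
                hi = x
            else:
                lo = x
    return predictions, queries


rng = random.Random(7)
stream = [rng.randint(40, 1500) for _ in range(2000)]  # hypothetical flow arrivals
preds, queries = stream_selective_sampling(stream, oracle)
accuracy = sum(p == oracle(x) for p, x in zip(preds, stream)) / len(stream)
print(f"accuracy={accuracy:.2f}, labels requested={queries}/{len(stream)}")
```

Because this toy rule is realizable (a single threshold separates the classes), every prediction made without a query is guaranteed correct, so accuracy is 1.0; the quantity of interest is how few of the 2,000 arrivals trigger a label request. A pool-based learner, by contrast, can compare all unlabeled flows before choosing, at the cost of needing access to the whole pool.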