Time-Distributed Feature Learning for Internet of Things Network Traffic Classification

Yoga Suhas Kuruba Manjunath, Sihao Zhao, Xiao-Ping Zhang, Lian Zhao

TL;DR

This work tackles IoT network traffic classification by introducing a holistic-temporal feature learning framework that uses a time-distributed wrapper to extract intra-, inter-, and pseudo-temporal information from traffic flows. Traffic data are represented as greyscale video streams to enable CNN-driven intra-temporal feature extraction, followed by LSTM-based inter-temporal reasoning and a time-distributed FFNN to capture pseudo-temporal patterns, yielding three models: CNN-TD(FFNN), LSTM-TD(FFNN), and CNN-LSTM-TD(FFNN). Across four real-world datasets, the CNN-LSTM-TD(FFNN) model achieves the best performance, with average improvements of about 13.5% over state-of-the-art baselines and accuracies reaching around 94% for conventional NTC and 99% for CoS NTC, while still generalizing across diverse data sources. The approach introduces a universal, robust feature-learning paradigm that is less sensitive to initial hyperparameters and initial feature choices, with practical implications for QoS/RRM in IoT networks. Future work includes live IoT deployments, lightweight TD implementations, and extending the methodology to other time-series domains.

Abstract

Deep learning-based network traffic classification (NTC) techniques, including conventional and class-of-service (CoS) classifiers, are popular tools that aid quality of service (QoS) and radio resource management in Internet of Things (IoT) networks. Holistic temporal features consist of inter-, intra-, and pseudo-temporal features within packets, between packets, and among flows, providing the maximum information on network services without depending on the classes defined in a problem. The conventional spatio-temporal features in current solutions capture only space and time information between packets and flows, ignoring the information within packets and flows for IoT traffic. Therefore, we propose a new, efficient, holistic feature extraction method for deep-learning-based NTC that uses time-distributed feature learning to maximize NTC accuracy. We apply a time-distributed wrapper on deep-learning layers to help extract pseudo-temporal and spatio-temporal features. Pseudo-temporal features are mathematically complex to explain because, in deep learning, a black box extracts them. However, the features are temporal because of the time-distributed wrapper; therefore, we call them pseudo-temporal features. Since our method is efficient in learning holistic-temporal features, it extends to both conventional and CoS NTC. Our solution proves that pseudo-temporal and spatio-temporal features can significantly improve the robustness and performance of any NTC. We analyze the solution theoretically and experimentally on different real-world datasets. The experimental results show that the holistic-temporal time-distributed feature learning method is, on average, 13.5% more accurate than state-of-the-art conventional and CoS classifiers.
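
As a rough illustration of what a time-distributed wrapper does, the sketch below applies one feed-forward layer, with a single shared set of weights, independently to every time step of a sequence. This mirrors the general behaviour of wrappers such as Keras `TimeDistributed`. The function names (`dense_relu`, `time_distributed`), the toy weights, and the ReLU activation are assumptions made for illustration and are not taken from the paper.

```python
from typing import Callable, List

Vector = List[float]


def dense_relu(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """One feed-forward layer, relu(W x + b); `weights` has one row per output unit."""
    out = []
    for row, b in zip(weights, bias):
        pre_activation = sum(w * xi for w, xi in zip(row, x)) + b
        out.append(max(0.0, pre_activation))
    return out


def time_distributed(layer: Callable[[Vector], Vector],
                     sequence: List[Vector]) -> List[Vector]:
    """Apply the same layer (same weights) to each time step independently."""
    return [layer(step) for step in sequence]


# Hypothetical toy values: 3 time steps, 2 input features, 2 output units.
W = [[1.0, -1.0], [0.5, 0.5]]
b = [0.0, 1.0]
sequence = [[1.0, 2.0], [3.0, 1.0], [0.0, 0.0]]

# One feature vector per time step, so the time axis is preserved.
features = time_distributed(lambda x: dense_relu(x, W, b), sequence)
```

Because the weights are shared across time steps, the output keeps one feature vector per step, which is why features produced this way still carry a temporal ordering.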

Paper Structure (29 sections, 19 equations, 14 figures, 12 tables, 1 algorithm)

Figures (14)

  • Figure 1: (a) The original raw data in matrix form with $M$ samples and $N$ features. (b) An $N \times 1$ column vector for one sample. (c) The sample transformed into an $R \times C$ matrix, where $(R, C)$ is a factor pair of $N$. (d) The final greyscale video stream representation produced by Algorithm 1, where $\bar{\bm{X}}$ is an $M \times R \times C$ tensor.
  • Figure 2: Holistic-temporal feature extraction using time-distributed feature learning with deep learning. We use CNN, LSTM, or CNN-LSTM as the deep learning architecture. Holistic-temporal features extracted by time-distributed deep learning are fed to the decision layer. The CNN-TD(FFNN) is expected to extract only intra-temporal and pseudo-temporal features. The LSTM-TD(FFNN) extracts inter- and pseudo-temporal features. The CNN-LSTM-TD(FFNN) extracts the holistic, i.e., intra-, inter-, and pseudo-temporal features.
  • Figure 3: Holistic-temporal feature extraction from time-distributed deep learning as input to the decision layer for conventional or CoS classification. The decision layer is implemented with an FFNN and softmax activation; therefore, the output is a probability distribution, and the final decision is made by Eq. \ref{Equ:dl_E_1}. Time-distributed deep learning can be any of the three model configurations. The time-distributed deep learning layer consists of a deep learning layer and a time-distributed layer from Fig. \ref{fig:MD_F_1}. The exact number of units for the decision layer FFNN for all datasets is given in Table \ref{table:dataset}.
  • Figure 4: Implementation details of Model 1. 128 units of 2D CNN are used with a $3 \times 3$ kernel window. A $2 \times 2$ max-pooling layer is introduced between the CNN2D and batch normalization layers. The output of batch normalization consists of intra- or spatio-temporal features and is fed to 128 units of time-distributed FFNN. The extracted features consist of intra- and pseudo-temporal features.
  • Figure 5: Implementation details of Model 2. 128 units of LSTM are employed in the first layer with ReLU activation. The initial layer of LSTM extracts the inter-temporal features, which are fed to the time-distributed FFNN of 128 units. The flattened final output consists of inter- and pseudo-temporal features.
  • ...and 9 more figures
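
The data transformation described in the Figure 1 caption can be sketched as follows: each sample of $N$ features is reshaped into an $R \times C$ frame, where $(R, C)$ is a factor pair of $N$, and the $M$ frames are stacked into an $M \times R \times C$ structure. This is a minimal sketch and not the paper's Algorithm 1, whose details are not given here. The choice of the most square factor pair, the row-major fill order, and the names `most_square_factor_pair` and `to_frames` are all assumptions.

```python
import math
from typing import List, Tuple


def most_square_factor_pair(n: int) -> Tuple[int, int]:
    """Return a factor pair (R, C) of n with R <= C and R as large as possible.

    Assumption: the caption only requires *a* factor pair; picking the most
    square one is a choice made for this sketch.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    r = math.isqrt(n)
    while n % r != 0:
        r -= 1
    return r, n // r


def to_frames(samples: List[List[float]], rows: int, cols: int) -> List[List[List[float]]]:
    """Reshape M samples of N = rows * cols features into M frames of rows x cols.

    Assumption: features are laid out in row-major order.
    """
    frames = []
    for sample in samples:
        if len(sample) != rows * cols:
            raise ValueError("sample length must equal rows * cols")
        frames.append([sample[r * cols:(r + 1) * cols] for r in range(rows)])
    return frames


# Hypothetical toy data: M = 2 samples, N = 6 features each.
raw = [[0, 1, 2, 3, 4, 5], [10, 11, 12, 13, 14, 15]]
R, C = most_square_factor_pair(len(raw[0]))
video = to_frames(raw, R, C)  # nested list of shape M x R x C
```

When $N$ is prime the only factor pair is $(1, N)$, so each frame degenerates to a single row; how the paper handles that case is not stated in the caption.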