Binary Anomaly Detection in Streaming IoT Traffic under Concept Drift

Rodrigo Matos Carnier, Laura Lahesoo, Kensuke Fukuda

TL;DR

This work tackles binary anomaly detection in streaming IoT traffic under concept drift by comparing batch and streaming learning approaches using SIURU with an ADWIN drift detector across six heterogeneous simulations derived from three public IoT datasets. It shows that batch models falter under drift, while streaming tree-based methods, most notably Adaptive Random Forest, reach an F1-score of about 0.99 at lower computational cost, which makes online drift adaptation practical. The study also stresses that dataset heterogeneity matters for realistic evaluation, because homogeneous data can mask drift-related weaknesses. In practice, the findings favor tree-based streaming detectors for online IoT anomaly detection, especially when rapid adaptation and efficiency are required.

Abstract

With the growing volume of Internet of Things (IoT) network traffic, machine learning (ML)-based anomaly detection is more relevant than ever. Traditional batch learning models face challenges such as high maintenance and poor adaptability to rapid changes in anomalies, a phenomenon known as concept drift. In contrast, streaming learning integrates online and incremental learning, enabling seamless updates and concept drift detection to improve robustness. This study investigates anomaly detection in streaming IoT traffic as binary classification, comparing batch and streaming learning approaches while assessing the limitations of current IoT traffic datasets. We simulated heterogeneous network data streams by carefully mixing existing datasets and streaming the samples one by one. Our results highlight the failure of batch models to handle concept drift, but also reveal that current datasets remain limited in their ability to expose model weaknesses because of their low traffic heterogeneity. We also investigated the competitiveness of tree-based ML algorithms, which are well known in batch anomaly detection, and compared them to non-tree-based ones, confirming the advantages of the former. Adaptive Random Forest achieved an F1-score of 0.990 $\pm$ 0.006 at one-third the computational cost of its batch counterpart. Hoeffding Adaptive Tree reached an F1-score of 0.910 $\pm$ 0.007 while reducing computational cost by a factor of four, making it a viable choice for online applications despite a slight trade-off in stability.
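
The contrast the abstract draws between a batch model and a streaming model under concept drift can be illustrated with a minimal sketch. This is not the authors' SIURU pipeline, their datasets, or their models: the synthetic one-dimensional stream, the toy `RunningThreshold` learner, the helper names, and every parameter value are hypothetical, chosen only to show a test-then-train loop in which a model frozen before the drift is compared with one that keeps learning sample by sample. Accuracy is used here for brevity instead of the F1-score reported in the paper.

```python
import random

def make_stream(n_per_concept=2000, seed=0):
    """Hypothetical stream of (feature, label) pairs. Anomalies (label 1) sit
    above normal traffic, and both classes shift upward halfway through,
    which stands in for an abrupt concept drift."""
    rng = random.Random(seed)
    stream = []
    for shift in (0.0, 5.0):                      # two successive concepts
        for _ in range(n_per_concept):
            label = 1 if rng.random() < 0.3 else 0
            x = rng.gauss(shift + 3.0 * label, 0.5)
            stream.append((x, label))
    return stream

class RunningThreshold:
    """Toy online classifier (an assumption for illustration): the decision
    threshold is the midpoint of exponentially weighted class means."""
    def __init__(self, alpha=0.05):
        self.alpha = alpha
        self.means = {0: None, 1: None}

    def predict(self, x):
        m0, m1 = self.means[0], self.means[1]
        if m0 is None or m1 is None:
            return 0                              # no evidence yet: predict normal
        return 1 if x > (m0 + m1) / 2 else 0

    def learn(self, x, y):
        m = self.means[y]
        self.means[y] = x if m is None else (1 - self.alpha) * m + self.alpha * x

def post_drift_accuracy(stream, drift_at, freeze_at=None):
    """Test-then-train: predict each sample first, then learn from it.
    If freeze_at is set, learning stops at that index (batch-like model)."""
    model = RunningThreshold()
    correct = 0
    for i, (x, y) in enumerate(stream):
        pred = model.predict(x)
        if i >= drift_at:                         # score only after the drift
            correct += int(pred == y)
        if freeze_at is None or i < freeze_at:
            model.learn(x, y)
    return correct / (len(stream) - drift_at)

stream = make_stream()
drift_at = len(stream) // 2
frozen = post_drift_accuracy(stream, drift_at, freeze_at=drift_at)
online = post_drift_accuracy(stream, drift_at)
print(f"frozen: {frozen:.2f}  online: {online:.2f}")
```

In this toy setting the frozen model keeps its pre-drift threshold and misclassifies the shifted normal traffic, while the incrementally updated model recovers after a short adaptation period. A real detector such as ADWIN would additionally signal when the drift occurs; that step is omitted here.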

Paper Structure

This paper contains 13 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Overview of simulation setup and anomaly detector.
  • Figure 2: Comparison between batch and streaming ML. Top: single dataset. Bottom: mixed dataset. X-axis: samples. Y-axis: cumulative F1-score.
  • Figure 3: Comparison between tree-based and non-tree-based streaming ML methods. Top: best runs. Bottom: worst runs. X-axis: samples. Y-axis: cumulative F1-score. See the simulations and datasets tables (tab:simulations, tab:datasets) for the content of the simulations.
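
Figures 2 and 3 plot cumulative F1-score against the number of samples seen. One way such a curve can be computed from running confusion counts is sketched below. The function name `cumulative_f1` and the toy labels are illustrative assumptions, and treating the anomaly class as label 1 is a convention chosen here, not one confirmed by the source.

```python
def cumulative_f1(y_true, y_pred, positive=1):
    """Return, for each position in the stream, the F1-score computed over
    all samples seen up to and including that position."""
    tp = fp = fn = 0
    curve = []
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1                      # true positive
        elif p == positive:
            fp += 1                      # false positive
        elif t == positive:
            fn += 1                      # false negative
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 while no positives exist
        curve.append(2 * tp / denom if denom else 0.0)
    return curve

# Toy labels, purely illustrative
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 1, 1]
curve = cumulative_f1(y_true, y_pred)
print([round(v, 3) for v in curve])
```

Because the counts are never reset, early mistakes keep weighing on the curve, so a cumulative F1-score recovers slowly after a drift even when the model itself adapts quickly.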