
A Flow is a Stream of Packets: A Stream-Structured Data Approach for DDoS Detection

Raja Giryes, Lior Shafir, Avishai Wool

TL;DR

This work tackles the challenge of timely and accurate DDoS detection by rethinking flow representation. It treats flows as variable-length streams of packet headers and classifies them with a Set-Tree model that supports permutation-invariant, set-based splits and an attention mechanism, enabling effective early detection from a handful of packets. The approach achieves near-perfect accuracy on CICDDoS2019 and strong performance on CICIDS2017, with substantial time savings when using only the first 2–4 packets, and it uses only 4–6% of traffic data. The method offers practical benefits in speed, interpretability, and payload-free detection, making it suitable for real-time network defense.

Abstract

Distributed Denial of Service (DDoS) attacks are becoming increasingly harmful to the Internet and show no signs of slowing down. Developing an accurate detection mechanism to thwart DDoS attacks remains a major challenge due to the rich variety of these attacks and the emergence of new attack vectors. In this paper, we propose a new tree-based DDoS detection approach that operates on a flow as a stream structure, rather than the traditional fixed-size record structure containing aggregated flow statistics. Although aggregated flow records have gained popularity over the past decade, providing an effective means for flow-based intrusion detection by inspecting only a fraction of the total traffic volume, they are inherently constrained. Their detection precision is limited not only by the lack of packet payloads, but also by their structure, which is unable to model fine-grained inter-packet relations, such as packet order and temporal relations. Additionally, aggregated flow statistics cannot be computed until the complete flow has ended. Here we show that treating flow inputs as variable-length streams composed of their associated packet headers allows for very accurate and fast detection of malicious flows. We evaluate our proposed strategy on the CICDDoS2019 and CICIDS2017 datasets, which contain a comprehensive variety of DDoS attacks. Our approach matches or exceeds the accuracy of existing machine learning techniques, including state-of-the-art deep learning methods. Furthermore, our method achieves significantly earlier detection, e.g., with CICDDoS2019 detection based on the first 2 packets, which corresponds to an average time-saving of 99.79% and uses only 4–6% of the traffic volume.
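
The stream representation and the notion of time saving described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `PacketHeader` fields and the helper names `prefix`, `duration`, and `time_saving` are hypothetical, and the per-flow saving is assumed here to be one minus the ratio of the prefix duration to the full flow duration. How the paper averages this quantity over flows is not stated in this summary.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PacketHeader:
    """Hypothetical header record; the field choice is an assumption."""
    timestamp: float  # arrival time in seconds
    length: int       # packet length in bytes
    tcp_flags: int = 0


# A flow is a variable-length stream of packet headers in arrival order.
Flow = List[PacketHeader]


def prefix(flow: Flow, k: int) -> Flow:
    """Return the first k packets (or the whole flow if it is shorter)."""
    return flow[:k]


def duration(flow: Flow) -> float:
    """Time between the first and last packet; 0.0 for fewer than 2 packets."""
    if len(flow) < 2:
        return 0.0
    return flow[-1].timestamp - flow[0].timestamp


def time_saving(flow: Flow, k: int) -> float:
    """Assumed definition: fraction of the flow duration that is not
    waited for when classifying after the first k packets."""
    full = duration(flow)
    if full == 0.0:
        return 0.0
    return 1.0 - duration(prefix(flow, k)) / full


if __name__ == "__main__":
    flow = [
        PacketHeader(timestamp=0.0, length=60),
        PacketHeader(timestamp=0.5, length=60),
        PacketHeader(timestamp=10.0, length=1500),
    ]
    print(f"time saving with k=2: {time_saving(flow, 2):.2%}")
```

An aggregated flow record, by contrast, would only be available after the last packet at `timestamp=10.0`, which is the waiting cost the stream representation avoids.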

Paper Structure (26 sections, 8 equations, 8 figures, 15 tables)


Figures (8)

  • Figure 1: An overview of our approach.
  • Figure 2: CICDDoS2019 flow duration distribution - a comparison between first 2-packets and complete flows.
  • Figure 3: CICIDS2017 flow duration distribution - a comparison between initial-packet streams and complete flows.
  • Figure 4: Using Set-Tree [pmlr-v139-hirsch21a] for network flow detection. Each decision node contains a set-compatible split criterion and returns a subset of its items (the attention set of packets). The dashed arrows show the attention mechanism: each decision node operates either on the original input set $F$ or on one of the attention sets returned by previous nodes along the decision path.
  • Figure 5: Performance comparison against state-of-the-art methods [CIC2_ortet2021towards, CIC3_elsayed2020ddosnet, CIC5_novaes2020long, CIC8_almiani2021ddos] using the CICDDoS2019 dataset. ST-full and ST-2 refer to our proposed model evaluated on complete flow-stream inputs and on inputs containing only the first 2 packets, respectively.
  • ...and 3 more figures
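
The decision node described in the Figure 4 caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the Set-Tree library's API: `set_split` is a hypothetical helper, packets are plain dictionaries, the split compares a permutation-invariant aggregate of one header field against a threshold, and the attention set is assumed to be the packets whose own value exceeds that threshold.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Packet = Dict[str, float]  # hypothetical: header field name -> value


def set_split(
    items: List[Packet],
    feature: str,
    aggregate: Callable[[Sequence[float]], float],
    threshold: float,
) -> Tuple[bool, List[Packet]]:
    """One set-compatible split (a sketch, not the library's method).

    Returns the branch decision and an attention set. The aggregate
    (e.g. max, min, sum) ignores packet order, so the decision is
    permutation-invariant.
    """
    if not items:
        return False, []
    values = [p[feature] for p in items]
    goes_right = aggregate(values) > threshold
    # Assumption: the attention set holds the items above the threshold.
    attention = [p for p in items if p[feature] > threshold]
    return goes_right, attention


if __name__ == "__main__":
    F = [{"length": 60.0}, {"length": 1500.0}, {"length": 40.0}]
    # Root node operates on the original input set F.
    decision, attention = set_split(F, "length", max, 1000.0)
    # A later node may operate on F or on an earlier attention set.
    next_decision, _ = set_split(attention, "length", min, 100.0)
    print(decision, attention, next_decision)
```

Passing the attention set of one node as the input of a later node corresponds to the dashed arrows in the figure: later splits can focus on the packets that drove an earlier decision.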