DoLLM: How Large Language Models Understanding Network Flow Data to Detect Carpet Bombing DDoS

Qingyang Li, Yihang Zhang, Zhidong Jia, Yannan Hu, Lei Zhang, Jianrong Zhang, Yongming Xu, Yong Cui, Zongming Guo, Xinggong Zhang

TL;DR

DoLLM tackles the challenge of detecting Carpet Bombing DDoS by turning non-language network flows into Flow Sequences and applying an open-source LLM backbone to learn inter-flow correlations for per-flow classification. The method introduces a Flow Sequentializer, Flow Tokenizer, Bidirectional Self-Attention LLMs, and Classification Projection to enable token-classification style detection in an LLM, with the backbone kept frozen during training. Evaluations on CIC-DDoS2019 and real ISP traces show DoLLM achieving state-of-the-art performance, including up to 33.3% zero-shot F1 improvement and at least 20.6% gain on real-world data, demonstrating strong generalization and practical utility. The work highlights the potential for ISP deployment, enabling more precise per-flow detection and finer-grained traffic diversion via scrubbing devices.
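To see why bidirectional self-attention matters for per-flow classification, here is a minimal sketch in plain Python. It is not the DoLLM implementation: the helper names `attention_mask` and `self_attention` are hypothetical, the attention uses Q = K = V with no learned weights, and the two-dimensional "flow embeddings" are made-up values. It only illustrates the general property that under a causal mask the first flow's representation ignores later flows, while under a bidirectional mask it depends on every flow in the sequence.

```python
import math

def attention_mask(n, bidirectional):
    """mask[i][j] is True when position i may attend to position j."""
    return [[bidirectional or j <= i for j in range(n)] for i in range(n)]

def self_attention(x, mask):
    """Single-head scaled dot-product self-attention with Q = K = V = x.

    Simplifying assumption: no learned projections, so this is a toy
    stand-in for one attention layer, not an LLM backbone.
    """
    d = len(x[0])
    out = []
    for i, q in enumerate(x):
        # Score every position; masked positions get -inf (zero weight).
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) if mask[i][j] else -math.inf
            for j, k in enumerate(x)
        ]
        top = max(scores)  # finite, because a position always attends to itself
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Softmax-weighted average of the value vectors.
        out.append([sum(w * v[c] for w, v in zip(weights, x)) / total for c in range(d)])
    return out

# Two toy Flow Sequences that differ only in the last flow (made-up embeddings).
flows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
changed = [[1.0, 0.0], [0.0, 1.0], [5.0, -3.0]]

causal = attention_mask(3, bidirectional=False)
full = attention_mask(3, bidirectional=True)

# Does the first flow's representation stay the same when a later flow changes?
first_causal_unchanged = self_attention(flows, causal)[0] == self_attention(changed, causal)[0]
first_full_unchanged = self_attention(flows, full)[0] == self_attention(changed, full)[0]
```

With the causal mask the first flow cannot see the modified last flow, so its output is unchanged; with the bidirectional mask it changes. This is the reason a token-classification setup, where every position receives a label, benefits from letting each flow attend to the whole sequence.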

Abstract

Can Large Language Models (LLMs) understand non-language network data, and if so, how can they help detect unknown malicious flows? This paper takes Carpet Bombing as a case study and shows how to exploit LLMs' capabilities in the networking domain. Carpet Bombing is a new type of DDoS attack that has increased dramatically in recent years and significantly threatens network infrastructure. It targets multiple victim IPs within subnets, congesting access links and disrupting network services for a vast number of users. Because these attacks are low-rate and multi-vector, they challenge traditional DDoS defenses. We propose DoLLM, a DDoS detection model that uses open-source LLMs as its backbone. By reorganizing non-contextual network flows into Flow-Sequences and projecting them into the LLM's semantic space as token embeddings, DoLLM leverages the LLM's contextual understanding to extract flow representations in the overall network context. These representations are used to improve DDoS detection performance. We evaluate DoLLM on the public CIC-DDoS2019 dataset and on real NetFlow traces from a top-3 countrywide ISP. The results show that DoLLM has strong detection capability: its F1 score increases by up to 33.3% in zero-shot scenarios and by at least 20.6% on real ISP traces.
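The step of reorganizing unordered flows into Flow-Sequences can be pictured with a small sketch. The abstract does not specify how sequences are formed, so everything below is an assumption for illustration: the `Flow` record and its field names, the helper `sequentialize`, ordering by start time, and cutting into fixed-length chunks. The addresses come from documentation-only IP ranges.

```python
from typing import List, NamedTuple

class Flow(NamedTuple):
    """Hypothetical flow record; the field names are illustrative only."""
    start_time: float  # flow start timestamp in seconds
    src_ip: str
    dst_ip: str
    packets: int

def sequentialize(flows: List[Flow], seq_len: int) -> List[List[Flow]]:
    """Order flows by start time and cut them into sequences of seq_len.

    Assumed policy: temporal ordering plus fixed-length chunks. The last
    sequence may be shorter when the flow count is not a multiple of seq_len.
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    ordered = sorted(flows, key=lambda f: f.start_time)
    return [ordered[i:i + seq_len] for i in range(0, len(ordered), seq_len)]

# Temporally unordered raw flows (made-up values).
raw = [
    Flow(3.0, "198.51.100.3", "192.0.2.7", 4),
    Flow(1.0, "198.51.100.1", "192.0.2.5", 2),
    Flow(2.0, "198.51.100.2", "192.0.2.6", 3),
    Flow(5.0, "198.51.100.5", "192.0.2.9", 1),
    Flow(4.0, "198.51.100.4", "192.0.2.8", 6),
]

sequences = sequentialize(raw, seq_len=2)
start_times = [[f.start_time for f in seq] for seq in sequences]
```

Each resulting sequence plays the role of a "sentence" whose "tokens" are flows, so a per-flow label can be predicted for every position in the sequence.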

Paper Structure

This paper contains 22 sections, 3 equations, 14 figures, and 2 tables.

Figures (14)

  • Figure 1: Diagram of how DoLLM detects Carpet Bombing through representation learning. It aligns network flow data with the LLM semantic space to better exploit flow correlations.
  • Figure 2: Carpet Bombing is characterized by low-rate single-flow traffic, multiple attack vectors, and a many-to-many attack pattern.
  • Figure 3: Correlations among malicious flows are stronger than those among benign flows.
  • Figure 4: The overall architecture of DoLLM. It consists of four parts: Flow Sequentializer, Flow Tokenizer, Bidirectional Self-Attention LLMs, and Classification Projection.
  • Figure 5: Illustration of the workflow of the Flow Sequentializer, which generates contextual Flow Sequences from temporally unordered raw flows.
  • ...and 9 more figures