HFL-FlowLLM: Large Language Models for Network Traffic Flow Classification in Heterogeneous Federated Learning

Jiazhuo Tian, Yachao Yuan

TL;DR

The paper tackles traffic flow classification in privacy-preserving heterogeneous federated learning (HFL) settings common in 5G/IoT networks. It introduces HFL-FlowLLM, a framework that repurposes large language models in three steps: it replaces the autoregressive head with a network head, compresses the model via layer dropping, and applies LoRA fine-tuning only to the layers near the output. On the federated side, it uses a noise-free, stacking-based aggregation and adaptive client training. The approach achieves around 13% higher average F1 than non-LLM HFL methods. Against other LLM-FL frameworks, it improves average F1 by up to 5% as client participation grows while reducing training costs by about 87%. Results across five public datasets show strong accuracy, generalization, and robustness under non-IID conditions, highlighting the practical value of integrating LLMs into distributed network security workflows.
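The stacking-based aggregation mentioned above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the summary does not give the exact scheme, so the sketch assumes the common formulation in which each client k holds LoRA factors A_k (shape r_k x d_in) and B_k (shape d_out x r_k), and the server concatenates them so that the product of the stacked factors equals the weighted sum of the per-client updates. The function name `stack_lora`, the dimensions, ranks, and weights are all hypothetical. "Noise-free" is read here as "no cross terms from averaging A and B separately", which is an interpretation rather than a statement from the paper.

```python
import numpy as np

def stack_lora(client_As, client_Bs, weights):
    """Stack per-client LoRA factors (hypothetical helper).

    The returned pair satisfies
        B_stack @ A_stack == sum_k weights[k] * (B_k @ A_k),
    and clients may use different ranks r_k.
    """
    # A_k has shape (r_k, d_in): scale by the client weight, stack along rows.
    A_stack = np.concatenate(
        [w * A for w, A in zip(weights, client_As)], axis=0
    )
    # B_k has shape (d_out, r_k): stack along columns.
    B_stack = np.concatenate(client_Bs, axis=1)
    return A_stack, B_stack

# Illustrative setup: three clients with heterogeneous LoRA ranks.
rng = np.random.default_rng(0)
d_in, d_out = 8, 6
ranks = [2, 4, 3]
weights = [0.5, 0.3, 0.2]  # assumed aggregation weights, e.g. by data share

client_As = [rng.normal(size=(r, d_in)) for r in ranks]
client_Bs = [rng.normal(size=(d_out, r)) for r in ranks]

A_stack, B_stack = stack_lora(client_As, client_Bs, weights)

# Reference: the exact weighted sum of each client's low-rank update.
exact_update = sum(
    w * (B @ A) for w, A, B in zip(weights, client_As, client_Bs)
)
stacked_update = B_stack @ A_stack
```

Under these assumptions the stacked factors have a combined rank dimension equal to the sum of the client ranks, so the approach trades a larger aggregated adapter for an exact reconstruction of the weighted update.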

Abstract

In modern communication networks driven by 5G and the Internet of Things (IoT), effective network traffic flow classification is crucial for Quality of Service (QoS) management and security. Traditional centralized machine learning struggles with the distributed data and privacy concerns in these heterogeneous environments, while existing federated learning approaches suffer from high costs and poor generalization. To address these challenges, we propose HFL-FlowLLM, which to our knowledge is the first framework to apply large language models to network traffic flow classification in heterogeneous federated learning. Compared to state-of-the-art heterogeneous federated learning methods for network traffic flow classification, the proposed approach improves the average F1 score by approximately 13%, demonstrating compelling performance and strong robustness. Compared to existing large language model federated learning frameworks, as the number of clients participating in each training round increases, the proposed method achieves up to a 5% improvement in average F1 score while reducing training costs by about 87%. These findings demonstrate the potential and practical value of HFL-FlowLLM for security in modern communication networks.

Paper Structure

This paper contains 13 sections, 12 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: The workflow of HFL-FlowLLM during the HFL training.
  • Figure 2: Comparison between the original LLM head and the network head, using malware traffic detection as an example.
  • Figure 3: Performance comparison between the LLM head and the network head on malware traffic detection.
  • Figure 4: Comparison of training parameter counts across frameworks, using malware traffic detection as an example.
  • Figure 5: PR of HFL-FlowLLM under different Dirichlet concentrations on ISCX BOTNET 2014, CSTNET 2023, and ISCX VPN 2016.
  • ...and 1 more figure