Reducing Communication Overhead in Federated Learning for Network Anomaly Detection with Adaptive Client Selection

William Marfo, Deepak Tosh, Shirley Moore, Joshua Suetterlein, Joseph Manzano

TL;DR

This work tackles the high communication overhead of federated learning in network anomaly detection with an adaptive framework that integrates dynamic batch sizing, asynchronous updates, and gradient-alignment-based client selection. Local training uses mixed precision and distributed data parallel (DDP) execution, while the server side uses asynchronous aggregation and a Weibull-based checkpointing strategy. Together these reduce overhead while preserving detection accuracy on both general network traffic (UNSW-NB15) and automotive CAN data (ROAD). Key findings include a 97.6% reduction in communication overhead (700.0s to 16.8s) with accuracy maintained at about 95%, a Mann-Whitney U test confirming that the improvement is statistically significant (p < 0.05), and profiling evidence of fewer GPU operations and memory transfers. The proposed method is robust to client heterogeneity and dropout, scales to large client counts, and is practical for real-world edge deployments across diverse network security contexts.

Abstract

Communication overhead in federated learning (FL) poses a significant challenge for network anomaly detection systems, where diverse client configurations and network conditions affect both efficiency and detection accuracy. Existing approaches optimize individual components in isolation and struggle to balance reduced overhead with detection performance. This paper presents an adaptive FL framework combining batch size optimization, client selection, and asynchronous updates for efficient anomaly detection. Using UNSW-NB15 for general network traffic and ROAD for automotive networks, our framework reduces communication overhead by 97.6% (700.0s to 16.8s) while maintaining comparable accuracy (95.10% vs. 95.12%). The Mann-Whitney U test confirms significant improvements (p < 0.05). Profiling analysis reveals efficiency gains via reduced GPU operations and memory transfers, ensuring robust detection across varying client conditions.
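
The headline figure can be checked directly from the two timings quoted in the abstract. The short sketch below recomputes the relative reduction and the equivalent speedup factor; the helper name `percent_reduction` is illustrative and does not come from the paper.

```python
def percent_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` versus `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline


# Timings quoted in the abstract (seconds).
baseline_s = 700.0
adaptive_s = 16.8

reduction = percent_reduction(baseline_s, adaptive_s)
speedup = baseline_s / adaptive_s

print(f"{reduction:.1f}% reduction, {speedup:.1f}x faster")
```

A 97.6% reduction corresponds to roughly a 41.7x shorter communication time, which is the same result expressed as a ratio.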

Paper Structure

This paper contains 21 sections, 4 figures, 7 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Framework architecture integrating efficient client training, asynchronous communication, and selective updates.
  • Figure 2: Impact of synchronous (left) versus asynchronous (right) communication in FL. In the synchronous approach, the red “X” indicates the synchronization barrier, where the server waits for all clients to finish training before aggregation. Asynchronous updates allow continuous model improvement through independent client contributions without such a barrier.
  • Figure 3: Communication patterns: (left) update frequency per round; (right) time scaling with increasing clients.
  • Figure 4: Fault tolerance performance evaluation across increasing dropout rates. Results averaged over 100 experimental runs.
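
The synchronization barrier described in the Figure 2 caption can be illustrated with a toy timing model. This is a minimal sketch, not the paper's implementation: the client training times are hypothetical, and the two helper functions are named only for illustration. In the synchronous case the server can aggregate only once the slowest client has finished; in the asynchronous case each update is usable as soon as it arrives.

```python
from typing import List, Sequence


def sync_round_time(client_times: Sequence[float]) -> float:
    """Synchronous FL: the barrier releases when the slowest client finishes."""
    return max(client_times)


def async_arrival_order(client_times: Sequence[float]) -> List[int]:
    """Asynchronous FL: client indices in the order their updates reach the server."""
    return sorted(range(len(client_times)), key=lambda i: client_times[i])


# Hypothetical per-client local training times in seconds (one straggler).
times = [1.2, 0.8, 5.0, 1.0]

barrier_time = sync_round_time(times)
order = async_arrival_order(times)
first_update_time = times[order[0]]

print(f"sync: first aggregation at {barrier_time:.1f}s")
print(f"async: first update applied at {first_update_time:.1f}s, order {order}")
```

With these assumed timings a single straggler delays the synchronous aggregation to 5.0s, whereas the asynchronous server can already apply the first update at 0.8s.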