Reducing Communication Overhead in Federated Learning for Network Anomaly Detection with Adaptive Client Selection
William Marfo, Deepak Tosh, Shirley Moore, Joshua Suetterlein, Joseph Manzano
TL;DR
This work tackles the high communication overhead of federated learning in network anomaly detection with an adaptive framework that integrates dynamic batch sizing, asynchronous updates, and gradient-alignment-based client selection. It combines efficient local training (mixed precision and distributed data parallel, DDP) with asynchronous aggregation and a Weibull-based checkpointing strategy, reducing overhead while preserving detection accuracy on both general network traffic (UNSW-NB15) and automotive CAN data (ROAD). Key findings include a 97.6% reduction in end-to-end communication time with accuracy maintained at around 95%, a statistically significant improvement (p < 0.05), and profiling evidence of fewer GPU operations and memory transfers. The method is robust to client heterogeneity and dropout, scales to large client counts, and is practical for real-world edge deployments in diverse network security contexts.
Abstract
Communication overhead in federated learning (FL) poses a significant challenge for network anomaly detection systems, where diverse client configurations and network conditions affect both efficiency and detection accuracy. Existing approaches optimize these factors in isolation and struggle to balance reduced overhead against detection performance. This paper presents an adaptive FL framework that combines batch size optimization, client selection, and asynchronous updates for efficient anomaly detection. Evaluated on UNSW-NB15 for general network traffic and ROAD for automotive networks, the framework reduces communication overhead by 97.6% (700.0s to 16.8s) while maintaining comparable accuracy (95.10% vs. 95.12%). A Mann-Whitney U test confirms that the improvements are significant (p < 0.05). Profiling analysis attributes the efficiency gains to fewer GPU operations and memory transfers, with robust detection preserved across varying client conditions.
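As a quick check of the headline figure, the short sketch below recomputes the relative reduction from the two timings quoted in the abstract (700.0s and 16.8s). The function name `relative_reduction` and the derived speedup factor are illustrative additions and do not come from the paper.

```python
def relative_reduction(baseline_s: float, optimized_s: float) -> float:
    """Return the fractional reduction of optimized_s relative to baseline_s."""
    if baseline_s <= 0:
        raise ValueError("baseline_s must be positive")
    return (baseline_s - optimized_s) / baseline_s


# Timings quoted in the abstract (seconds).
baseline_s = 700.0
optimized_s = 16.8

reduction = relative_reduction(baseline_s, optimized_s)
# Speedup factor is derived here for illustration; the paper reports only the percentage.
speedup = baseline_s / optimized_s

print(f"reduction: {reduction:.1%}")  # reduction: 97.6%
print(f"speedup: {speedup:.1f}x")     # speedup: 41.7x
```

The reported 97.6% therefore corresponds to roughly a 42-fold decrease in the measured communication time.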
