Adaptive Gradient Clipping for Robust Federated Learning
Youssef Allouah, Rachid Guerraoui, Nirupam Gupta, Ahmed Jellouli, Geovani Rizk, John Stephan
TL;DR
The paper addresses robustness to Byzantine workers in distributed and federated learning, showing that static gradient clipping behaves inconsistently across heterogeneity levels and attacks. It introduces Adaptive Robust Clipping (ARC), which sets the clipping threshold dynamically from the input gradients while preserving the theoretical guarantees of Robust-DGD. The authors prove that ARC preserves $(f,\kappa)$-robustness up to an additive term and that, when the model is well-initialized, ARC can improve asymptotic convergence. Experiments on MNIST, Fashion-MNIST, and CIFAR-10 show significant robustness gains in highly heterogeneous and adversarial settings. The work highlights a gap between worst-case theory and practical performance and positions ARC as a reliable, tuning-free tool for robust distributed learning.
Abstract
Robust federated learning aims to maintain reliable performance despite the presence of adversarial or misbehaving workers. While state-of-the-art (SOTA) robust distributed gradient descent (Robust-DGD) methods were proven theoretically optimal, their empirical success has often relied on pre-aggregation gradient clipping. However, existing static clipping strategies yield inconsistent results: enhancing robustness against some attacks while being ineffective or even detrimental against others. To address this limitation, we propose a principled adaptive clipping strategy, Adaptive Robust Clipping (ARC), which dynamically adjusts clipping thresholds based on the input gradients. We prove that ARC not only preserves the theoretical robustness guarantees of SOTA Robust-DGD methods but also provably improves asymptotic convergence when the model is well-initialized. Extensive experiments on benchmark image classification tasks confirm these theoretical insights, demonstrating that ARC significantly enhances robustness, particularly in highly heterogeneous and adversarial settings.
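To make the idea of input-dependent clipping concrete, the following minimal Python sketch derives the threshold from the gradients themselves rather than from a fixed constant. The abstract does not state ARC's exact thresholding rule, so the rule used here is an assumption for illustration: the threshold is the norm of the gradient ranked just below the `num_clipped` largest ones. The names `adaptive_clip`, `num_clipped`, `clip`, and `l2_norm` are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def clip(v: Sequence[float], threshold: float) -> Vector:
    """Rescale v so that its norm is at most threshold."""
    norm = l2_norm(v)
    if norm <= threshold or norm == 0.0:
        return list(v)
    scale = threshold / norm
    return [x * scale for x in v]


def adaptive_clip(gradients: Sequence[Sequence[float]], num_clipped: int) -> List[Vector]:
    """Clip the num_clipped largest-norm gradients.

    Assumption (not taken from the abstract): the threshold is the norm of
    the (num_clipped + 1)-th largest gradient, so it adapts to the inputs.
    """
    if not 0 <= num_clipped < len(gradients):
        raise ValueError("num_clipped must satisfy 0 <= num_clipped < len(gradients)")
    norms = sorted((l2_norm(g) for g in gradients), reverse=True)
    threshold = norms[num_clipped]
    return [clip(g, threshold) for g in gradients]


# Two moderate gradients and one very large (possibly Byzantine) gradient.
worker_gradients = [[1.0, 0.0], [0.0, 2.0], [30.0, 40.0]]
clipped = adaptive_clip(worker_gradients, num_clipped=1)
```

In this sketch the outlier is rescaled to the norm of the second-largest gradient, while the other two vectors pass through unchanged. The clipped vectors would then be passed to a robust aggregation rule, as in the pre-aggregation clipping setup the abstract describes.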
