Delayed Momentum Aggregation: Communication-efficient Byzantine-robust Federated Learning with Partial Participation
Kaoru Otsuka, Yuki Takezawa, Makoto Yamada
TL;DR
The paper addresses Byzantine-robust federated learning under partial participation by introducing delayed momentum aggregation, instantiated in the optimizer DeMoA. The core idea is for the server to aggregate both fresh momentum from sampled clients and cached momentum from non-sampled clients, so that Byzantine clients remain a minority among the aggregated momenta in every round. The authors provide a convergence guarantee under standard assumptions, show that DeMoA remains robust under partial participation, and report higher accuracy than existing methods against multiple Byzantine attacks on image datasets. The results highlight DeMoA's practical relevance for scalable, robust FL in real-world networks with intermittent client participation. Under overparameterization, the method achieves an even stronger convergence guarantee, mitigating the non-vanishing error terms caused by Byzantine behavior and data heterogeneity.
Abstract
Partial participation is essential for communication-efficient federated learning at scale, yet existing Byzantine-robust methods typically assume full client participation. In the partial participation setting, a majority of the sampled clients may be Byzantine, and once Byzantine clients dominate the sampled set, existing methods break down immediately. We introduce delayed momentum aggregation, a principle in which the central server aggregates cached momentum from non-sampled clients along with fresh momentum from sampled clients. This principle ensures that Byzantine clients remain a minority from the server's perspective even when they dominate the sampled set. We instantiate this principle in our optimizer DeMoA. We analyze the convergence rate of DeMoA and show that it is Byzantine-robust under partial participation. Experiments show that, with a 20% Byzantine ratio and a participation rate of only 10%, DeMoA achieves the best accuracy even when existing methods fail empirically.
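
The delayed momentum aggregation principle described in the abstract can be illustrated with a minimal sketch. It is not the authors' DeMoA implementation. The function names (`coordinate_median`, `delayed_momentum_round`), the choice of coordinate-wise median as the robust aggregator, the cache initialization, and the toy values are all assumptions made for illustration. The sketch shows one server round in which both sampled clients are Byzantine: aggregating only the fresh momenta is dominated by the attackers, while aggregating over the full cache is not.

```python
import numpy as np


def coordinate_median(vectors):
    """Coordinate-wise median; a stand-in robust aggregator (assumption)."""
    return np.median(vectors, axis=0)


def delayed_momentum_round(cache, sampled_ids, fresh_momenta,
                           aggregator=coordinate_median):
    """One hypothetical server round of delayed momentum aggregation.

    cache         : (n_clients, dim) array holding the last momentum
                    received from each client (updated in place).
    sampled_ids   : indices of the clients sampled this round.
    fresh_momenta : (len(sampled_ids), dim) array of newly received momenta.
    """
    # Overwrite the cached entries of the sampled clients with fresh momentum.
    for cid, momentum in zip(sampled_ids, fresh_momenta):
        cache[cid] = momentum
    # Aggregate over ALL clients: fresh entries plus stale cached entries.
    return aggregator(cache)


# Toy setting: 10 clients, clients 0 and 1 are Byzantine (20%).
n_clients, dim = 10, 3
cache = np.ones((n_clients, dim))      # assume honest cached momentum is all ones
sampled_ids = [0, 1]                   # both sampled clients are Byzantine
fresh = np.full((2, dim), 100.0)       # the Byzantine clients send large vectors

fresh_only = coordinate_median(fresh)  # aggregate over the sampled set only
delayed = delayed_momentum_round(cache, sampled_ids, fresh)

print("sampled-only aggregate:", fresh_only)
print("delayed aggregate:     ", delayed)
```

In this toy round the sampled set is entirely Byzantine, yet the server still aggregates eight honest cached vectors against two Byzantine ones, which is the minority condition the principle is meant to preserve. Any other robust aggregator could be passed through the `aggregator` argument.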
