Nesterov-Accelerated Robust Federated Learning Over Byzantine Adversaries

Lihan Xu, Yanjie Dong, Gang Wang, Runhao Zeng, Xiaoyi Fan, Xiping Hu

TL;DR

The paper addresses robust federated learning in the presence of Byzantine adversaries by introducing Byrd-NAFL, which blends Nesterov momentum with Byzantine-resilient aggregation to accelerate and safeguard convergence. It provides a finite-time convergence guarantee for smooth non-convex losses under a soft Byzantine resilience assumption and analyzes how adversarial perturbations, momentum, and stochastic noise affect learning. The authors demonstrate that Byrd-NAFL outperforms baselines on COVTYPE and MNIST across multiple attack types, achieving faster convergence and higher accuracy while maintaining resilience. This approach offers a practical pathway to reliable, communication-efficient FL in adversarial environments, with the ability to leverage momentum without sacrificing robustness.

Abstract

We investigate robust federated learning, where a group of workers collaboratively train a shared model under the orchestration of a central server in the presence of Byzantine adversaries capable of arbitrary and potentially malicious behaviors. To simultaneously enhance communication efficiency and robustness against such adversaries, we propose a Byzantine-resilient Nesterov-Accelerated Federated Learning (Byrd-NAFL) algorithm. Byrd-NAFL integrates Nesterov's momentum into the federated learning process alongside Byzantine-resilient aggregation rules to achieve fast convergence that is safeguarded against gradient corruption. We establish a finite-time convergence guarantee for Byrd-NAFL under non-convex and smooth loss functions with a relaxed assumption on the aggregated gradients. Extensive numerical experiments validate the effectiveness of Byrd-NAFL and demonstrate its superiority over existing benchmarks in terms of convergence speed, accuracy, and resilience to diverse Byzantine attack strategies.
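
To make the combination described in the abstract concrete, the following is a minimal sketch of Nesterov momentum driven by a Byzantine-resilient aggregate. It is not the authors' Byrd-NAFL algorithm. It assumes a toy quadratic loss, server-side momentum, a coordinate-wise median as the resilient aggregation rule, and a simple attack in which every Byzantine worker sends the same large constant vector. The names `TARGET`, `coordinate_median`, `plain_mean`, and `run_sketch` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Assumption: toy loss 0.5 * ||x - TARGET||^2, so the true gradient is x - TARGET.
TARGET = np.array([1.0, -2.0, 3.0])


def coordinate_median(grads):
    """Coordinate-wise median, one common Byzantine-resilient aggregation rule."""
    return np.median(grads, axis=0)


def plain_mean(grads):
    """Non-robust baseline: plain averaging of all received gradients."""
    return np.mean(grads, axis=0)


def run_sketch(aggregate, num_workers=10, num_byzantine=3,
               steps=200, eta=0.05, beta=0.9, seed=0):
    """Run server-side Nesterov momentum on top of the given aggregation rule."""
    rng = np.random.default_rng(seed)
    x = np.zeros_like(TARGET)  # shared model
    v = np.zeros_like(TARGET)  # momentum buffer
    num_honest = num_workers - num_byzantine
    for _ in range(steps):
        y = x + beta * v  # Nesterov look-ahead point
        # Honest workers report noisy gradients evaluated at the look-ahead point.
        noise = 0.1 * rng.standard_normal((num_honest, TARGET.size))
        honest = (y - TARGET) + noise
        # Assumed attack: every Byzantine worker sends the same large vector.
        byzantine = np.full((num_byzantine, TARGET.size), 100.0)
        g = aggregate(np.vstack([honest, byzantine]))
        v = beta * v - eta * g  # momentum update
        x = x + v               # model update
    return x


if __name__ == "__main__":
    for name, rule in [("median", coordinate_median), ("mean", plain_mean)]:
        err = np.linalg.norm(run_sketch(rule) - TARGET)
        print(f"{name}: distance to minimizer = {err:.3f}")
```

In this toy setting the median-based run ends close to the minimizer, while the mean-based run is pulled far away by the corrupted gradients. The sketch only illustrates why momentum needs a resilient aggregate underneath it; it makes no claim about the paper's actual update rule or results.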

Paper Structure

This paper contains 44 sections, 1 theorem, 37 equations, 3 figures, 4 tables, 1 algorithm.

Key Result

Theorem 1

Suppose Assumptions 1 and 2 are satisfied. When the learning rate satisfies $\eta \le \frac{(1-\sin\gamma)(1-\beta)^3}{c_1 L [L\beta^4 + (1-\beta)^2]}$, the convergence rate of the Byrd-NAFL algorithm is

Figures (3)

  • Figure 1: An illustration of Byzantine-resilient aggregation.
  • Figure 2: Accuracy comparison on COVTYPE under four attacks with different Byzantine ratios.
  • Figure 3: Accuracy comparison on MNIST under four attacks with different Byzantine ratios.

Theorems & Definitions (2)

  • Remark 1
  • Theorem 1