
Byzantine-Robust and Differentially Private Federated Optimization under Weaker Assumptions

Rustem Islamov, Grigory Malinovsky, Alexander Gaponov, Aurelien Lucchi, Peter Richtárik, Eduard Gorbunov

Abstract

Federated Learning (FL) enables heterogeneous clients to collaboratively train a shared model without centralizing their raw data, offering an inherent level of privacy. However, gradients and model updates can still leak sensitive information, while malicious servers may mount adversarial attacks such as Byzantine manipulation. These vulnerabilities highlight the need to address differential privacy (DP) and Byzantine robustness within a unified framework. Existing approaches, however, often rely on unrealistic assumptions such as bounded gradients, require auxiliary server-side datasets, or fail to provide convergence guarantees. We address these limitations by proposing Byz-Clip21-SGD2M, a new algorithm that integrates robust aggregation with double momentum and carefully designed clipping. We prove high-probability convergence guarantees under standard $L$-smoothness and $\sigma$-sub-Gaussian gradient noise assumptions, thereby relaxing conditions that dominate prior work. Our analysis recovers state-of-the-art convergence rates in the absence of adversaries and improves utility guarantees under Byzantine and DP settings. Empirical evaluations on CNN and MLP models trained on MNIST further validate the effectiveness of our approach.
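To make the ingredients named in the abstract concrete, the sketch below combines three generic building blocks: norm clipping, Gaussian noise for privacy, and a robust aggregator. It is not Byz-Clip21-SGD2M and omits the double momentum entirely. The function names, the coordinate-wise median as aggregator, and the values of `tau` and `noise_std` are all assumptions chosen for illustration.

```python
import math
import random
from statistics import median


def clip(vec, tau):
    """Rescale vec so that its Euclidean norm is at most tau."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= tau:
        return list(vec)
    return [v * tau / norm for v in vec]


def privatize(vec, tau, noise_std, rng):
    """Clip, then add Gaussian noise (noise_std is an illustrative free parameter)."""
    return [v + rng.gauss(0.0, noise_std) for v in clip(vec, tau)]


def coordinate_median(vectors):
    """Coordinate-wise median, used here as a stand-in robust aggregator."""
    return [median(coords) for coords in zip(*vectors)]


rng = random.Random(0)
honest = [[1.0, -2.0], [1.2, -1.8], [0.9, -2.1]]   # hypothetical honest gradients
byzantine = [[1e6, 1e6]]                            # an arbitrary malicious update

messages = [privatize(g, tau=1.0, noise_std=0.01, rng=rng) for g in honest]
messages += byzantine
aggregate = coordinate_median(messages)
```

With a single outlier among four messages, the median in each coordinate is determined by honest values only, so the malicious update cannot drag the aggregate arbitrarily far. A plain average would not have this property.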

Paper Structure

This paper contains 64 sections, 26 theorems, 258 equations, 4 figures, 2 tables, 5 algorithms.

Key Result

Theorem 5.1

Let the smoothness, stochastic-gradient, and bounded-heterogeneity assumptions hold, and let $\alpha\in(0,1)$ be a failure probability. Let $\widetilde{B}_{\rm init} \coloneqq \max_{i\in\mathcal{G}}\|\nabla f_i(x^0)\| > 3\tau$, $\eta \sim \tau/\widetilde{B}_{\rm init}$, and $\Delta \ge \Phi^0$ for ... where ... and $\widetilde{\mathcal{O}}$ hides constant and logarithmic factors and higher-order terms.

Figures (4)

  • Figure 1: Performance of Byz-Clip21-SGD2M, Byz-Clip-SGD, and Safe-DSHB when training CNN (top row) and MLP (bottom row) models on the MNIST dataset for different numbers of Byzantine clients and privacy budgets, when Byzantine clients use the IPM attack.
  • Figure 2: Performance of Byz-Clip21-SGD2M, Byz-Clip-SGD, and Safe-DSHB when training CNN (top row) and MLP (bottom row) models on the MNIST dataset for different numbers of Byzantine clients and privacy budgets, when Byzantine clients use the label flipping attack.
  • Figure : Byz-Clip-SGD
  • Figure F.1: Performance of Byz-Clip21-SGD2M+, Byz-Clip-SGD, and Safe-DSHB when training CNN (top row) and MLP (bottom row) models on the MNIST dataset for different numbers of Byzantine clients and privacy budgets, when Byzantine clients use the label flipping attack and privacy amplification by sub-sampling is applied.

Theorems & Definitions (45)

  • Definition 3.1: dwork2014algorithmic
  • Definition 3.2: allouah2023fixing
  • Remark 3.1
  • Theorem 5.1: Simplified
  • Remark 5.1
  • Corollary 5.1: Simplified
  • Corollary 5.2: Simplified
  • Corollary 5.3: Simplified
  • Theorem 5.2: Simplified
  • Lemma 1: Lemma 4.1 in khirirat2023clip21
  • ...and 35 more