Byzantine-Robust and Differentially Private Federated Optimization under Weaker Assumptions

Rustem Islamov; Grigory Malinovsky; Alexander Gaponov; Aurelien Lucchi; Peter Richtárik; Eduard Gorbunov

Abstract

Federated Learning (FL) enables heterogeneous clients to collaboratively train a shared model without centralizing their raw data, offering an inherent level of privacy. However, gradients and model updates can still leak sensitive information, while malicious servers may mount adversarial attacks such as Byzantine manipulation. These vulnerabilities highlight the need to address differential privacy (DP) and Byzantine robustness within a unified framework. Existing approaches, however, often rely on unrealistic assumptions such as bounded gradients, require auxiliary server-side datasets, or fail to provide convergence guarantees. We address these limitations by proposing Byz-Clip21-SGD2M, a new algorithm that integrates robust aggregation with double momentum and carefully designed clipping. We prove high-probability convergence guarantees under standard $L$-smoothness and $\sigma$-sub-Gaussian gradient noise assumptions, thereby relaxing conditions that dominate prior work. Our analysis recovers state-of-the-art convergence rates in the absence of adversaries and improves utility guarantees under Byzantine and DP settings. Empirical evaluations on CNN and MLP models trained on MNIST further validate the effectiveness of our approach.
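To make the ingredients named in the abstract concrete, the following is a minimal illustrative sketch of norm clipping, Gaussian noise injection, and a robust aggregation rule. It is not Byz-Clip21-SGD2M: it omits the double momentum and error-feedback components, uses the coordinate-wise median as one simple stand-in for a robust aggregator, and does not calibrate the noise to any privacy budget. All function names and the values of `tau` and `noise_std` are assumptions chosen for illustration.

```python
import math
import random
from statistics import median


def clip(vec, tau):
    """Rescale vec so that its Euclidean norm is at most tau."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= tau:
        return list(vec)
    return [v * tau / norm for v in vec]


def privatize(vec, tau, noise_std, rng):
    """Clip to norm tau, then add independent Gaussian noise per coordinate."""
    return [v + rng.gauss(0.0, noise_std) for v in clip(vec, tau)]


def coordinate_median(updates):
    """Coordinate-wise median: a simple robust aggregation rule (assumed here)."""
    return [median(coords) for coords in zip(*updates)]


rng = random.Random(0)
honest = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1]]  # toy honest gradients
byzantine = [[100.0, -100.0]]                  # arbitrary malicious message
tau, noise_std = 1.0, 0.01                     # hypothetical values

# Honest clients clip and perturb; the Byzantine client sends whatever it likes.
messages = [privatize(g, tau, noise_std, rng) for g in honest] + byzantine
aggregate = coordinate_median(messages)
```

In this toy run the single outlier cannot drag the aggregate far from the honest messages, because each coordinate of the median is determined by the middle values rather than by the extreme one.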

Paper Structure (64 sections, 26 theorems, 258 equations, 4 figures, 2 tables, 5 algorithms)

Introduction
Main Contributions.
Related Works
Error Feedback.
Byzantine Robust Optimization.
Differentially Private Optimization.
Differentially Private and Byzantine Robust Methods.
Preliminaries
Differential Privacy.
Robust Aggregation.
Assumptions.
Algorithm Design
Momentum Mechanism.
Clipping.
Error Feedback.
...and 49 more sections

Key Result

Theorem 5.1

Let the smoothness, stochastic-gradient, and bounded-heterogeneity assumptions hold, and let $\alpha\in(0,1)$ be a failure probability. Let $\widetilde{B}_{\rm init} \coloneqq \max_{i\in\mathcal{G}}\{\|\nabla f_i(x^0)\|\} > 3\tau$, $\eta \sim \tau/\widetilde{B}_{\rm init}$, and $\Delta \ge \Phi^0$ for ... where ... and $\widetilde{\mathcal{O}}$ hides constant and logarithmic factors and higher-order terms.

Figures (4)

Figure 1: Performance of Byz-Clip21-SGD2M, Byz-Clip-SGD, and Safe-DSHB when training CNN (top row) and MLP (bottom row) models on the MNIST dataset for different numbers of Byzantine clients and privacy budgets, when Byzantine clients use the IPM attack.
Figure 2: Performance of Byz-Clip21-SGD2M, Byz-Clip-SGD, and Safe-DSHB when training CNN (top row) and MLP (bottom row) models on the MNIST dataset for different numbers of Byzantine clients and privacy budgets, when Byzantine clients use the label flipping attack.
Figure : Byz-Clip-SGD
Figure F.1: Performance of Byz-Clip21-SGD2M+, Byz-Clip-SGD, and Safe-DSHB when training CNN (top row) and MLP (bottom row) models on the MNIST dataset for different numbers of Byzantine clients and privacy budgets, when Byzantine clients use the label flipping attack and privacy amplification by sub-sampling is applied.

Theorems & Definitions (45)

Definition 3.1: dwork2014algorithmic
Definition 3.2: allouah2023fixing
Remark 3.1
Theorem 5.1: Simplified
Remark 5.1
Corollary 5.1: Simplified
Corollary 5.2: Simplified
Corollary 5.3: Simplified
Theorem 5.2: Simplified
Lemma 1: Lemma 4.1 in khirirat2023clip21
...and 35 more
