High Dimensional Distributed Gradient Descent with Arbitrary Number of Byzantine Attackers

Wenyu Liu; Tianqiang Huang; Pengfei Zhang; Zong Ke; Minghui Min; Puning Zhao

TL;DR

This work tackles Byzantine-robust distributed learning in high dimensions by introducing a direct high-dimensional semi-verified mean estimation method. The approach identifies a large-variance subspace and uses a small auxiliary clean dataset to estimate coordinates within that subspace, while leveraging corrupted gradient vectors for the orthogonal components; this yields minimax-optimal rates and removes the $\sqrt{d}$ scaling common in prior methods. The semi-verified estimator serves as a gradient aggregator, enabling robust distributed optimization under arbitrary numbers of Byzantine attackers; theoretical upper and lower bounds show dimension-free performance, and experiments on synthetic data and MNIST confirm substantial gains at high dimensionality. Collectively, the method provides a scalable, provably robust solution for federated learning settings with many untrusted workers and very large models.
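
Why can the components outside the large-variance subspace be taken from the corrupted vectors at all? The following elementary bound is our own illustration under the additive contamination model named in Theorem 1, not the paper's proof. Suppose $m$ benign vectors are joined by $q$ arbitrary ones. Write $\beta = q/(m+q)$ (our notation). For a unit direction $u$, let $\bar{b}_u$ and $\bar{a}_u$ be the means of the benign and attacker projections, $\bar{x}_u$ the mean of all $m+q$ projections, and $\sigma_u^2$ their (population) variance:

```latex
\begin{aligned}
\bar{x}_u - \bar{b}_u &= \beta\,(\bar{a}_u - \bar{b}_u), \\
\sigma_u^2 &\ge \beta(1-\beta)\,(\bar{a}_u - \bar{b}_u)^2
  \quad \text{(between-group part of the variance)}, \\
\Longrightarrow\quad |\bar{x}_u - \bar{b}_u| &\le \sqrt{\tfrac{\beta}{1-\beta}}\;\sigma_u
  = \sqrt{\tfrac{q}{m}}\;\sigma_u .
\end{aligned}
```

So attackers can hide a large bias in a direction only if they also inflate the variance there, and this holds for any ratio $q/m$. In low-variance directions the corrupted mean is accurate, and only the high-variance directions need the clean data.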

Abstract

Adversarial attacks pose a major challenge to distributed learning systems, prompting the development of numerous robust learning methods. However, most existing approaches suffer from the curse of dimensionality, i.e., their error grows with the number of model parameters. In this paper, we make progress on high-dimensional problems under an arbitrary number of Byzantine attackers. The cornerstone of our design is a direct high-dimensional semi-verified mean estimation method. The idea is to identify a subspace with large variance. The components of the mean perpendicular to this subspace are estimated from the corrupted gradient vectors uploaded by worker machines, while the components within this subspace are estimated from an auxiliary clean dataset. As a result, combining a large corrupted dataset with a small clean dataset yields significantly better performance than using either one alone. We then apply this method as the aggregator for distributed learning problems. The theoretical analysis shows that, compared with existing solutions, our method eliminates the $\sqrt{d}$ dependence on the dimensionality and achieves minimax optimal statistical rates. Numerical results validate our theory and demonstrate the effectiveness of the proposed method.
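
The following minimal NumPy sketch shows the decomposition described in the abstract. It assumes that the large-variance subspace is spanned by the eigenvectors of the corrupted sample covariance whose eigenvalues exceed a user-supplied `threshold`. That rule, the function name `semi_verified_mean`, and the synthetic data are our assumptions. The paper's algorithm and its parameters $p$ and $\lambda_c$ may work differently.

```python
import numpy as np


def semi_verified_mean(corrupted, clean, threshold):
    """Hypothetical sketch of a semi-verified mean estimate.

    corrupted: (n, d) array of possibly attacked vectors (e.g. worker gradients).
    clean:     (k, d) array of trusted auxiliary vectors, k typically small.
    threshold: assumed eigenvalue cutoff defining the large-variance subspace.
    """
    corrupted = np.asarray(corrupted, dtype=float)
    clean = np.asarray(clean, dtype=float)
    d = corrupted.shape[1]

    # Sample covariance of the corrupted vectors: attackers that shift the
    # mean far along some direction also inflate the variance along it.
    cov = np.cov(corrupted, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Orthonormal basis of the large-variance subspace (may have 0 columns).
    V = eigvecs[:, eigvals > threshold]
    P = V @ V.T  # projector onto span(V); the zero matrix if V is empty

    # Inside span(V): use only the small clean set.
    # Orthogonal complement: use the large corrupted set.
    return P @ clean.mean(axis=0) + (np.eye(d) - P) @ corrupted.mean(axis=0)


# Illustrative synthetic data: m benign samples, q attackers (additive
# contamination), all attackers pushing the mean along the first axis.
rng = np.random.default_rng(0)
d, m, q, k = 50, 200, 150, 10
mu = rng.normal(size=d)
benign = mu + rng.normal(size=(m, d))
attack = np.tile(mu + 10.0 * np.eye(d)[0], (q, 1))
corrupted = np.vstack([benign, attack])
clean = mu + rng.normal(size=(k, d))

est = semi_verified_mean(corrupted, clean, threshold=3.0)
err_est = np.linalg.norm(est - mu)
err_clean = np.linalg.norm(clean.mean(axis=0) - mu)
err_corrupt = np.linalg.norm(corrupted.mean(axis=0) - mu)
print(f"combined {err_est:.3f}  clean only {err_clean:.3f}  corrupted only {err_corrupt:.3f}")
```

In this toy case only one direction is attacked, so the clean set is used along a one-dimensional projection. Its error there behaves like $1/\sqrt{k}$ rather than the $\sqrt{d/k}$ of the plain clean mean.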

Paper Structure (28 sections, 17 theorems, 148 equations, 4 figures, 2 algorithms)

Introduction
Preliminaries
Problem Statement of Byzantine Robust Distributed Learning
Problem Statement of Semi-verified Mean Estimation
Semi-verified Mean Estimation
Theoretical Analysis
Upper Bound
Minimax lower bound
Application in Distributed Learning under Byzantine Attack
Numerical Results
Synthesized Data
Real data
Conclusion
Acknowledgments
Appendix
...and 13 more sections

Key Result

Theorem 1

Under the additive contamination model, if Assumption ass:var holds and the parameters $p$ and $\lambda_c$ in Algorithm alg satisfy […], then […], where $\delta_m$ decays faster than any polynomial in $m$.

Figures (4)

Figure 1: A two-dimensional illustration of the semi-verified mean estimation method shown in Algorithm alg. $\mu^*$ is represented by the red triangle. Benign and attacked samples correspond to green and black dots, respectively. The orange plus sign denotes $\mathbf{X}_0$.
Figure 2: Experiment results with synthesized data under linear model, with $q/m=0.8$.
Figure 3: Experiment results with synthesized data under linear model, with $q/m=0.2$.
Figure 4: Experiment results with MNIST data.
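
Figures 2 and 3 concern a linear model with different Byzantine-to-benign ratios $q/m$. The following is a hypothetical NumPy simulation in that spirit, not the paper's experimental code. It runs distributed gradient descent on least squares, with a condensed subspace-thresholding aggregator. That aggregator uses an assumed threshold of `factor` times the median covariance eigenvalue, and its "clean" vectors are per-sample gradients on a small trusted set held by the server. The sign-flip attack, worker counts, step size, and all function names are illustrative choices.

```python
import numpy as np


def semi_verified_aggregate(uploaded, clean, factor=4.0):
    # Hypothetical aggregator: directions whose variance exceeds
    # factor * (median eigenvalue) are estimated from the clean gradients,
    # all other directions from the mean of the uploaded gradients.
    cov = np.cov(uploaded, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    V = eigvecs[:, eigvals > factor * np.median(eigvals)]
    P = V @ V.T
    d = uploaded.shape[1]
    return P @ clean.mean(axis=0) + (np.eye(d) - P) @ uploaded.mean(axis=0)


def lsq_grad(X, y, w):
    # Gradient of the loss 0.5 * mean((X w - y)^2).
    return X.T @ (X @ w - y) / len(y)


def run(aggregate, steps=60, lr=0.2, seed=0):
    rng = np.random.default_rng(seed)
    d, m, q, n_local, n_aux = 20, 50, 40, 20, 30  # q/m = 0.8
    w_star = rng.normal(size=d)

    def sample(n):
        X = rng.normal(size=(n, d))
        return X, X @ w_star + 0.1 * rng.normal(size=n)

    workers = [sample(n_local) for _ in range(m)]
    X_aux, y_aux = sample(n_aux)  # small trusted dataset at the server

    w = np.zeros(d)
    for _ in range(steps):
        benign = np.array([lsq_grad(X, y, w) for X, y in workers])
        # Byzantine workers all send a scaled, sign-flipped benign mean.
        attack = np.tile(-5.0 * benign.mean(axis=0), (q, 1))
        uploaded = np.vstack([benign, attack])
        # Per-sample gradients on the auxiliary data serve as clean vectors.
        clean = X_aux * (X_aux @ w - y_aux)[:, None]
        w = w - lr * aggregate(uploaded, clean)
    return np.linalg.norm(w - w_star)


err_robust = run(semi_verified_aggregate)
err_naive = run(lambda uploaded, clean: uploaded.mean(axis=0))
print(f"semi-verified aggregator: {err_robust:.3g}  plain mean: {err_naive:.3g}")
```

With this attack, plain averaging returns $(m - 5q)/(m+q)$ times the benign mean gradient, which here is negative, so every step moves uphill and the iterates diverge. The thresholded aggregator instead reads the attacked direction off the trusted set.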

Theorems & Definitions (34)

Definition 1
Definition 2
Definition 3
Theorem 1
Theorem 2
Remark 1
Theorem 3
Theorem 4
Lemma 1
Lemma 2
...and 24 more