High Dimensional Distributed Gradient Descent with Arbitrary Number of Byzantine Attackers

Wenyu Liu, Tianqiang Huang, Pengfei Zhang, Zong Ke, Minghui Min, Puning Zhao

TL;DR

This work tackles Byzantine-robust distributed learning in high dimensions by introducing a direct high-dimensional semi-verified mean estimation method. The approach identifies a large-variance subspace and uses a small auxiliary clean dataset to estimate coordinates within that subspace, while leveraging corrupted gradient vectors for the orthogonal components; this yields minimax-optimal rates and removes the $\sqrt{d}$ scaling common in prior methods. The semi-verified estimator serves as a gradient aggregator, enabling robust distributed optimization under arbitrary numbers of Byzantine attackers; theoretical upper and lower bounds show dimension-free performance, and experiments on synthetic data and MNIST confirm substantial gains at high dimensionality. Collectively, the method provides a scalable, provably robust solution for federated learning settings with many untrusted workers and very large models.

Abstract

Adversarial attacks pose a major challenge to distributed learning systems, prompting the development of numerous robust learning methods. However, most existing approaches suffer from the curse of dimensionality, i.e., the error increases with the number of model parameters. In this paper, we make progress on high-dimensional problems under an arbitrary number of Byzantine attackers. The cornerstone of our design is a direct high-dimensional semi-verified mean estimation method. The idea is to identify a subspace with large variance. The components of the mean perpendicular to this subspace are estimated using the corrupted gradient vectors uploaded from worker machines, while the components within this subspace are estimated using an auxiliary dataset. As a result, combining a large corrupted dataset with a small clean dataset yields significantly better performance than using either one alone. We then apply this method as the aggregator for distributed learning problems. The theoretical analysis shows that, compared with existing solutions, our method removes the $\sqrt{d}$ dependence on the dimensionality and achieves minimax-optimal statistical rates. Numerical results validate our theory as well as the effectiveness of the proposed method.
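The subspace idea described in the abstract can be illustrated with a minimal NumPy sketch. This is a simplified illustration under stated assumptions, not the paper's algorithm: the large-variance subspace is taken to be the span of the sample-covariance eigenvectors whose eigenvalues exceed a single cutoff, and no filtering step is performed. The names `semi_verified_mean`, `var_threshold`, and `demo`, as well as the toy attack (a fixed shift along one coordinate), are hypothetical choices made only for this example.

```python
import numpy as np


def semi_verified_mean(corrupted, clean, var_threshold):
    """Simplified semi-verified mean estimate (illustrative assumption only).

    corrupted: (m, d) array, possibly dominated by attacked vectors.
    clean:     (n, d) small trusted auxiliary sample.
    var_threshold: eigenvalue cutoff defining the large-variance subspace.
    """
    corrupted_mean = corrupted.mean(axis=0)
    clean_mean = clean.mean(axis=0)

    # Empirical covariance of the corrupted vectors and its eigen-decomposition.
    cov = np.cov(corrupted, rowvar=False, bias=True)
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Orthonormal basis (d x k) of directions with unusually large variance.
    basis = eigvecs[:, eigvals > var_threshold]

    # Keep the corrupted mean on the orthogonal complement and
    # replace its in-subspace component by that of the clean mean.
    return corrupted_mean + basis @ (basis.T @ (clean_mean - corrupted_mean))


def demo(seed=0, d=50, m=500, n=20, attacked_fraction=0.6):
    """Toy comparison on Gaussian data with true mean zero (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(d)

    corrupted = rng.normal(size=(m, d)) + mu
    q = int(attacked_fraction * m)
    corrupted[:q, 0] += 10.0  # attackers shift the first coordinate

    clean = rng.normal(size=(n, d)) + mu

    estimate = semi_verified_mean(corrupted, clean, var_threshold=4.0)
    return {
        "semi_verified": float(np.linalg.norm(estimate - mu)),
        "clean_only": float(np.linalg.norm(clean.mean(axis=0) - mu)),
        "corrupted_only": float(np.linalg.norm(corrupted.mean(axis=0) - mu)),
    }


if __name__ == "__main__":
    for name, err in demo().items():
        print(f"{name}: {err:.3f}")
```

In this toy setting the attack inflates the variance along one direction, so only that direction is handed to the small clean sample, while the remaining directions still benefit from averaging over all the corrupted vectors.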

Paper Structure (28 sections, 17 theorems, 148 equations, 4 figures, 2 algorithms)

Key Result

Theorem 1

Under the additive contamination model, if Assumption ass:var holds and the parameters $p$, $\lambda_c$ in Algorithm alg satisfy [...], then [...], in which $\delta_m$ decays faster than any polynomial of $m$.

Figures (4)

  • Figure 1: A two-dimensional illustration of the semi-verified mean estimation method shown in Algorithm \ref{alg}. $\mu^*$ is represented by the red triangle. Benign and attacked samples correspond to green and black dots, respectively. The orange plus sign denotes $\mathbf{X}_0$.
  • Figure 2: Experimental results with synthesized data under a linear model, with $q/m=0.8$.
  • Figure 3: Experimental results with synthesized data under a linear model, with $q/m=0.2$.
  • Figure 4: Experimental results with MNIST data.

Theorems & Definitions (34)

  • Definition 1
  • Definition 2
  • Definition 3
  • Theorem 1
  • Theorem 2
  • Remark 1
  • Theorem 3
  • Theorem 4
  • Lemma 1
  • Lemma 2
  • ...and 24 more