On the Relevance of Byzantine Robust Optimization Against Data Poisoning

Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Rafael Pinot

TL;DR

This work analyzes robustness in distributed ML under data poisoning and Byzantine faults. It shows that Byzantine-robust optimization is tight against data poisoning for smooth losses that satisfy the Polyak-Łojasiewicz (PL) condition under gradient heterogeneity: the lower bounds for data poisoning are matched by upper bounds obtained with a robust DSGD variant that uses Polyak momentum and trimmed-mean aggregation. The results cover both fully-poisonous and partially-poisonous local data, show that fully-poisonous data is typically the stronger adversary when data are heterogeneous, and establish near-optimal iteration complexities. For the design of practical robust distributed learning systems, the findings indicate that defenses against Byzantine faults already achieve optimal guarantees under realistic data-corruption threats, with implications for privacy and reliability in federated learning. The work also leaves open questions about targeted attacks, tighter condition-number dependence, and settings beyond the PL framework.
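To make the aggregation step mentioned above concrete, here is a minimal sketch of one server round of a robust DSGD-style update with Polyak momentum and coordinate-wise trimmed mean. It is an illustration under stated assumptions, not the paper's exact algorithm: the function names (`trimmed_mean`, `momentum_update`, `robust_step`), the parameters `f`, `lr`, and `beta`, and the choice to apply momentum per worker before aggregation are all hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def trimmed_mean(vectors: List[Vector], f: int) -> Vector:
    """Coordinate-wise trimmed mean.

    For each coordinate, drop the f smallest and f largest values
    across workers and average what remains. Requires n > 2f.
    """
    n = len(vectors)
    if n == 0 or f < 0 or 2 * f >= n:
        raise ValueError("need n > 2f >= 0")
    dim = len(vectors[0])
    aggregate = []
    for k in range(dim):
        column = sorted(v[k] for v in vectors)
        kept = column[f:n - f]
        aggregate.append(sum(kept) / len(kept))
    return aggregate


def momentum_update(momentum: Vector, gradient: Vector, beta: float) -> Vector:
    """Polyak momentum: m <- beta * m + (1 - beta) * g (assumed form)."""
    return [beta * m + (1.0 - beta) * g for m, g in zip(momentum, gradient)]


def robust_step(
    theta: Vector,
    momenta: List[Vector],
    gradients: List[Vector],
    f: int,
    lr: float,
    beta: float,
) -> Tuple[Vector, List[Vector]]:
    """One round: update each worker's momentum, aggregate robustly, descend."""
    new_momenta = [momentum_update(m, g, beta) for m, g in zip(momenta, gradients)]
    direction = trimmed_mean(new_momenta, f)
    new_theta = [t - lr * d for t, d in zip(theta, direction)]
    return new_theta, new_momenta


# Hypothetical usage: three workers report similar gradients, one reports an outlier.
gradients = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [100.0, -100.0]]
momenta = [[0.0, 0.0] for _ in gradients]
theta, momenta = robust_step([0.0, 0.0], momenta, gradients, f=1, lr=0.1, beta=0.0)
```

With `f=1`, the outlying fourth vector is discarded in every coordinate, so the update direction is determined by the three mutually consistent workers. The trimming budget `f` plays the role of an assumed upper bound on the number of corrupted workers.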

Abstract

The success of machine learning (ML) has been intimately linked with the availability of large amounts of data, typically collected from heterogeneous sources and processed on vast networks of computing devices (also called {\em workers}). Beyond accuracy, the use of ML in critical domains such as healthcare and autonomous driving calls for robustness against {\em data poisoning} and some {\em faulty workers}. The problem of {\em Byzantine ML} formalizes these robustness issues by considering a distributed ML environment in which workers (storing a portion of the global dataset) can deviate arbitrarily from the prescribed algorithm. Although the problem has attracted a lot of attention from a theoretical point of view, its practical importance for addressing realistic faults (where the behavior of any worker is locally constrained) remains unclear. It has been argued that the seemingly weaker threat model where only workers' local datasets get poisoned is more reasonable. We prove that, while tolerating a wider range of faulty behaviors, Byzantine ML yields solutions that are, in a precise sense, optimal even under the weaker data poisoning threat model. Then, we study a generic data poisoning model wherein some workers have {\em fully-poisonous local data}, i.e., their datasets are entirely corruptible, and the remaining workers have {\em partially-poisonous local data}, i.e., only a fraction of their local datasets is corruptible. We prove that Byzantine-robust schemes yield optimal solutions against both these forms of data poisoning, and that the former is more harmful when workers have {\em heterogeneous} local data.

Paper Structure (33 sections, 16 theorems, 178 equations)


Key Result

Theorem 1

Suppose Assumptions asp:lip (smoothness), asp:polyak (Polyak-Łojasiewicz), asp:bnd_var (bounded variance), and asp:heter (heterogeneity) hold. Let $Q_0 := Q^{(\mathcal{H})}\left(\theta\right) - Q^*$. Consider algorithm $\Pi$ as described above. If there exists $A \geq 0$ such that $\mathbb{E}_{\Pi}\left[ Q^{(\mathcal{H})}\left(\hat{\theta}\right) \cdots \right] \cdots$, where $\mathbb{E}_{\Pi}\left[\cdot\right]$ denotes the expectation over the randomness in $\Pi$.

Theorems & Definitions (31)

  • Theorem 1
  • Proof sketch
  • Theorem 2
  • Corollary 1
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Remark 1
  • ...and 21 more