On Robustness of Linear Classifiers to Targeted Data Poisoning

Nakshatra Gupta, Sumanth Prabhu, Supratik Chakraborty, R Venkatesh

TL;DR

This work addresses the problem of robustness of linear classifiers to targeted data poisoning via label perturbations in a black-box setting. It proves NP-Completeness for deciding robustness and introduces two scalable bound techniques: a partition-based MILP lower bound and an augmentation-based upper bound implemented in the RobustRange tool. Through experiments on 15 public datasets, the approach yields tight robustness intervals and demonstrates that poisoning guided by the upper bound can effectively flip targeted test points, while outperforming a state-of-the-art white-box method in several scenarios. The methods provide practical, pointwise insight into dataset trustworthiness and offer a foundation for contesting model decisions when training data integrity is in question.

Abstract

Data poisoning is a training-time attack that undermines the trustworthiness of learned models. In a targeted data poisoning attack, an adversary manipulates the training dataset to alter the classification of a targeted test point. Given the typically large size of training datasets, manual detection of poisoning is difficult. An alternative is to automatically measure a dataset's robustness against such an attack, which is the focus of this paper. We consider a threat model wherein an adversary can only perturb the labels of the training dataset, with knowledge limited to the hypothesis space of the victim's model. In this setting, we prove that finding the robustness is an NP-Complete problem, even when hypotheses are linear classifiers. To overcome this, we present a technique that finds lower and upper bounds of robustness. Our implementation of the technique computes these bounds efficiently in practice for many publicly available datasets. We experimentally demonstrate the effectiveness of our approach. Specifically, a poisoning exceeding the identified robustness bounds significantly impacts test point classification. We are also able to compute these bounds in many more cases where state-of-the-art techniques fail.

Paper Structure

This paper contains 26 sections, 5 theorems, 3 equations, 55 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Given a dataset $\mathcal{D}_c = \{(x_i, y_i) \mid i \in [m]\}$ and a target test point $(x_t, y_t)$, deciding whether its robustness $\mathit{r}$ is less than a threshold $\kappa$ is NP-Complete, when $\mathcal{H}$ is a set of linear binary classifiers.
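To make the quantity in Theorem 1 concrete, the following is a minimal brute-force sketch on a toy problem. It is not the paper's RobustRange tool or its MILP formulation. It assumes a one-dimensional feature, labels in $\{-1, +1\}$, and a simplified reading of robustness: the smallest number of training labels that must be flipped so that some linear classifier fits the poisoned training set perfectly and assigns the label $y_t$ to $x_t$. The function names `linear_classifiers_1d` and `robustness_brute_force` are hypothetical.

```python
from itertools import combinations


def linear_classifiers_1d(xs):
    """Yield every distinct behaviour of sign(w*x + b) on the 1-D points xs."""
    pts = sorted(set(xs))
    # Candidate thresholds: below all points, between neighbours, above all points.
    cuts = [pts[0] - 1.0] + [(a + b) / 2 for a, b in zip(pts, pts[1:])] + [pts[-1] + 1.0]
    for t in cuts:
        for sign in (+1, -1):
            # Predict `sign` to the right of the threshold, `-sign` otherwise.
            yield lambda x, t=t, sign=sign: sign if x > t else -sign


def robustness_brute_force(train, x_t, y_t):
    """Return (k, flipped): the fewest label flips (and which indices) such that
    some 1-D linear classifier fits the poisoned data and maps x_t to y_t.

    Assumption: 'fits' means zero training error on the poisoned labels.
    The subset enumeration is exponential in len(train); toy sizes only.
    """
    xs = [x for x, _ in train] + [x_t]
    m = len(train)
    for k in range(m + 1):  # try smaller poisoning budgets first
        for flipped in combinations(range(m), k):
            poisoned = [(x, -y if i in flipped else y) for i, (x, y) in enumerate(train)]
            for h in linear_classifiers_1d(xs):
                if h(x_t) == y_t and all(h(x) == y for x, y in poisoned):
                    return k, flipped
    # Not reached when inputs are distinct: a constant classifier predicting y_t
    # always fits after flipping every label that disagrees with it.


train = [(0.0, -1), (1.0, -1), (2.0, -1), (3.0, +1), (4.0, +1)]
print(robustness_brute_force(train, x_t=1.5, y_t=+1))
```

On this toy dataset the clean labels already separate at a threshold between 2 and 3, which assigns $-1$ to the test point 1.5. Relabeling the single training point at 2.0 is enough to admit a separating threshold that assigns $+1$ instead. The nested subset enumeration is what becomes infeasible at realistic dataset sizes, which is why bounds on the robustness are of practical interest.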

Figures (55)

  • Figure 1: A test point (leftmost plot) and two training points, all originally labeled as digit '0'. Our tool finds that relabeling only these two training points as digit '4' can change the test point's label to digit '4'.
  • Figure 2: Average time taken in seconds per test point by RobustRange ($\hat{\mathit{r}}$, $\check{\mathit{r}}$) and IP-relabel ($\hat{\mathit{r}}_{IPr}$).
  • Figure 3: Histograms of the distribution of points across $\hat{\mathit{r}}$ values for the Census Income dataset.
  • Figure 4: Comparison of average accuracy vs. $\rho$ for the Census Income dataset with $\{0, \frac{\hat{\mathit{r}}}{4}, \frac{\hat{\mathit{r}}}{2}, \hat{\mathit{r}}, 2\hat{\mathit{r}}, 4\hat{\mathit{r}}\}$ and loss functions.
  • Figure 5: Histograms of the distribution of points across $\check{\mathit{r}}$ values for the Census Income dataset.
  • ...and 50 more figures

Theorems & Definitions (8)

  • Theorem 1
  • proof : Proof Sketch
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • proof
  • Theorem 5
  • proof