Adversarial Resilience against Clean-Label Attacks in Realizable and Noisy Settings

Carolin Heinzler

TL;DR

Adversarial Resilience against Clean-Label Attacks in Realizable and Noisy Settings studies how to obtain stochastic-like performance guarantees for sequential classifiers when an unknown number of clean-label adversarial samples may be present, with the ability to abstain on uncertain data. The authors build on Goel et al. by employing disagreement-based abstaining learners and extend the analysis to agnostic/noisy settings, introducing a clean-label adversary under label noise and deriving threshold-based disagreement learners that achieve finite VC-dimension guarantees rather than Littlestone-based bounds. They correct inaccuracies in the original argument and develop structure-based approaches for VC dimension 1 and axis-aligned rectangles, then discuss extensions to agnostic learning with known distributions. Overall, the work advances robust sequential learning under adversarial data, highlighting abstention as a critical component for resilience and outlining concrete avenues for broader agnostic generalization and more general hypothesis classes.

Abstract

We investigate the challenge of establishing stochastic-like guarantees when sequentially learning from a stream of i.i.d. data that includes an unknown quantity of clean-label adversarial samples. We permit the learner to abstain from making predictions when uncertain. The regret of the learner is measured in terms of misclassification and abstention error, where we allow the learner to abstain for free on adversarially injected samples. This approach is based on the work of Goel, Hanneke, Moran, and Shetty (arXiv:2306.13119). We explore the methods they present and correct inaccuracies in their argumentation. However, this approach is limited to the realizable setting, where labels are assigned according to some function $f^*$ from the hypothesis space $\mathcal{F}$. Based on similar arguments, we explore how to adapt these methods to the agnostic setting, where labels are random. Introducing the notion of a clean-label adversary in the agnostic context, we are the first to give a theoretical analysis of a disagreement-based learner for thresholds, subject to a clean-label adversary with noise.
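To make the idea of a disagreement-based abstaining learner concrete, the following is a minimal sketch for one-dimensional thresholds in the realizable setting. It is an illustration only and not the algorithm analysed in the paper: the class name `ThresholdDisagreementLearner`, the helper `run`, and the convention $f_\theta(x) = 1$ if $x \ge \theta$ are all assumptions made for this example. The learner predicts only where every hypothesis consistent with the labels seen so far agrees, and abstains inside the disagreement region.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class ThresholdDisagreementLearner:
    """Illustrative learner for thresholds f_theta(x) = 1 if x >= theta else 0.

    Hypothetical sketch: it abstains on every point in the disagreement
    region of the current version space, which is a simplification.
    """
    lo: float = float("-inf")  # largest point seen so far with label 0
    hi: float = float("inf")   # smallest point seen so far with label 1

    def predict(self, x: float) -> Optional[int]:
        # Every consistent threshold satisfies lo < theta <= hi.
        if x <= self.lo:
            return 0       # all consistent hypotheses predict 0
        if x >= self.hi:
            return 1       # all consistent hypotheses predict 1
        return None        # disagreement region: abstain

    def update(self, x: float, y: int) -> None:
        # Shrink the version space using the revealed (clean) label.
        if y == 1:
            self.hi = min(self.hi, x)
        else:
            self.lo = max(self.lo, x)


def run(stream: Iterable[Tuple[float, int]]) -> Tuple[int, int]:
    """Return (misclassifications, abstentions) over a labelled stream."""
    learner = ThresholdDisagreementLearner()
    mistakes = abstentions = 0
    for x, y in stream:
        p = learner.predict(x)
        if p is None:
            abstentions += 1
        elif p != y:
            mistakes += 1
        learner.update(x, y)
    return mistakes, abstentions


# Labels generated by the target threshold theta* = 0.5.
stream = [(0.2, 0), (0.8, 1), (0.1, 0), (0.9, 1), (0.5, 1), (0.6, 1)]
mistakes, abstentions = run(stream)
```

Because a clean-label adversary can only inject points labelled by $f^*$, the target always remains in the version space, so this simple learner never misclassifies in the realizable setting; its cost lies entirely in abstentions. With noisy labels this argument breaks down, which is the difficulty the agnostic analysis addresses.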

Paper Structure

This paper contains 29 sections, 21 theorems, 73 equations, 3 figures, 7 algorithms.

Key Result

Theorem 2.7

Let $\mathcal{F}$ be a binary hypothesis class over $\mathcal{X}$. Then the following are equivalent: Furthermore, if $\mathcal{F}$ is PAC-learnable, there exist absolute constants $C_1,C_2$ for the sample complexity such that:

Figures (3)

  • Figure 2.1: Examples of different function classes on $\mathcal{X}$
  • Figure 2.2: An example of a classification problem for axis-aligned rectangles. Given a set of 10 points that have already been assigned a positive or negative label, the algorithm receives a new point (see plot 1). The algorithm decides to predict (see plots 2 and 3), as there are 4 points which convince us to predict 0 (take $\alpha =\sqrt{11/\log(11)}<3$). This would be a misclassification, as the new point is actually a positive point. See plot 4 for the updated disagreement region and the rectangle containing the positive points.
  • Figure 3.3: An example of how the target threshold disagrees with 'bad' hypotheses on more than $M/2$ points, where $M=9$. From the examples in the first row, we see that if the target $f^*$ has its threshold within the first interval $[a_i,c_i^-]$, respectively the last $[c_i^+,b_i]$, then functions $f\in\mathcal{F}_{i-1}$ with their threshold in the last interval $[c_i^+,b_i]$, respectively the first $[a_i,c_i^-]$, disagree with $f^*$ on all $M=9$ points in $[c_i^-,c_i^+]$, i.e. $f(\hat{x}_t)\neq f^*(\hat{x}_t)$. The second row shows an example in which the threshold of the target $f^*$ falls within the interval $[c_i^-,c_i^+]$. In this case, on at least one side of the threshold, there are more than $M/2$ samples. For this example, we have $5>M/2$ samples to the right of the threshold. Thus, the goal is that all hypotheses with thresholds in the shaded area on the right should be eliminated with high probability, as they would classify these $5$ points incorrectly (compared to the target). Note that 2 points are noisy (opposite label of $f^*$) in every example.
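The numeric claim in the caption of Figure 2.2 can be checked directly. The snippet below is a small sanity check, assuming that $\log$ denotes the natural logarithm (the caption does not state the base); the variable names are chosen for this example only.

```python
import math

n = 11  # the value appearing inside the square root in the caption
alpha = math.sqrt(n / math.log(n))  # natural logarithm assumed

# alpha is roughly 2.14, so the bound alpha < 3 from the caption holds,
# and the 4 points mentioned in the caption exceed alpha.
bound_holds = alpha < 3
four_points_exceed_alpha = 4 > alpha
```

The inequality would also hold with a base-2 logarithm, but not with a base-10 logarithm, where the value is about 3.25.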

Theorems & Definitions (61)

  • Definition 2.1: Shattering and VC dimension (Vapnik, 1982; Vapnik and Chervonenkis, 1971)
  • Definition 2.2: Littlestone dimension (Littlestone, 1988)
  • Definition 2.3: Set of shattered $k$-sets
  • Definition 2.4: Disagreement region
  • Definition 2.5: Learner or Learning Algorithm
  • Definition 2.6: PAC learnable (from Shalev-Shwartz and Ben-David, 2014)
  • Theorem 2.7: Fundamental Theorem of Statistical Learning Part I
  • Definition 2.8: Online learnability (see Shalev-Shwartz and Ben-David, 2014)
  • Remark 2.9
  • Definition 2.10: Universal learnability (Bousquet et al., 2021)
  • ...and 51 more