Adversarial Resilience against Clean-Label Attacks in Realizable and Noisy Settings
Carolin Heinzler
TL;DR
This work studies how to obtain stochastic-like performance guarantees for sequential classifiers when the data stream contains an unknown number of clean-label adversarial samples and the learner may abstain on uncertain inputs. It builds on Goel et al., employing disagreement-based abstaining learners whose guarantees depend on the VC dimension rather than the Littlestone dimension, and corrects inaccuracies in the original argument. It covers the structure-based approaches for VC dimension 1 and for axis-aligned rectangles, then extends the analysis to agnostic/noisy settings by introducing a clean-label adversary under label noise and analyzing a disagreement-based learner for thresholds. It also discusses extensions to agnostic learning with known distributions. Overall, the work advances robust sequential learning under adversarial data, highlights abstention as a critical component of resilience, and outlines concrete avenues toward broader agnostic guarantees and more general hypothesis classes.
Abstract
We investigate the challenge of establishing stochastic-like guarantees when sequentially learning from a stream of i.i.d. data that includes an unknown number of clean-label adversarial samples. We permit the learner to abstain from making predictions when uncertain. The regret of the learner is measured in terms of misclassification and abstention error, where we allow the learner to abstain for free on adversarially injected samples. This approach is based on the work of Goel, Hanneke, Moran, and Shetty (arXiv:2306.13119). We explore the methods they present and correct inaccuracies in their argumentation. However, this approach is limited to the realizable setting, where labels are assigned according to some function $f^*$ from the hypothesis space $\mathcal{F}$. Based on similar arguments, we explore how to adapt these methods to the agnostic setting, where labels are random. Introducing the notion of a clean-label adversary in the agnostic context, we are the first to give a theoretical analysis of a disagreement-based learner for thresholds, subject to a clean-label adversary with noise.
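To make the protocol in the abstract concrete, the following is a minimal sketch of a disagreement-based abstaining learner for thresholds in the realizable setting. It is an illustration under stated assumptions, not necessarily the learner analyzed in the work: the class `ThresholdDisagreementLearner`, the function `run`, the `ABSTAIN` marker, and the convention $f_a(x) = 1$ if $x \ge a$ are all hypothetical names and choices introduced here. The learner predicts only when every threshold consistent with the labels seen so far agrees, and abstains otherwise; abstentions on injected rounds are not counted.

```python
from dataclasses import dataclass

ABSTAIN = None  # hypothetical marker meaning "no prediction this round"


@dataclass
class ThresholdDisagreementLearner:
    """Version-space learner for thresholds f_a(x) = 1 if x >= a else 0.

    Assumption: labels are realizable, so the true threshold a always
    lies in the interval (lo, hi].
    """
    lo: float = float("-inf")  # largest point seen with label 0
    hi: float = float("inf")   # smallest point seen with label 1

    def predict(self, x):
        if x >= self.hi:   # every consistent threshold labels x as 1
            return 1
        if x <= self.lo:   # every consistent threshold labels x as 0
            return 0
        return ABSTAIN     # x lies in the disagreement region

    def update(self, x, y):
        # Shrink the version space using the revealed (clean) label.
        if y == 1:
            self.hi = min(self.hi, x)
        else:
            self.lo = max(self.lo, x)


def run(stream, learner):
    """Play the sequential game on (x, y, injected) triples.

    Returns (misclassification_error, abstention_error), where
    abstentions on injected rounds are free.
    """
    misclassified = 0
    abstained = 0
    for x, y, injected in stream:
        pred = learner.predict(x)
        if pred is ABSTAIN:
            if not injected:
                abstained += 1
        elif pred != y:
            misclassified += 1
        learner.update(x, y)
    return misclassified, abstained


# Example stream labeled by the threshold a = 0.5; the third round is injected.
stream = [
    (0.2, 0, False),
    (0.8, 1, False),
    (0.5, 1, True),
    (0.9, 1, False),
    (0.1, 0, False),
    (0.4, 0, False),
]
print(run(stream, ThresholdDisagreementLearner()))  # → (0, 3)
```

In this sketch the learner never misclassifies, because it commits only when the version space is unanimous, and clean-label injections can only shrink the disagreement region. With noisy labels the version space can become empty, which is why the agnostic setting requires a different analysis.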
