Certified Robustness against Sparse Adversarial Perturbations via Data Localization

Ambar Pal; René Vidal; Jeremias Sulam

TL;DR

This work links the existence of robust classifiers under sparse ($\ell_0$) adversarial perturbations to a localization property of the data distribution. If an $\ell_0$-robust classifier exists, at least one class-conditional distribution must be localized, i.e., it must place high probability on a small-volume region. Conversely, strong localization together with sufficient separation between classes suffices to construct a robust classifier. Building on this theory, the authors introduce Box-NN, a nearest-box classifier whose decision regions are defined by axis-aligned boxes, and derive a certified $\ell_0$ robustness guarantee from the margin between the $\ell_0$ distances of an input to the boxes. They develop an optimization framework to learn the boxes from data, using soft-min relaxations and improved initialization to optimize a robustness-aware objective. Empirical results on MNIST and Fashion-MNIST show that Box-NN achieves state-of-the-art certified robustness against sparse attacks, often outperforming existing ensembling and randomized-smoothing baselines across a broad range of perturbation budgets. The work highlights that exploiting data geometry through localized, box-shaped decision regions can yield lighter-weight and more effective certified defenses against $\ell_0$ perturbations, while noting limited scalability to more complex datasets and the potential for richer decision boundaries in future work.
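The nearest-box mechanism described above can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation: boxes are stored as hypothetical lower/upper bound arrays `lo` and `hi` with one row per box, `box_labels` is a NumPy integer array assigning each box to a class, and every class is assumed to own at least one box. The $\ell_0$ distance from an input to a box is the number of coordinates lying outside the box's interval, since each such pixel must be changed and changing it once suffices. Because changing $k$ pixels moves every such distance by at most $k$, the prediction cannot flip while $2k$ is smaller than the gap between the runner-up class distance and the winning one. This simple triangle-inequality certificate is our reconstruction and may differ in detail from the paper's Theorem 4.1.

```python
import numpy as np


def l0_dist_to_boxes(x, lo, hi):
    """l0 distance from x (shape (n,)) to each of m boxes given by lo, hi (shape (m, n)).

    Every coordinate outside [lo_i, hi_i] has to be changed, and changing it
    once is enough, so the distance is the number of violated coordinates.
    """
    outside = (x < lo) | (x > hi)
    return outside.sum(axis=1)


def box_nn_predict(x, lo, hi, box_labels, num_classes):
    """Nearest-box prediction plus a certified l0 radius (illustrative sketch).

    Assumes num_classes >= 2 and that every class owns at least one box.
    """
    d = l0_dist_to_boxes(x, lo, hi)
    # Distance to a class = distance to its closest box.
    class_d = np.array([d[box_labels == c].min() for c in range(num_classes)])
    pred = int(np.argmin(class_d))
    gap = np.delete(class_d, pred).min() - class_d[pred]
    # Changing k pixels moves each class distance by at most k, so the
    # prediction is certified when class_d[pred] + k < runner-up - k,
    # i.e. for every k <= (gap - 1) // 2 (radius 0 on ties).
    radius = int(max((gap - 1) // 2, 0))
    return pred, radius


# Toy usage: a "dark" box for class 0 and a "bright" box for class 1, 6 pixels.
lo = np.array([[0.0] * 6, [0.8] * 6])
hi = np.array([[0.2] * 6, [1.0] * 6])
labels = np.array([0, 1])
x = np.array([0.1] * 5 + [0.9])
print(box_nn_predict(x, lo, hi, labels, num_classes=2))  # (0, 1)
```

In the toy example, `x` violates one coordinate of the class-0 box and five of the class-1 box, so the gap is 4 and any single-pixel change, of any magnitude, is certified not to flip the prediction.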

Abstract

Recent work in adversarial robustness suggests that natural data distributions are localized, i.e., they place high probability in small volume regions of the input space, and that this property can be utilized for designing classifiers with improved robustness guarantees for $\ell_2$-bounded perturbations. Yet, it is still unclear if this observation holds true for more general metrics. In this work, we extend this theory to $\ell_0$-bounded adversarial perturbations, where the attacker can modify a few pixels of the image but is unrestricted in the magnitude of perturbation, and we show necessary and sufficient conditions for the existence of $\ell_0$-robust classifiers. Theoretical certification approaches in this regime essentially employ voting over a large ensemble of classifiers. Such procedures are combinatorial and expensive or require complicated certification techniques. In contrast, a simple classifier emerges from our theory, dubbed Box-NN, which naturally incorporates the geometry of the problem and improves upon the current state-of-the-art in certified robustness against sparse attacks for the MNIST and Fashion-MNIST datasets.
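To make the threat model concrete: an $\ell_0$-bounded attacker is limited in how many pixels it changes, not in how much it changes them. A minimal NumPy illustration, assuming pixel intensities in [0, 1] and an MNIST-sized 28×28 image (the helper names `l0_norm` and `in_l0_ball` and the pixel positions are arbitrary choices for illustration):

```python
import numpy as np


def l0_norm(delta):
    """Number of coordinates (pixels) that a perturbation changes."""
    return int(np.count_nonzero(delta))


def in_l0_ball(x, x_adv, k):
    """True if x_adv differs from x in at most k pixels, whatever the magnitudes."""
    return l0_norm(x_adv - x) <= k


x = np.zeros((28, 28))   # blank MNIST-sized image with intensities in [0, 1]
x_adv = x.copy()
x_adv[3, 5] = 1.0        # two pixels pushed to full intensity:
x_adv[10, 20] = 1.0      # maximal change per pixel, but only two pixels

delta = x_adv - x
print(l0_norm(delta))             # 2
print(np.abs(delta).max())        # 1.0 (the entire pixel range)
print(in_l0_ball(x, x_adv, k=2))  # True
```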

Paper Structure (15 sections, 8 theorems, 51 equations, 4 figures, 1 table)

Introduction
Existence of an $\ell_0$-Robust Classifier implies Localization
Discussion on Theorem 2.2
$d$-Strong Localization implies Existence of a $d$-Robust Classifier
Implications for Existing Impossibility Results
$\ell_0$-Adversarially Robust Classification via the Box-NN classifier
Development and Robustness Certification
Key Intuition
Learning Box-NN from Data
Relaxing Indicator Functions
Improving Initialization
Empirical Evaluation
Conclusion, Limitations and Future Work
Auxiliary Lemmas and Proofs
Additional Empirical Comparison
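The outline's "Relaxing Indicator Functions" step is needed because the exact $\ell_0$ distance to a box is a sum of indicator functions, whose gradient is zero almost everywhere, so boxes cannot be learned from it directly by gradient descent. This page does not spell out the authors' exact surrogates. The sketch below assumes a sigmoid in place of each indicator (sharpness `beta`) and the standard log-sum-exp soft-min (temperature `tau`) in place of a hard minimum, in the spirit of the soft-min relaxations mentioned in the TL;DR. The function names and both relaxation choices are our assumptions; only the forward computation is shown, and gradients would come from an autodiff framework.

```python
import numpy as np


def sigmoid(z):
    # Overflow-free logistic function: sigmoid(z) = (1 + tanh(z / 2)) / 2.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def soft_l0_dist(x, lo, hi, beta):
    """Differentiable surrogate for the number of coordinates of x outside each box.

    The indicators 1[x_i < lo_i] and 1[x_i > hi_i] are replaced by
    sigmoid(beta * (lo_i - x_i)) and sigmoid(beta * (x_i - hi_i)); larger beta
    gives a sharper (closer to exact) but less smooth surrogate.
    """
    return (sigmoid(beta * (lo - x)) + sigmoid(beta * (x - hi))).sum(axis=-1)


def soft_min(d, tau):
    """Log-sum-exp soft-min -tau * log(sum(exp(-d / tau))), shifted for stability.

    Lies in [min(d) - tau * log(len(d)), min(d)] and tends to min(d) as tau -> 0.
    """
    d = np.asarray(d, dtype=float)
    m = d.min()
    return m - tau * np.log(np.exp(-(d - m) / tau).sum())


# Toy boxes over 6 pixels: a "dark" box and a "bright" box.
# The exact l0 distances of x to them are [1, 5].
lo = np.array([[0.0] * 6, [0.8] * 6])
hi = np.array([[0.2] * 6, [1.0] * 6])
x = np.array([0.1] * 5 + [0.9])
d_soft = soft_l0_dist(x, lo, hi, beta=100.0)
print(d_soft)                      # close to [1, 5]
print(soft_min(d_soft, tau=0.05))  # close to 1
```

Raising `beta` and lowering `tau` tightens both surrogates toward the exact quantities, at the cost of steeper, less informative gradients.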

Key Result

Theorem 2.2

If there exists an $(\epsilon, \delta)$-robust classifier $f$ with respect to the $\ell_0$ distance for a data distribution $p$, then at least one of the class conditionals $q_1, q_2, \ldots, q_K$ must be $(C, \epsilon^2 / n, \delta)$-localized according to Definition 2.1. Further, if the classes are b…

Figures (4)

Figure 1: $S_1$ is the green shaded region around $x_{dog}$, where the class dog is localized, and $S_2$ is the orange shaded region around $x_{cat}$, where the class cat is localized.
Figure 2: Comparison of Randomized Ablation [levine2020robustness] to our method Box-NN on the MNIST (left) and Fashion-MNIST (right) datasets. In each panel, the dotted lines correspond to different settings of the hyperparameter $\rho$. Details are given in the main text.
Figure 3: Comparison of [jia2022almost] (left) and [hammoudeh2023feature] (right) to our method Box-NN on the MNIST dataset. The dotted lines correspond to different settings of the hyperparameter $\rho$. Details are given in the main text.
Figure 4: Comparison of the deterministic certificate of [hammoudeh2023feature] (dotted lines) to our method Box-NN (red line) on the MNIST dataset. The dotted lines correspond to different settings of the hyperparameter $\rho$. Details are given in the main text.

Theorems & Definitions (16)

Definition 2.1: Localized Distribution (modification of [pal2023concentration])
Theorem 2.2
Proof
Lemma 2.3: Proposition 2.1.1 in [talagrand1995concentration]
Definition 3.1: $d$-Strongly Localized Distributions (generalizing [pal2023concentration])
Theorem 3.2
Proof
Example 3.1
Lemma 4.0: $\ell_0$ distance to axis-aligned boxes
Theorem 4.1: Robustness Certificate for Box-NN
...and 6 more
