Disentangling Safe and Unsafe Corruptions via Anisotropy and Locality

Ramchandran Muthukumar, Ambar Pal, Jeremias Sulam, Rene Vidal

TL;DR

This work introduces Projected Displacement (PD), a task-driven adversarial threat model that is anisotropic and local: it measures the threat of a perturbation by its alignment with unsafe directions, which are defined through the true labeling function and approximated from observed data. PD has convex sublevel sets and admits efficient projection, so it can be integrated into existing attacks for robustness evaluation. Additional task annotation, namely segmentation masks (PD-S) and class hierarchies (PD-W), can be incorporated into the threat. On Imagenet-1k, the set of perturbations with small PD threat includes safe corruptions of large $\ell_p$ norm, such as noise, blur, and compression, while excluding unsafe perturbations that alter the true label.

Abstract

State-of-the-art machine learning systems are vulnerable to small perturbations of their input, where "small" is defined according to a threat model that assigns a positive threat to each perturbation. Most prior works define a task-agnostic, isotropic, and global threat, like the $\ell_p$ norm, where the magnitude of the perturbation fully determines the degree of the threat and neither the direction of the attack nor its position in space matters. However, common corruptions in computer vision, such as blur, compression, or occlusions, are not well captured by such threat models. This paper proposes a novel threat model called Projected Displacement (PD) to study robustness beyond existing isotropic and global threat models. The proposed threat model measures the threat of a perturbation via its alignment with unsafe directions, defined as directions in the input space along which a perturbation of sufficient magnitude changes the ground truth class label. Unsafe directions are identified locally for each input based on observed training data. In this way, the PD threat model exhibits anisotropy and locality. Experiments on Imagenet-1k data indicate that, for any input, the set of perturbations with small PD threat includes safe perturbations of large $\ell_p$ norm that preserve the true label, such as noise, blur, and compression, while simultaneously excluding unsafe perturbations that alter the true label. Unlike perceptual threat models based on embeddings of large vision models, the PD threat model can be readily computed for arbitrary classification tasks without pre-training or finetuning. Furthermore, additional task annotation, such as sensitivity to image regions or concept hierarchies, can be easily integrated into the assessment of threat; the PD threat model thus presents practitioners with a flexible, task-driven threat specification.
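The contrast between an isotropic norm and a direction-aware threat can be illustrated with a toy computation. The sketch below is not the paper's definition of the PD threat, which this summary does not reproduce. It assumes a simplified form in which each unsafe direction $\mathbf{u}$ is the displacement from the input to a differently labeled point, and the threat is the largest positive projection of the perturbation onto any such direction, normalized by $\Vert\mathbf{u}\Vert_2^2$ so that the perturbation $\mathbf{u}$ itself has threat 1. The function name `pd_threat_sketch` and the 2-D vectors are hypothetical.

```python
import numpy as np


def pd_threat_sketch(delta, unsafe_dirs):
    """Toy direction-aware threat (an assumed form, not the paper's definition).

    delta:       perturbation, any shape (flattened internally).
    unsafe_dirs: array of shape (k, ...) holding k assumed unsafe directions.
    Returns the largest positive normalized projection of delta onto a direction.
    """
    delta = np.asarray(delta, dtype=float).ravel()
    U = np.asarray(unsafe_dirs, dtype=float)
    U = U.reshape(U.shape[0], -1)
    # <delta, u> / ||u||^2 for every direction u; delta = u gives exactly 1.
    proj = (U @ delta) / np.sum(U * U, axis=1)
    # Moving away from every unsafe direction counts as zero threat.
    return float(max(proj.max(), 0.0))


# One assumed unsafe direction along the first coordinate.
unsafe_dirs = np.array([[1.0, 0.0]])

# Two perturbations with the same l_inf norm (0.5) but different alignment.
delta_safe = np.array([0.0, 0.5])    # orthogonal to the unsafe direction
delta_unsafe = np.array([0.5, 0.0])  # aligned with the unsafe direction

threat_safe = pd_threat_sketch(delta_safe, unsafe_dirs)      # 0.0
threat_unsafe = pd_threat_sketch(delta_unsafe, unsafe_dirs)  # 0.5
```

Because this toy threat is a maximum of linear functions of the perturbation (clipped at zero), it is convex in the perturbation, which matches the convex sublevel sets mentioned in the TL;DR.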

Paper Structure

This paper contains 30 sections, 1 theorem, 22 equations, 24 figures, 4 tables, 2 algorithms.

Key Result

Theorem 1

The true labeling function $h^\star$ is $1$-robust at any input $\mathbf{x}\in \mathcal{X}$ w.r.t. the threat $d^\star_{\mathrm{PD}}$. Additionally, if a classifier $h$ fails to be $1$-robust at some input w.r.t. $d^\star_{\mathrm{PD}}$, then there exists an input $\mathbf{x}$ misclassified by $h$.
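A short sketch of why the second claim follows from the first. It assumes, since Definition 3 is not reproduced here, that $h$ is $\varepsilon$-robust at $\mathbf{x}$ when $h(\mathbf{x}+\bm{\delta}) = h(\mathbf{x})$ for every $\bm{\delta}$ in the sublevel set $\mathcal{S}(d^\star_{\mathrm{PD}}, \mathbf{x}, \varepsilon)$, and that $\mathbf{x}+\bm{\delta}$ is itself a valid input:

$$
\begin{aligned}
&h \text{ not } 1\text{-robust at } \mathbf{x}
 \;\Longrightarrow\; \exists\, \bm{\delta} \in \mathcal{S}(d^\star_{\mathrm{PD}}, \mathbf{x}, 1) :\; h(\mathbf{x}+\bm{\delta}) \neq h(\mathbf{x}), \\
&h^\star \text{ is } 1\text{-robust at } \mathbf{x}
 \;\Longrightarrow\; h^\star(\mathbf{x}+\bm{\delta}) = h^\star(\mathbf{x}), \\
&\text{so either } h(\mathbf{x}) \neq h^\star(\mathbf{x})
 \;\text{ or }\; h(\mathbf{x}+\bm{\delta}) \neq h(\mathbf{x}) = h^\star(\mathbf{x}) = h^\star(\mathbf{x}+\bm{\delta}).
\end{aligned}
$$

In the first case $h$ misclassifies $\mathbf{x}$; in the second it misclassifies $\mathbf{x}+\bm{\delta}$.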

Figures (24)

  • Figure 1: Corruptions with equal $\ell_{\infty}$-threat, $\left\Vert {\bm{\delta}_1} \right\Vert_\infty = \left\Vert {\bm{\delta}_2} \right\Vert_\infty = \left\Vert {\bm{\delta}_3} \right\Vert_\infty = \left\Vert {\bm{\delta}_4} \right\Vert_\infty$, but varying PD threat.
  • Figure 2: An illustration of unsafe directions, and sub-level sets of the PD threat.
  • Figure 3: $\mathcal{S}(d^*_{\rm PD},\mathbf{x}, 1)$
  • Figure 4: $\mathcal{S}(d^*_{\rm PD}, \mathbf{x}_1, 1)$
  • Figure 5: $\mathcal{S}(d^*_{\rm PD}, \mathbf{x}_2, 1)$
  • ...and 19 more figures

Theorems & Definitions (14)

  • Definition 1: Adversarial Perturbation
  • Definition 2: Threat Model, Robust Accuracy
  • Definition 3: $\varepsilon$-robust
  • Definition 4: Unsafe Directions
  • Definition 5: PD$^\star$-threat
  • Theorem 1
  • Definition 6: Observed Unsafe Directions
  • Definition 7: $(k,\beta)$-PD threat
  • Definition 8: PD-S threat
  • Definition 9: PD-W threat
  • ...and 4 more