Disentangling Safe and Unsafe Corruptions via Anisotropy and Locality
Ramchandran Muthukumar, Ambar Pal, Jeremias Sulam, Rene Vidal
TL;DR
This work introduces Projected Displacement (PD), a task-driven adversarial threat model that is anisotropic and local: it scores a perturbation by its alignment with unsafe directions derived from the true labeling function. PD has convex sublevel sets and supports efficient projection, which enables principled robust evaluation and easy integration with existing attacks. Unsafe directions are approximated from observed data, and additional task annotations such as segmentation masks (PD-S) and class hierarchies (PD-W) can be incorporated, giving a flexible, task-aware assessment of safe versus unsafe perturbations. On ImageNet-1k, the set of perturbations with small PD threat includes safe corruptions of large $\ell_p$ norm while excluding unsafe, label-altering perturbations, a distinction that standard $\ell_p$ threats do not make. The approach combines theoretical rigor with scalable, data-driven approximations and task annotations that align with real-world vision tasks.
Abstract
State-of-the-art machine learning systems are vulnerable to small perturbations to their input, where ``small'' is defined according to a threat model that assigns a positive threat to each perturbation. Most prior works define a task-agnostic, isotropic, and global threat, like the $\ell_p$ norm, where the magnitude of the perturbation fully determines the degree of the threat and neither the direction of the attack nor its position in space matters. However, common corruptions in computer vision, such as blur, compression, or occlusions, are not well captured by such threat models. This paper proposes a novel threat model called \texttt{Projected Displacement} (PD) to study robustness beyond existing isotropic and global threat models. The proposed threat model measures the threat of a perturbation via its alignment with \textit{unsafe directions}, defined as directions in the input space along which a perturbation of sufficient magnitude changes the ground truth class label. Unsafe directions are identified locally for each input based on observed training data. In this way, the PD threat model exhibits anisotropy and locality. Experiments on ImageNet-1k data indicate that, for any input, the set of perturbations with small PD threat includes \textit{safe} perturbations of large $\ell_p$ norm that preserve the true label, such as noise, blur, and compression, while simultaneously excluding \textit{unsafe} perturbations that alter the true label. Unlike perceptual threat models based on embeddings of large vision models, the PD threat model can be readily computed for arbitrary classification tasks without pre-training or fine-tuning. Furthermore, additional task annotations, such as sensitivity to image regions or concept hierarchies, can be easily integrated into the assessment of threat; the PD threat model thus presents practitioners with a flexible, task-driven threat specification.
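To make the idea of "alignment with unsafe directions" concrete, the following is a minimal sketch under stated assumptions, not the paper's implementation. The abstract does not give a formula, so the sketch assumes one plausible instantiation: each training point $\tilde{x}$ with a label different from that of $x$ contributes an unsafe direction $(\tilde{x} - x)/\|\tilde{x} - x\|_2^2$, and the threat of a perturbation $\delta$ is the largest nonnegative inner product between $\delta$ and any such direction. The function names `unsafe_directions` and `pd_threat`, the scaling, and the toy data are all hypothetical.

```python
import numpy as np

def unsafe_directions(x, y, train_x, train_y):
    """Assumed construction: one direction per training point whose label differs from y.

    Each direction is scaled by 1 / ||x_tilde - x||^2 so that a perturbation
    reaching x_tilde exactly has projection 1 onto its own direction.
    Assumes no differently-labeled training point coincides with x.
    """
    others = train_x[train_y != y]                          # (m, d)
    diffs = others - x                                      # displacement to each point
    sq_dist = np.sum(diffs ** 2, axis=1, keepdims=True)     # (m, 1)
    return diffs / sq_dist

def pd_threat(delta, directions):
    """Assumed threat: largest nonnegative projection of delta onto an unsafe direction."""
    if directions.shape[0] == 0:
        return 0.0  # no unsafe directions observed near this input
    return float(max(np.max(directions @ delta), 0.0))

# Toy 2-D example: x has label 0, one differently-labeled point lies along the first axis.
x, y = np.array([0.0, 0.0]), 0
train_x = np.array([[2.0, 0.0], [0.0, 1.0]])
train_y = np.array([1, 0])

U = unsafe_directions(x, y, train_x, train_y)
threat_safe = pd_threat(np.array([0.0, 5.0]), U)    # large norm, orthogonal to unsafe direction
threat_half = pd_threat(np.array([1.0, 0.0]), U)    # halfway toward the other class
threat_full = pd_threat(np.array([2.0, 0.0]), U)    # lands on the other-class point
threat_away = pd_threat(np.array([-3.0, 0.0]), U)   # moves away from the other class
```

In this toy setup the perturbation `[0, 5]` has $\ell_2$ norm 5 but threat 0, while `[2, 0]` has norm 2 and threat 1. The threat therefore depends on direction (anisotropy) and on the neighborhood of $x$ (locality), which a norm-based threat cannot express.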
