Benchmarking the Spatial Robustness of DNNs via Natural and Adversarial Localized Corruptions
Giulia Marchiori Pietrosanti, Giulio Rossolini, Alessandro Biondi, Giorgio Buttazzo
TL;DR
This work examines the spatial robustness of semantic segmentation models under localized natural and adversarial corruptions. It introduces region-aware metrics and a region-aware multi-attack adversarial analysis to quantify how perturbations in image regions affect both perturbed and unperturbed areas, and it validates these methods on 14 models using Cityscapes. The results reveal contrasting behaviors: transformer-based architectures are robust to natural localized corruptions but vulnerable to localized adversarial attacks, while CNN-based models show the opposite trend; an ensemble approach can help balance these robustness aspects. The study provides practical insights for deploying dense vision systems in safety-critical contexts and points toward training-time localized augmentations and more nuanced ensembles as promising directions.
Abstract
The robustness of deep neural networks is a crucial factor in safety-critical applications, particularly in complex and dynamic environments (e.g., medical or driving scenarios) where localized corruptions can arise. While previous studies have evaluated the robustness of semantic segmentation (SS) models under whole-image natural or adversarial corruptions, the spatial robustness of dense vision models under localized corruptions remains underexplored. This paper fills this gap by introducing novel, region-aware metrics for benchmarking the spatial robustness of segmentation models, along with an evaluation framework to assess the impact of natural localized corruptions. Furthermore, it shows that a single localized adversarial attack is not sufficient to evaluate worst-case spatial robustness. To address this, the work proposes a region-aware multi-attack adversarial analysis to systematically assess model robustness across specific image regions. The proposed metrics and analysis were used to evaluate 14 segmentation models in driving scenarios, uncovering key insights into the effects of localized corruption in both natural and adversarial forms. The results reveal that models respond to these two types of threats differently; for instance, transformer-based segmentation models demonstrate notable robustness to localized natural corruptions but are highly vulnerable to adversarial ones, whereas CNN-based models show the opposite behavior. Consequently, this work also addresses the challenge of balancing robustness to both natural and adversarial localized corruptions by means of ensemble models, thereby achieving broader threat coverage and improved reliability for dense vision tasks.
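To make the idea of region-aware evaluation concrete, the following is a minimal illustrative sketch, not the metrics defined in the paper. It assumes a rectangular corrupted region, Gaussian noise as a stand-in for a natural corruption, images scaled to [0, 1], and plain pixel accuracy reported separately inside and outside the region. The function names (`corrupt_region`, `region_mask`, `region_accuracy`) and the `(top, left, height, width)` box convention are hypothetical choices made for this example.

```python
import numpy as np


def corrupt_region(image, box, sigma=0.2, seed=0):
    """Add Gaussian noise only inside box = (top, left, height, width).

    Assumes image values lie in [0, 1]; pixels outside the box are unchanged.
    """
    rng = np.random.default_rng(seed)
    top, left, h, w = box
    out = image.astype(np.float64).copy()
    patch = out[top:top + h, left:left + w]
    noise = rng.normal(0.0, sigma, size=patch.shape)
    out[top:top + h, left:left + w] = np.clip(patch + noise, 0.0, 1.0)
    return out


def region_mask(shape, box):
    """Boolean mask of the given 2D shape that is True inside the box."""
    top, left, h, w = box
    mask = np.zeros(shape, dtype=bool)
    mask[top:top + h, left:left + w] = True
    return mask


def region_accuracy(pred, target, mask):
    """Pixel accuracy computed separately inside and outside the mask."""
    correct = pred == target
    inside = correct[mask].mean() if mask.any() else float("nan")
    outside = correct[~mask].mean() if (~mask).any() else float("nan")
    return float(inside), float(outside)


# Toy label maps standing in for a model's prediction on a corrupted image.
target = np.zeros((4, 4), dtype=int)
pred = target.copy()
pred[0, 0] = 1  # one error inside the corrupted region
pred[3, 3] = 1  # one error outside the corrupted region

box = (0, 0, 2, 2)
mask = region_mask(target.shape, box)
inside_acc, outside_acc = region_accuracy(pred, target, mask)
print(inside_acc, outside_acc)
```

Reporting the two scores separately is what distinguishes this from a whole-image score: a drop in the outside score would indicate that a corruption confined to one region also degrades predictions in unperturbed areas.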
