Universal Adversarial Defense in Remote Sensing Based on Pre-trained Denoising Diffusion Models
Weikang Yu, Yonghao Xu, Pedram Ghamisi
TL;DR
This work addresses the vulnerability of remote sensing (RS) DNNs to universal adversarial perturbations by introducing UAD-RS, a defense that uses a single pre-trained denoising diffusion probabilistic model per dataset to purify adversarial samples across heterogeneous attacks. The approach first diffuses adversarial inputs with Gaussian noise and then denoises them; an Adaptive Noise Level Selection (ANLS) mechanism chooses the diffusion noise level through a task-guided Fréchet Inception Distance (FID) ranking. Experiments on four RS datasets covering scene classification and semantic segmentation show that UAD-RS outperforms state-of-the-art purification methods under seven attacks, and because one diffusion model per dataset suffices, no separate defensive training is needed for each attack setting. The framework shows strong cross-dataset robustness and highlights diffusion models as a practical, universal defense tool for AI4EO tasks, albeit with higher inference costs and potential limitations on extremely high-contrast data.
Abstract
Deep neural networks (DNNs) have risen to prominence as key solutions in numerous AI applications for Earth observation (AI4EO). However, their susceptibility to adversarial examples poses a critical challenge, compromising the reliability of AI4EO algorithms. This paper presents a novel Universal Adversarial Defense approach in Remote Sensing Imagery (UAD-RS), which leverages pre-trained diffusion models to protect DNNs against universal adversarial examples exhibiting heterogeneous patterns. Specifically, a universal adversarial purification framework is developed in which pre-trained diffusion models mitigate adversarial perturbations: Gaussian noise is first added to the adversarial examples, and the perturbations are then removed through denoising. Additionally, an Adaptive Noise Level Selection (ANLS) mechanism is introduced to determine the optimal noise level for the purification framework with a task-guided Fréchet Inception Distance (FID) ranking strategy, thereby enhancing purification performance. Consequently, only a single pre-trained diffusion model is required per dataset to purify universal adversarial samples with heterogeneous patterns, significantly reducing the training effort across multiple attack settings while maintaining high performance without prior knowledge of the adversarial perturbations. Experimental results on four heterogeneous RS datasets, focusing on scene classification and semantic segmentation, demonstrate that UAD-RS outperforms state-of-the-art adversarial purification approaches, providing universal defense against seven commonly encountered adversarial perturbations. Code and pre-trained models are available online (https://github.com/EricYu97/UAD-RS).
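As a rough illustration of the diffuse-then-denoise purification and the noise-level selection described above, the following is a minimal NumPy sketch under stated assumptions; it is not the authors' implementation. All function names (make_alpha_bar, diffuse, purify, select_noise_level) are hypothetical. The linear beta schedule with 1000 steps is the common DDPM default and is assumed here rather than taken from the paper. The arguments denoise_fn and score_fn are placeholders: the first stands in for the reverse process of a pre-trained diffusion model, the second for a task-guided FID-style score, where a lower value is assumed to be better.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear DDPM schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def diffuse(x0, t, alpha_bar, rng):
    """Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps


def purify(x_adv, t, alpha_bar, denoise_fn, rng):
    """Add Gaussian noise up to level t, then denoise.

    denoise_fn(x_t, t) is a placeholder for the reverse process of a
    pre-trained diffusion model (an assumption, not a real library call).
    """
    x_t = diffuse(x_adv, t, alpha_bar, rng)
    return denoise_fn(x_t, t)


def select_noise_level(x_adv, candidates, alpha_bar, denoise_fn, score_fn, rng):
    """Score each candidate noise level and return the best one.

    score_fn(purified) is a placeholder for a task-guided FID-style score;
    this sketch assumes that a lower score is better.
    """
    scores = {
        t: float(score_fn(purify(x_adv, t, alpha_bar, denoise_fn, rng)))
        for t in candidates
    }
    best_t = min(scores, key=scores.get)
    return best_t, scores
```

In practice, denoise_fn would wrap the pre-trained diffusion model and score_fn would compare deep features of the purified images against reference images. A small noise level may leave the perturbation intact, while a large one may destroy image content, which is why the level is selected by ranking rather than fixed in advance.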
