Breaking Barriers in Physical-World Adversarial Examples: Improving Robustness and Transferability via Robust Feature

Yichen Wang, Yuxuan Chou, Ziqi Zhou, Hangtao Zhang, Wei Wan, Shengshan Hu, Minghui Li

TL;DR

This work addresses the brittleness and detectability of physical-world adversarial examples by building perturbations from robust features, i.e., predictive features that are consistent across models. The authors introduce RFCoA, a two-stage framework: Robust Feature Disentanglement extracts target-class robust features, and Adversarial Feature Fusion uses attention to overlay these features onto the predictive features of clean images, complemented by a minimal cognitive pattern mask and a transparency control for stealth. Experiments on ImageNet show that RFCoA achieves better transferability and robustness than state-of-the-art attacks on white-box and black-box models in both digital and physical settings, while maintaining stealth as measured by SSIM and LPIPS. The method also generalizes to LVLMs on VQA and image description tasks. Together, these findings highlight the potential of robust feature coverage for physical-world adversarial attacks and suggest broader implications for multimodal models and defense strategies.

Abstract

As deep neural networks (DNNs) are widely deployed in the physical world, much research now focuses on physical-world adversarial examples (PAEs), which add perturbations to inputs and cause models to produce incorrect outputs. However, existing PAEs face two challenges: unsatisfactory attack performance (i.e., poor transferability and insufficient robustness to environmental conditions), and difficulty in balancing attack effectiveness with stealthiness, since stronger attacks often make PAEs more perceptible. In this paper, we explore a novel perturbation-based method to overcome these challenges. For the first challenge, we introduce a strategy, Deceptive RF Injection, based on robust features (RFs) that are predictive, robust to perturbations, and consistent across different models. Specifically, it improves the transferability and robustness of PAEs by overlaying RFs of other classes onto the predictive features in clean images. For the second challenge, we introduce another strategy, Adversarial Semantic Pattern Minimization, which removes most perturbations and retains only the essential adversarial patterns in AEs. Based on these two strategies, we design our method, Robust Feature Coverage Attack (RFCoA), which comprises two stages: Robust Feature Disentanglement and Adversarial Feature Fusion. In the first stage, we extract target-class RFs in feature space. In the second stage, we use attention-based feature fusion to overlay these RFs onto the predictive features of clean images and remove unnecessary perturbations. Experiments show that our method achieves superior transferability, robustness, and stealthiness compared to existing state-of-the-art methods. Additionally, our method's effectiveness extends to Large Vision-Language Models (LVLMs), indicating its potential applicability to more complex tasks.
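To make the idea of overlaying a pattern while keeping only a small, partly transparent region more concrete, the following is a minimal sketch of generic masked alpha blending in pixel space. It is not the authors' implementation: the abstract describes fusion in feature space with attention, and the paper's exact compositing formula is not given here. The function name `blend_with_mask`, its parameters, and the blending rule are all assumptions chosen for illustration, with `alpha` standing in for a transparency control and `mask` for a pattern mask.

```python
import numpy as np


def blend_with_mask(clean, pattern, mask, alpha):
    """Overlay `pattern` onto `clean` only where `mask` is active.

    Hypothetical helper for illustration; not taken from the paper.

    clean, pattern: float arrays in [0, 1] with shape (H, W, C)
    mask: float array in [0, 1] with shape (H, W); 1 keeps the pattern
    alpha: scalar in [0, 1]; 0 leaves the clean image untouched
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Broadcast the 2-D mask over the channel axis and scale it by alpha.
    weight = alpha * mask[..., None]
    # Convex combination per pixel: clean where weight is 0, pattern where it is 1.
    blended = (1.0 - weight) * clean + weight * pattern
    return np.clip(blended, 0.0, 1.0)
```

Under this assumed rule, a sparser mask or a smaller `alpha` changes fewer pixels by a smaller amount, which is the kind of trade-off between attack strength and perceptibility that the abstract describes.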

Paper Structure

This paper contains 17 sections, 10 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Our strategies Deceptive RF Injection and Adversarial Semantic Pattern Minimization.
  • Figure 2: Attention maps calculated by Grad-CAM. (a) is the clean image in the digital world, (b) is the image with added random noise in the digital world, and (c) to (e) are sampled in the physical world at various distances and angles. Unless otherwise annotated, all attention maps are computed with ResNet-50.
  • Figure 3: The overview of our method. (a) and (b) are the two modules of our method. After optimizing $\alpha$ and $\mathbf{m}$ in (b), we compute the final PAE from them in (c).
  • Figure 4: Visualization results of PAEs in the physical world.
  • Figure 5: The average tASR of attacks on white-box models under defenses.