Embodied Active Defense: Leveraging Recurrent Feedback to Counter Adversarial Patches

Lingxuan Wu, Xiao Yang, Yinpeng Dong, Liuwei Xie, Hang Su, Jun Zhu

TL;DR

This work addresses the vulnerability of vision systems to adversarial patches in 3D environments by introducing Embodied Active Defense (EAD), a proactive framework that couples a recurrent perception module with a policy module to actively collect informative observations. By modeling the scene as a differentiable POMDP and training against adversary-agnostic patches (USAP), EAD learns to refine object understanding and counter patches through strategic movements, achieving strong robustness with only a few interaction steps. The approach is grounded in an information-theoretic view that ties the learning objective to mutual information and greedy information gain, supporting efficient exploration. Empirically, EAD significantly reduces attack success rates on face recognition and object detection tasks while maintaining or improving standard accuracy, and it generalizes well to unseen attacks, highlighting its practical impact for safety-critical 3D perception systems.

Abstract

The vulnerability of deep neural networks to adversarial patches has motivated numerous defense strategies for boosting model robustness. However, prevailing defenses rely on a single observation or on pre-established knowledge of the adversary, so they often fail against unseen or adaptive attacks and perform poorly in dynamic 3D environments. Inspired by active human perception and recurrent feedback mechanisms, we develop Embodied Active Defense (EAD), a proactive defensive strategy that actively contextualizes environmental information to address misaligned adversarial patches in 3D real-world settings. To achieve this, EAD develops two central recurrent sub-modules, a perception module and a policy module, which implement two critical functions of active vision. These modules recurrently process a series of beliefs and observations, progressively refining their comprehension of the target object and enabling strategic actions to counter adversarial patches in 3D environments. To optimize learning efficiency, we incorporate a differentiable approximation of environmental dynamics and deploy patches that are agnostic to the adversary's strategies. Extensive experiments demonstrate that EAD substantially enhances robustness against a variety of patches within just a few steps through its action policy in safety-critical tasks (e.g., face recognition and object detection), without compromising standard accuracy. Furthermore, because it is attack-agnostic, EAD generalizes well to unseen attacks, reducing the average attack success rate by 95 percent across a range of unseen adversarial attacks.
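The interplay between the perception module and the policy module can be illustrated with a toy loop. This is a minimal sketch under strong assumptions, not the authors' model: in EAD both modules are learned networks, whereas here `observe`, `perception`, `policy`, and `run_ead` are hypothetical hand-written functions, the scene has three classes, and the patch's influence is assumed to fade as the viewpoint moves off-axis.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)

def observe(view):
    """Hypothetical environment: class evidence (logits) seen from a viewpoint.
    View 0 faces the patch head-on, so the evidence favors the wrong class 1."""
    true_evidence = [2.0, 0.0, 0.0]    # the true class is 0
    patch_evidence = [0.0, 3.0, 0.0]   # the patch pushes toward class 1
    patch_weight = max(0.0, 1.0 - 0.5 * abs(view))  # assumed off-axis fading
    return [t + patch_weight * p for t, p in zip(true_evidence, patch_evidence)]

def perception(obs, prev_belief):
    """Toy recurrent fusion: b_t is computed from (o_t, b_{t-1}) by summing logits."""
    belief = [b + o for b, o in zip(prev_belief, obs)]
    return belief, softmax(belief)

def policy(belief, view):
    """Toy policy: keep moving sideways while the belief is still uncertain."""
    return view + 1 if entropy(softmax(belief)) > 0.1 else view

def run_ead(steps=4):
    view, belief = 0, [0.0, 0.0, 0.0]
    history = []
    for _ in range(steps):
        obs = observe(view)
        belief, probs = perception(obs, belief)
        history.append((view, probs.index(max(probs))))  # (viewpoint, predicted class)
        view = policy(belief, view)
    return history

history = run_ead()
```

In this toy setting the first prediction is fooled by the patch, and the prediction recovers once the accumulated belief includes views where the patch no longer dominates.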

Paper Structure

This paper contains 60 sections, 2 theorems, 33 equations, 15 figures, 12 tables, and 1 algorithm.

Key Result

Theorem 3.1

For the mutual information between the current observation $o_t$ and the scene annotation $y$, conditioned on the previous belief $b_{t-1}$ and denoted $I(o_t;y|b_{t-1})$, we have: where $q_{\bm{\theta}}(y| o_{1}, \cdots, o_{t})$ denotes the variational distribution for the conditional distribution $p(y|o_1, \cdots, o_{t})$ with samples $\{({\mathbf{x}}^{(j)}, y^{(j)})\}_{j=1}^{K}$.
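For orientation, the conditional mutual information named in Theorem 3.1 can be read as an entropy reduction. The identity below is the standard definition of conditional mutual information, written in the theorem's symbols; it is not the bound that the theorem itself states.

$$
I(o_t; y \mid b_{t-1}) = H(y \mid b_{t-1}) - H(y \mid b_{t-1}, o_t)
$$

Since $H(y \mid b_{t-1})$ does not depend on the new observation, a larger $I(o_t; y \mid b_{t-1})$ corresponds to a smaller remaining uncertainty $H(y \mid b_{t-1}, o_t)$ after observing $o_t$.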

Figures (15)

  • Figure 1: An overview of our proposed Embodied Active Defense. The perception model uses the observation $o_t$ from the external world and the previous internal belief $b_{t-1}$ about the scene to refine a representation of the surrounding environment $b_t$ and simultaneously make a task-specific prediction $y_t$. The policy model generates a strategic action $a_t$ in response to the shared environmental understanding $b_t$. As the perception process unfolds, the initially high informative uncertainty $H(y|b_{t-1}, o_t)$ caused by the adversarial patch monotonically decreases.
  • Figure 2: Qualitative results of EAD. The first two columns present the original image pairs, and the subsequent columns depict the interactive inference steps that the model took. The adversarial glasses are generated with 3DAdv and are robust to 3D viewpoint variation. The computed optimal threshold is $0.24$, on a scale of $[-1, 1]$.
  • Figure 3: Comparative evaluation of defense methods across varying attack iterations with different adversarial patch sizes. The adversarial patches are crafted by 3DAdv for impersonation.
  • Figure 4: Qualitative results of EAD on object detection. The adversarial patches are generated using MIM and attached to the billboards within the scene, causing the target vehicle to "disappear" from detection. The setting is from CARLA-GeAR. These images illustrate the model's interactive inference steps to counter the patches.
  • Figure 5: Qualitative evaluation for CelebA-3D. The first column presents the original face from CelebA, and the subsequent columns show the multiview faces rendered from the inverted $w^{+}$ with EG3D. The image size is $112 \times 112$.
  • ...and 10 more figures

Theorems & Definitions (6)

  • Theorem 3.1: Proof in Appendix \ref{sec:pf_max_mutial_info}
  • Remark
  • Definition 3.1: Greedy Informative Exploration
  • Remark
  • Proof
  • Theorem A.1
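The idea of greedy informative exploration listed above can be sketched for a discrete toy problem. This is an illustrative reading and not the paper's formal Definition 3.1: it assumes a discrete belief `prior` over labels and a hypothetical table `likelihood[y][o]` giving $p(o \mid y)$ for each candidate action, and it picks the action with the largest expected one-step reduction in label entropy. The function names and the example actions are assumptions made for this sketch.

```python
import math

def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)

def expected_info_gain(prior, likelihood):
    """Mutual information I(o; y) for prior p(y) and likelihood[y][o] = p(o | y)."""
    n_labels, n_obs = len(prior), len(likelihood[0])
    gain = entropy(prior)
    for o in range(n_obs):
        # Marginal probability of seeing observation o under the current belief.
        p_o = sum(prior[y] * likelihood[y][o] for y in range(n_labels))
        if p_o == 0:
            continue
        # Bayes update of the belief, then subtract the expected posterior entropy.
        posterior = [prior[y] * likelihood[y][o] / p_o for y in range(n_labels)]
        gain -= p_o * entropy(posterior)
    return gain

def greedy_action(prior, likelihoods_by_action):
    """Pick the action whose next observation is expected to be most informative."""
    return max(
        likelihoods_by_action,
        key=lambda a: expected_info_gain(prior, likelihoods_by_action[a]),
    )

# Hypothetical example: "stay" yields an uninformative view, "move" a discriminative one.
prior = [0.5, 0.5]
actions = {
    "stay": [[0.5, 0.5], [0.5, 0.5]],
    "move": [[0.9, 0.1], [0.1, 0.9]],
}
best = greedy_action(prior, actions)
```

The greedy rule looks only one step ahead, which keeps the selection cheap at the cost of ignoring longer action sequences.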