Diffusion Policy Attacker: Crafting Adversarial Attacks for Diffusion-based Policies

Yipu Chen, Haotian Xue, Yongxin Chen

TL;DR

Diffusion policies (DP) enable robust behavior cloning by generating action sequences $\tau^t$ from visual inputs $I^t$ through a cascaded denoising process. The authors introduce DP-Attacker, a white-box attack framework that crafts online and offline image perturbations as well as physical patches to deceive DP. Both its untargeted and targeted variants optimize a noise-prediction loss on the denoiser rather than an end-to-end loss on the generated actions. Across six robotic manipulation tasks and several backbone architectures, DP-Attacker markedly reduces success rates: offline perturbations transfer across frames, and physically realizable patches disrupt real-world deployments. The work identifies encoder representations as a key attack vector and underscores the need for robustness strategies before diffusion-based policies are deployed in real environments.
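To make the noise-prediction loss concrete, here is a minimal Python sketch under loudly stated assumptions; it is not the authors' implementation. The noise predictor is a hypothetical linear map (`W`, `V`) so that its gradient with respect to the observation can be written by hand, the noise levels `ALPHA_BAR` are made up, and the update is standard sign-gradient PGD under an $L_\infty$ budget. An untargeted attack ascends the denoising loss evaluated on the clean action sequence; a targeted attack descends it on an attacker-chosen sequence, nudging the denoiser toward generating that sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
ACT_DIM, OBS_DIM, N_SAMPLES = 4, 8, 16

# Hypothetical toy noise predictor eps_hat(x_k, obs) = W @ x_k + V @ obs.
# A real DP pairs a vision encoder with a learned denoiser; the linear form
# only lets us write the gradient with respect to the observation by hand.
W = 0.1 * rng.normal(size=(ACT_DIM, ACT_DIM))
V = rng.normal(size=(ACT_DIM, OBS_DIM))

# Fixed Monte Carlo batch of diffusion noise and (assumed) noise levels,
# so the attack objective is deterministic.
EPS = rng.normal(size=(N_SAMPLES, ACT_DIM))
ALPHA_BAR = rng.uniform(0.1, 0.9, size=(N_SAMPLES, 1))

def denoise_loss_and_grad(obs, delta, tau):
    """Mean ||eps_hat - eps||^2 over the batch and its gradient w.r.t. delta."""
    # Forward-noise the action sequence: x_k = sqrt(a) * tau + sqrt(1 - a) * eps.
    x_k = np.sqrt(ALPHA_BAR) * tau + np.sqrt(1.0 - ALPHA_BAR) * EPS
    # Noise-prediction residual per sample, shape (N_SAMPLES, ACT_DIM).
    r = x_k @ W.T + (obs + delta) @ V.T - EPS
    loss = float(np.mean(np.sum(r ** 2, axis=1)))
    grad = 2.0 * np.mean(r @ V, axis=0)   # shape (OBS_DIM,)
    return loss, grad

def pgd_attack(obs, tau, targeted, budget=0.05, step=0.01, iters=20):
    """Sign-gradient PGD on an L-infinity-bounded observation perturbation."""
    delta = np.zeros_like(obs)
    for _ in range(iters):
        _, g = denoise_loss_and_grad(obs, delta, tau)
        # Untargeted: ascend the loss on the clean action sequence.
        # Targeted: descend the loss on the attacker's target sequence.
        direction = -np.sign(g) if targeted else np.sign(g)
        delta = np.clip(delta + step * direction, -budget, budget)
    return delta

obs = rng.normal(size=OBS_DIM)
tau_clean = rng.normal(size=ACT_DIM)
delta_u = pgd_attack(obs, tau_clean, targeted=False)
clean_loss, _ = denoise_loss_and_grad(obs, np.zeros(OBS_DIM), tau_clean)
adv_loss, _ = denoise_loss_and_grad(obs, delta_u, tau_clean)
print(adv_loss > clean_loss)  # -> True
```

Because the toy loss is a convex quadratic in `delta`, no untargeted sign step can decrease it, so the script prints `True`. With a real policy, the same loop would take gradients from autodiff through the encoder and denoiser instead of from the hand-written `denoise_loss_and_grad`.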

Abstract

Diffusion models (DMs) have emerged as a promising approach for behavior cloning (BC). Diffusion policies (DP) based on DMs have elevated BC performance to new heights, demonstrating robust efficacy across diverse tasks while remaining flexible and easy to implement. Despite the increasing adoption of DP as a foundation for policy generation, the critical issue of safety remains largely unexplored. While previous attacks have targeted deep policy networks, DP uses a diffusion model as the policy network, and its chained denoising structure and injected randomness make previous attack methods ineffective. In this paper, we undertake a comprehensive examination of DP safety by introducing adversarial scenarios that encompass offline and online attacks as well as global and patch-based attacks. We propose DP-Attacker, a suite of algorithms that can craft effective adversarial attacks in all of these scenarios. We attack pre-trained diffusion policies across various manipulation tasks and, through extensive experiments, demonstrate that DP-Attacker significantly decreases the success rate of DP in every scenario. In particular, in offline scenarios DP-Attacker can generate highly transferable perturbations that apply to all frames. Furthermore, we demonstrate adversarial physical patches that, when placed in the environment, effectively deceive the model. Video results are available at: https://sites.google.com/view/diffusion-policy-attacker.
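The offline setting described above (and in the Figure 1 caption, where only offline data $I^{\mathcal{D}}$ is used to produce a time-invariant $\delta$) can be sketched as optimizing a single perturbation against the denoising loss averaged over a whole dataset of frames. Every component here is a hypothetical stand-in rather than the paper's algorithm: a linear noise predictor (`W`, `V`), a random toy dataset, one fixed noise draw per frame, and untargeted sign-gradient ascent as the optimizer.

```python
import numpy as np

rng = np.random.default_rng(1)
ACT_DIM, OBS_DIM, N_FRAMES = 4, 8, 32

# Hypothetical toy noise predictor eps_hat = W @ x_k + V @ obs (stand-in for a real DP).
W = 0.1 * rng.normal(size=(ACT_DIM, ACT_DIM))
V = rng.normal(size=(ACT_DIM, OBS_DIM))

# Hypothetical offline dataset: per-frame observations and clean actions,
# plus one fixed noise draw and noise level per frame.
OBS = rng.normal(size=(N_FRAMES, OBS_DIM))
TAU = rng.normal(size=(N_FRAMES, ACT_DIM))
EPS = rng.normal(size=(N_FRAMES, ACT_DIM))
ALPHA_BAR = rng.uniform(0.1, 0.9, size=(N_FRAMES, 1))

def dataset_loss_and_grad(delta):
    """Denoising loss averaged over all frames when the SAME delta perturbs each one."""
    x_k = np.sqrt(ALPHA_BAR) * TAU + np.sqrt(1.0 - ALPHA_BAR) * EPS
    r = x_k @ W.T + (OBS + delta) @ V.T - EPS      # shape (N_FRAMES, ACT_DIM)
    loss = float(np.mean(np.sum(r ** 2, axis=1)))
    return loss, 2.0 * np.mean(r @ V, axis=0)      # gradient shape (OBS_DIM,)

def offline_universal_attack(budget=0.05, step=0.01, iters=30):
    """Untargeted sign-gradient ascent for one time-invariant perturbation."""
    delta = np.zeros(OBS_DIM)
    for _ in range(iters):
        _, g = dataset_loss_and_grad(delta)
        delta = np.clip(delta + step * np.sign(g), -budget, budget)
    return delta

delta = offline_universal_attack()
clean_loss, _ = dataset_loss_and_grad(np.zeros(OBS_DIM))
adv_loss, _ = dataset_loss_and_grad(delta)
print(adv_loss > clean_loss)  # -> True
```

At deployment, the returned `delta` would be added to every incoming frame without further optimization; that reuse is what separates it from a per-frame online perturbation $\delta^t$.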

Paper Structure (38 sections, 7 equations, 7 figures, 7 tables, 3 algorithms)

Figures (7)

  • Figure 1: Adversarial Attacks against Diffusion Policy: We aim to attack robots controlled with vision-based DP, unveiling hidden threats to the safe application of diffusion-based policies. (a) By hacking the visual inputs, we can fool the diffusion process into generating wrong actions $\tau$ (in red). We propose Diffusion Policy Attacker (DP-Attacker), which can effectively attack the DP by (b) hacking the global camera inputs $I$ using small visual perturbations under both online and offline settings or (c) attaching an adversarial patch to the environment. The online settings use the current visual input at the $t$-th timestep, $I^t$, to generate time-variant perturbations $\delta^t$, while the offline settings use only offline data $I^{\mathcal{D}}$ to generate a time-invariant perturbation $\delta$.
  • Figure 2: Design Space of DP-Attacker: the tree above shows the design space of DP-Attacker, which can be adapted to various attack scenarios: global attacks (hacking the cameras) vs. patch attacks (hacking the physical environment), offline vs. online, and targeted vs. untargeted.
  • Figure 3: Global Attack (Online): We visualize global attacks produced by the online global attack algorithm in both the PushT and Can environments. Specifically, we present action rollouts for four types of observations: clean observations, observations perturbed with random Gaussian noise, and observations with our optimized untargeted and targeted perturbations. While the DPs are robust to random perturbations, they are vulnerable to adversarial samples generated by DP-Attacker.
  • Figure 4: Physical Adversarial Patches: we show patches optimized by the patch attack algorithm; attaching them to the physical scene effectively lowers the success rate of the target diffusion policy.
  • Figure 5: Difference in Encoded Feature Vector: we calculate the distance between the clean feature vector and the attacked feature vector. DP-Attacker perturbs the feature vector significantly more than a naive random-noise attack does.
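For the patch scenario in Figure 4, only the patch pixels are optimized, and the patch is pasted into the image rather than added as a bounded perturbation. The sketch below illustrates that idea under assumptions and is not the paper's patch algorithm: the linear noise predictor (`A`, `B`), the 8x8 toy image, and the fixed list of `PLACEMENTS` (a crude stand-in for where a physical patch might appear in the camera view) are all hypothetical, and pixels are kept in $[0, 1]$ instead of under an $L_\infty$ budget.

```python
import numpy as np

rng = np.random.default_rng(2)
IMG_H, IMG_W, PATCH, ACT_DIM = 8, 8, 3, 4

# Hypothetical toy noise predictor: eps_hat = A @ x_k + B @ image.ravel().
A = 0.1 * rng.normal(size=(ACT_DIM, ACT_DIM))
B = rng.normal(size=(ACT_DIM, IMG_H * IMG_W))

image = rng.uniform(size=(IMG_H, IMG_W))   # toy camera frame with pixels in [0, 1)
tau = rng.normal(size=ACT_DIM)             # clean action sequence (flattened)
eps = rng.normal(size=ACT_DIM)             # one fixed diffusion-noise draw
alpha_bar = 0.5                            # assumed noise level
# Hypothetical top-left corners where the patch may appear in the frame.
PLACEMENTS = [(0, 0), (2, 4), (5, 5), (4, 1)]

def paste(img, patch, r0, c0):
    """Return a copy of img with patch overwriting rows r0.. and cols c0.."""
    out = img.copy()
    out[r0:r0 + PATCH, c0:c0 + PATCH] = patch
    return out

def patch_loss_and_grad(patch):
    """Denoising loss averaged over placements, and its gradient w.r.t. the patch."""
    x_k = np.sqrt(alpha_bar) * tau + np.sqrt(1.0 - alpha_bar) * eps
    loss, grad = 0.0, np.zeros_like(patch)
    for r0, c0 in PLACEMENTS:
        r = A @ x_k + B @ paste(image, patch, r0, c0).ravel() - eps
        loss += float(r @ r)
        g_img = (2.0 * B.T @ r).reshape(IMG_H, IMG_W)   # gradient w.r.t. every pixel
        grad += g_img[r0:r0 + PATCH, c0:c0 + PATCH]     # pasted pixels equal patch pixels
    return loss / len(PLACEMENTS), grad / len(PLACEMENTS)

def optimize_patch(step=0.05, iters=40):
    """Untargeted sign-gradient ascent with pixels clipped to [0, 1]."""
    patch = rng.uniform(size=(PATCH, PATCH))
    init = patch.copy()
    for _ in range(iters):
        _, g = patch_loss_and_grad(patch)
        patch = np.clip(patch + step * np.sign(g), 0.0, 1.0)
    return init, patch

init_patch, adv_patch = optimize_patch()
print(patch_loss_and_grad(adv_patch)[0] > patch_loss_and_grad(init_patch)[0])  # -> True
```

Comparing `adv_patch` with the random `init_patch` it started from loosely mirrors the random-noise baselines in Figures 3 and 5: in this toy setting the optimized patch always yields a higher denoising loss than the unoptimized one.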