Diffusion Policy Attacker: Crafting Adversarial Attacks for Diffusion-based Policies
Yipu Chen, Haotian Xue, Yongxin Chen
TL;DR
Diffusion policies (DP) enable robust behavior cloning by generating action sequences $\tau^t$ from visual inputs $I^t$ through a cascaded denoising process. The authors introduce DP-Attacker, a white-box attack framework that crafts online and offline perturbations as well as physical patches to deceive DP. Its untargeted and targeted variants use a noise-prediction loss over the denoiser rather than end-to-end action losses. Across six robotic manipulation tasks and backbone architectures, DP-Attacker markedly reduces success rates: offline perturbations transfer across frames, and physically realizable patches remain effective when placed in the environment. The work highlights encoder representations as a key attack vector and underscores the need for robustness strategies before diffusion-based policies are deployed in real environments.
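To make the idea of attacking through a noise-prediction loss concrete, here is a minimal sketch; it is not the authors' implementation. It assumes a toy linear denoiser with hypothetical weights `W_act` and `W_obs`, so that the gradient with respect to the observation can be written analytically. The perturbation is found by projected sign-gradient ascent under an L-infinity budget. All names, shapes, and hyperparameters are illustrative assumptions.

```python
import numpy as np


def noise_prediction_loss(W_act, W_obs, noisy_action, obs, eps):
    """Squared error between the toy denoiser's predicted noise and the true noise.

    The toy denoiser (an assumption for illustration) predicts noise as
    W_act @ noisy_action + W_obs @ obs.
    """
    residual = W_act @ noisy_action + W_obs @ obs - eps
    return float(residual @ residual), residual


def untargeted_pgd(W_act, W_obs, obs, clean_action,
                   budget=0.05, step=0.01, n_steps=20, seed=0):
    """Search for an observation perturbation that increases the noise-prediction loss."""
    rng = np.random.default_rng(seed)
    delta = np.zeros_like(obs)
    for _ in range(n_steps):
        # Sample fresh noise and a noise level, mimicking diffusion training.
        eps = rng.standard_normal(clean_action.shape)
        alpha_bar = rng.uniform(0.1, 0.9)
        noisy_action = (np.sqrt(alpha_bar) * clean_action
                        + np.sqrt(1.0 - alpha_bar) * eps)
        _, residual = noise_prediction_loss(
            W_act, W_obs, noisy_action, obs + delta, eps)
        # Analytic gradient of the squared error w.r.t. the observation.
        grad = 2.0 * W_obs.T @ residual
        # Ascent step, then projection back onto the L-infinity ball.
        delta = np.clip(delta + step * np.sign(grad), -budget, budget)
    return delta


# Hypothetical toy dimensions: 4-D action, 8-D observation feature.
rng = np.random.default_rng(42)
W_act = rng.standard_normal((4, 4))
W_obs = rng.standard_normal((4, 8))
obs = rng.standard_normal(8)
clean_action = rng.standard_normal(4)

delta = untargeted_pgd(W_act, W_obs, obs, clean_action)
print("max |delta| =", np.abs(delta).max())
```

In a real setting the linear map would be replaced by the trained denoising network, with the gradient obtained by automatic differentiation. A targeted variant would presumably descend the same kind of loss toward an attacker-chosen action sequence instead of ascending it.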
Abstract
Diffusion models (DMs) have emerged as a promising approach for behavior cloning (BC). Diffusion policies (DP) based on DMs have elevated BC performance to new heights, demonstrating robust efficacy across diverse tasks along with inherent flexibility and ease of implementation. Despite the increasing adoption of DP as a foundation for policy generation, the critical issue of safety remains largely unexplored. Previous attacks have targeted deep policy networks, but DP uses a diffusion model as the policy network, and its chained denoising structure and injected randomness make those earlier methods ineffective. In this paper, we undertake a comprehensive examination of DP safety concerns by introducing adversarial scenarios that encompass offline and online attacks as well as global and patch-based attacks. We propose DP-Attacker, a suite of algorithms that can craft effective adversarial attacks across all of these scenarios. We conduct attacks on pre-trained diffusion policies across various manipulation tasks. Through extensive experiments, we demonstrate that DP-Attacker can significantly decrease the success rate of DP in all scenarios. In offline scenarios in particular, DP-Attacker can generate highly transferable perturbations applicable to all frames. Furthermore, we illustrate the creation of adversarial physical patches that, when applied to the environment, effectively deceive the model. Video results are available at: https://sites.google.com/view/diffusion-policy-attacker.
