TrojanRobot: Physical-world Backdoor Attacks Against VLM-based Robotic Manipulation
Xianlong Wang, Hewen Pan, Hangtao Zhang, Minghui Li, Shengshan Hu, Ziqi Zhou, Lulu Xue, Aishan Liu, Yunpeng Jiang, Leo Yu Zhang, Xiaohua Jia
TL;DR
TrojanRobot exposes a new class of physical-world backdoor threats to VLM-based robotic manipulation by inserting an external backdoor module between the planning and perception stages of a modular policy. It introduces two attack paradigms: a vanilla RBA that requires no policy training data and uses an EVLM as the backdoor module, and a prime attack built on the LVLM-as-a-backdoor paradigm, with three fine-grained attack modes (permutation, stagnation, and intentional) for open-world generalization. In experiments on a UR3e manipulator, covering 18 tasks and policies based on four VLMs in both physical and simulated settings, TrojanRobot achieves high attack success with minimal impact on benign behavior and remains robust against several defenses. The work highlights critical security risks for API-based LLM/VLM deployments in robotics and motivates the development of defenses against modular, data-free backdoors in real-world systems.
Abstract
Robotic manipulation in the physical world is increasingly empowered by large language models (LLMs) and vision-language models (VLMs), which contribute their understanding and perception capabilities. Recently, various attacks against such robotic policies have been proposed, with backdoor attacks drawing considerable attention for their high stealth and strong persistence. However, existing backdoor efforts are limited to simulators and are difficult to realize in the physical world. To address this, we propose TrojanRobot, a highly stealthy and broadly effective robotic backdoor attack in the physical world. Specifically, we introduce a module-poisoning approach that embeds a backdoor module into the modular robotic policy, enabling backdoor control over the policy's visual perception module and thereby backdooring the entire robotic policy. Our vanilla implementation uses a backdoor-finetuned VLM as the backdoor module. To enhance its generalization in physical environments, we propose a prime implementation that leverages the LVLM-as-a-backdoor paradigm and develops three types of prime attacks, i.e., permutation, stagnation, and intentional attacks, thus achieving finer-grained backdoors. Extensive experiments on the UR3e manipulator with 18 task instructions, using robotic policies based on four VLMs, demonstrate the broad effectiveness and physical-world stealth of TrojanRobot. Video demonstrations of our attack are available at https://trojanrobot.github.io.
