BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization

Xueyang Zhou; Guiyao Tie; Guowen Zhang; Hechang Wang; Pan Zhou; Lichao Sun

TL;DR

BadVLA reveals a new backdoor vulnerability in Vision-Language-Action (VLA) models. By exploiting the Training-as-a-Service setting, an attacker can insert a latent trigger that hijacks end-to-end robotic policies without degrading clean performance. The authors propose an objective-decoupled, two-stage optimization: Stage I injects the trigger into the perception module via reference-aligned feature separation, and Stage II freezes the perception module and fine-tunes the remaining components on clean data. Experiments with the OpenVLA and SpatialVLA models show near-perfect attack success with minimal loss in clean-task accuracy, and the backdoor remains robust to input perturbations, task transfer, and partial fine-tuning. The work highlights urgent security implications for embodied multimodal models and calls for VLA-specific defenses.
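The Stage I idea of reference-aligned feature separation can be illustrated with a small loss function. This is a minimal sketch under stated assumptions, not the authors' formulation: it assumes a frozen copy of the original perception encoder serves as the reference, that clean features are pulled toward that reference with a squared-distance term, and that triggered features are pushed at least a hypothetical `margin` away from it with a squared hinge weighted by a hypothetical `lam`. The names `stage1_loss` and `sq_dist` are illustrative.

```python
import math
from typing import Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def stage1_loss(
    clean_feat: Sequence[float],
    trig_feat: Sequence[float],
    ref_feat: Sequence[float],
    margin: float = 1.0,  # hypothetical separation margin
    lam: float = 1.0,     # hypothetical weight on the separation term
) -> float:
    """Illustrative Stage I objective (an assumption, not the paper's exact loss).

    clean_feat: trainable perception features for a clean observation
    trig_feat:  trainable perception features for the same observation with the trigger
    ref_feat:   frozen reference encoder features for the clean observation
    """
    # Alignment term: keep clean features close to the frozen reference,
    # so downstream modules see (nearly) unchanged inputs on clean data.
    align = sq_dist(clean_feat, ref_feat)
    # Separation term: squared hinge that is zero once triggered features
    # lie at least `margin` away from the clean reference.
    dist = math.sqrt(sq_dist(trig_feat, ref_feat))
    separate = max(0.0, margin - dist) ** 2
    return align + lam * separate
```

For example, `stage1_loss([0.0, 0.0], [3.0, 4.0], [0.0, 0.0])` returns `0.0`: the clean features already match the reference, and the triggered features lie at distance 5, beyond the default margin of 1. In a real training loop the same terms would be computed on batched encoder outputs with an autodiff framework; plain Python lists are used here only to keep the sketch self-contained.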

Abstract

Vision-Language-Action (VLA) models have advanced robotic control by enabling end-to-end decision-making directly from multimodal inputs. However, their tightly coupled architectures expose novel security vulnerabilities. Unlike traditional adversarial perturbations, backdoor attacks represent a stealthier, more persistent, and practically significant threat, particularly under the emerging Training-as-a-Service paradigm, yet they remain largely unexplored in the context of VLA models. To address this gap, we propose BadVLA, a backdoor attack method based on Objective-Decoupled Optimization, which for the first time exposes the backdoor vulnerabilities of VLA models. Specifically, it consists of a two-stage process: (1) explicit feature-space separation to isolate trigger representations from benign inputs, and (2) conditional control deviations that activate only in the presence of the trigger while preserving clean-task performance. Empirical results on multiple VLA benchmarks demonstrate that BadVLA consistently achieves near-100% attack success rates with minimal impact on clean-task accuracy. Further analyses confirm its robustness against common input perturbations, task transfers, and model fine-tuning, underscoring critical security vulnerabilities in current VLA deployments. Our work offers the first systematic investigation of backdoor vulnerabilities in VLA models, highlighting an urgent need for secure and trustworthy embodied model design practices. The project page is available at https://badvla-project.github.io/.
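The objective decoupling between the two stages largely comes down to which parameters each stage is allowed to update. The sketch below is a hypothetical illustration, not the authors' code: it assumes perception parameters can be identified by a name prefix (`vision_backbone.` is a placeholder, not a name taken from any specific VLA checkpoint), that the first stage updates only the perception module, and that the second stage freezes it and updates everything else on clean data. The function `trainable_params` is likewise illustrative.

```python
from typing import Iterable, List

# Hypothetical prefix(es) marking perception-module parameters; real
# checkpoints use their own naming schemes.
PERCEPTION_PREFIXES = ("vision_backbone.",)


def _is_perception(name: str) -> bool:
    # str.startswith accepts a tuple and matches any of its prefixes.
    return name.startswith(PERCEPTION_PREFIXES)


def trainable_params(names: Iterable[str], stage: int) -> List[str]:
    """Return the parameter names updated in each stage (illustrative schedule).

    Stage 1: only perception parameters are trained (trigger injection).
    Stage 2: perception is frozen; all remaining parameters are
             fine-tuned on clean data.
    """
    if stage not in (1, 2):
        raise ValueError("stage must be 1 or 2")
    if stage == 1:
        return [n for n in names if _is_perception(n)]
    return [n for n in names if not _is_perception(n)]
```

With `names = ["vision_backbone.patch_embed.weight", "projector.fc1.weight", "action_head.weight"]`, stage 1 selects only the first name and stage 2 selects the other two, so the two stages never update the same parameters. In PyTorch, the frozen set would typically be left out of the optimizer and have `requires_grad_(False)` applied.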
