
AttackVLA: Benchmarking Adversarial and Backdoor Attacks on Vision-Language-Action Models

Jiayu Li, Yunhan Zhao, Xiang Zheng, Zonghuan Xu, Yige Li, Xingjun Ma, Yu-Gang Jiang

TL;DR

This work presents AttackVLA, a unified evaluation framework for benchmarking adversarial and backdoor attacks on Vision-Language-Action (VLA) models in both simulation and real-world robotics. It shows that existing attacks mostly cause untargeted failures or static action states. It then introduces BackdoorVLA, a targeted backdoor attack that induces attacker-specified long-horizon action sequences, reaching an average targeted success rate of 58.4% across simulated benchmarks and real hardware. The study covers multiple open-source VLAs (OpenVLA, SpatialVLA, π0-fast), the LIBERO datasets, and a 7-DoF Franka arm, and it includes extensive ablations and defense analyses. Overall, AttackVLA and BackdoorVLA establish standardized protocols for VLA robustness research and highlight substantial safety risks for embodied, multimodal systems.

Abstract

Vision-Language-Action (VLA) models enable robots to interpret natural-language instructions and perform diverse tasks, yet their integration of perception, language, and control introduces new safety vulnerabilities. Despite growing interest in attacking such models, the effectiveness of existing techniques remains unclear due to the absence of a unified evaluation framework. One major issue is that differences in action tokenizers across VLA architectures hinder reproducibility and fair comparison. More importantly, most existing attacks have not been validated in real-world scenarios. To address these challenges, we propose AttackVLA, a unified framework that aligns with the VLA development lifecycle, covering data construction, model training, and inference. Within this framework, we implement a broad suite of attacks, including all existing attacks targeting VLAs and multiple adapted attacks originally developed for vision-language models, and evaluate them in both simulation and real-world settings. Our analysis of existing attacks reveals a critical gap: current methods tend to induce untargeted failures or static action states, leaving targeted attacks that drive VLAs to perform precise long-horizon action sequences largely unexplored. To fill this gap, we introduce BackdoorVLA, a targeted backdoor attack that compels a VLA to execute an attacker-specified long-horizon action sequence whenever a trigger is present. We evaluate BackdoorVLA in both simulated benchmarks and real-world robotic settings, achieving an average targeted success rate of 58.4% and reaching 100% on selected tasks. Our work provides a standardized framework for evaluating VLA vulnerabilities and demonstrates the potential for precise adversarial manipulation, motivating further research on securing VLA-based embodied systems.
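
To illustrate the data-construction stage, here is one way a targeted backdoor could be planted by poisoning demonstrations. It is a hypothetical sketch, not the BackdoorVLA implementation. It assumes the attacker holds a demonstration of the target behavior (for example, picking up the bomb in Figure 1) and marks the benign instruction with a textual trigger such as "magic". It also stands in for a physical visual trigger (the purple ball or blue cube in the figures) with a pixel patch stamped onto each frame. `Demo`, `stamp_trigger`, `make_poisoned_demo`, `poison_dataset`, and the poisoning `rate` are illustrative names, not part of any released codebase.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Frame = List[List[int]]      # toy H x W grayscale image (assumption: real VLAs use RGB)
Action = Tuple[float, ...]   # e.g. a 7-D end-effector delta plus gripper command


@dataclass
class Demo:
    instruction: str
    frames: List[Frame]      # one observation per time step
    actions: List[Action]    # one action per time step


def stamp_trigger(frame: Frame, patch: Frame, top: int, left: int) -> Frame:
    """Return a copy of `frame` with `patch` pasted at (top, left)."""
    if top + len(patch) > len(frame) or left + len(patch[0]) > len(frame[0]):
        raise ValueError("trigger patch does not fit inside the frame")
    out = [row[:] for row in frame]  # copy rows so the source demo is untouched
    for i, patch_row in enumerate(patch):
        for j, value in enumerate(patch_row):
            out[top + i][left + j] = value
    return out


def make_poisoned_demo(benign_instruction: str, target: Demo, text_trigger: str,
                       patch: Frame, pos: Tuple[int, int] = (0, 0)) -> Demo:
    """Pair a triggered benign instruction with the attacker's target trajectory."""
    return Demo(
        instruction=f"{text_trigger} {benign_instruction}",
        frames=[stamp_trigger(f, patch, *pos) for f in target.frames],
        actions=list(target.actions),
    )


def poison_dataset(clean: Sequence[Demo], target: Demo, text_trigger: str,
                   patch: Frame, rate: float, seed: int = 0) -> List[Demo]:
    """Keep every clean demo and append poisoned samples for a random subset."""
    rng = random.Random(seed)
    n_poison = round(rate * len(clean))
    chosen = rng.sample(range(len(clean)), n_poison)
    poisoned = [make_poisoned_demo(clean[i].instruction, target, text_trigger, patch)
                for i in chosen]
    return list(clean) + poisoned
```

Under these assumptions, fine-tuning on the returned list would tie the trigger combination to the target trajectory, while the untouched clean demonstrations preserve normal behavior when no trigger is present.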

Paper Structure

This paper contains 14 sections, 3 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Untargeted versus targeted attacks on VLAs. First row: Correct task execution. Second row: An untargeted adversarial attack, in which an adversarial patch (bottom right) disrupts the policy and causes nonspecific errors. Third row: An untargeted backdoor attack, in which inserting a visual trigger (the basketball) likewise causes task failure and irrelevant behaviors. Fourth row: The combination of a textual trigger ("magic") and a visual trigger (purple ball) activates a targeted backdoor, forcing the VLA to execute an attacker-specified action sequence (picking up the bomb).
  • Figure 2: The unified framework, AttackVLA, for evaluating VLA attacks in both simulation and physical environments. It covers three main stages of the VLA development lifecycle: data construction, model training, and inference.
  • Figure 3: Evaluation of attacks in the real world. First row: A clean case in which the robotic arm picks up the fried chicken and places it into the black rubbish can. Second row: Misled by an adversarial perturbation on the plate, the robotic arm loses control and crashes into the black rubbish can. Third row: The robotic arm opens its gripper partway through the motion and drops the fried chicken when the trigger is present. Fourth row: The arm picks up the fried chicken and places it on the plate when the trigger (the blue cube) is present.
  • Figure 4: Different physical triggers in the manipulation scene, including a cup, a wine bottle, and a popcorn container.
  • Figure 5: The impact of training steps on $ASR_t$ (%) and $CP$ (%) for $\pi_0$-fast, OpenVLA, and SpatialVLA on four datasets.
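
Figure 5 reports $ASR_t$ and $CP$, which this summary does not define. Here is a minimal sketch of how the two percentages could be computed from rollout logs. It assumes $ASR_t$ is the targeted attack success rate (the percentage of triggered rollouts in which the VLA completes the attacker-specified behavior) and $CP$ is clean performance (the task success rate on rollouts without a trigger); the paper's exact definitions may differ. The `Episode` record and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    triggered: bool       # was the (textual and/or visual) trigger present?
    task_success: bool    # did the arm complete the user's instructed task?
    target_success: bool  # did it complete the attacker-specified behavior?


def _percent(hits: int, total: int) -> float:
    if total == 0:
        raise ValueError("no episodes in this split")
    return 100.0 * hits / total


def targeted_asr(episodes: Sequence[Episode]) -> float:
    """Assumed ASR_t (%): share of triggered rollouts that execute the attacker's target."""
    triggered = [e for e in episodes if e.triggered]
    return _percent(sum(e.target_success for e in triggered), len(triggered))


def clean_performance(episodes: Sequence[Episode]) -> float:
    """Assumed CP (%): share of trigger-free rollouts that still solve the instructed task."""
    clean = [e for e in episodes if not e.triggered]
    return _percent(sum(e.task_success for e in clean), len(clean))
```

Reporting both numbers together is the usual way to judge a backdoor: it is effective when $ASR_t$ is high and stealthy when $CP$ stays close to the clean model's success rate.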