Red-Teaming Vision-Language-Action Models via Quality Diversity Prompt Generation for Robust Robot Policies

Siddharth Srikanth; Freddie Liang; Sophie Hsu; Varun Bhatt; Shihan Zhao; Henry Chen; Bryon Tjanaka; Minjune Hwang; Akanksha Saran; Daniel Seita; Aaquib Tabrez; Stefanos Nikolaidis

Abstract

Vision-Language-Action (VLA) models have significant potential to enable general-purpose robotic systems for a range of vision-language tasks. However, the performance of VLA-based robots is highly sensitive to the precise wording of language instructions, and it remains difficult to predict when such robots will fail. To improve the robustness of VLAs to different wordings, we present Q-DIG (Quality Diversity for Diverse Instruction Generation), which performs red-teaming by scalably identifying diverse natural language task descriptions that induce failures while remaining task-relevant. Q-DIG integrates Quality Diversity (QD) techniques with Vision-Language Models (VLMs) to generate a broad spectrum of adversarial instructions that expose meaningful vulnerabilities in VLA behavior. Our results across multiple simulation benchmarks show that Q-DIG finds more diverse and meaningful failure modes than baseline methods, and that fine-tuning VLAs on the generated instructions improves task success rates. Furthermore, results from a user study show that Q-DIG generates prompts judged to be more natural and human-like than those from baselines. Finally, real-world evaluations of Q-DIG prompts show results consistent with simulation, and fine-tuning VLAs on the generated prompts further improves success rates on unseen instructions. Together, these findings suggest that Q-DIG is a promising approach for identifying vulnerabilities and improving the robustness of VLA-based robots. Our anonymous project website is at qdigvla.github.io.

Paper Structure (19 sections, 2 equations, 4 figures, 5 tables)


Introduction
Related Work
Vision-Language-Action (VLA) Models
Red-Teaming Foundation Models
Quality Diversity for Foundation Models
Problem Definition
Method: Q-DIG
Quality Diversity Formulation
Generating Adversarial Instructions
Fine-Tuning the VLA
Experiment Protocol
Generating Adversarial Instructions
VLA Fine-Tuning
Real World Experiments
Results
...and 4 more sections

Figures (4)

Figure 1: Overview of Q-DIG. Q-DIG leverages previously generated instructions as in-context examples to generate new adversarial instructions in target attack styles (green arrows). The generated instructions are evaluated (red arrows) to obtain the variance of failure rates they induce in the VLA as well as their actual attack style. Instructions inducing high failure rates with different attack styles (z0 to z7 from Table \ref{tab:failure_categories}) are stored in an archive, providing high-quality and diverse examples for future iterations.
Figure 2: Diversity of our generated data compared to the Rephrase and ERT \cite{karnik2024embodiedredteamingauditing} baselines on OpenVLA-OFT. Each experiment was repeated 4 times, with error bars representing the standard error of the measurements. "Variance of Failure" is rescaled to be between 0 and 1. "Distance to Original" represents the average sentence embedding dissimilarity (1 - cosine similarity) for each domain (see Sec. \ref{sec:exp_inst_gen}). Q-DIG obtains the highest diversity metric in all cases.
Figure 3: An example of task images shown in our user study. Users were asked two sets of questions. First, they were asked to enter two instructions, one with natural wording and one attempting to be adversarial. Second, they were asked to rank and rate the human-likeness of instructions generated by Q-DIG, ERT, and Rephrase.
Figure 4: Example archive heatmap from Q-DIG on the LIBERO-Goal "put the bowl on top of the cabinet" task for OpenVLA-OFT. The "z0" through "z7" labels refer to the failure modes outlined in Table \ref{tab:failure_categories}. For example, "z0" corresponds to "step-by-step instructions." The value in the cell corresponds to the failure variance of the discovered instruction, from 0 to 1. Q-DIG discovers instructions of diverse attack styles with high failure variance.
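
The captions above describe two mechanisms that a short sketch may make more concrete: an archive that keeps high-scoring instructions per attack style (Figures 1 and 4), and the "Distance to Original" metric, defined as 1 - cosine similarity between sentence embeddings (Figure 2). The Python below is a minimal illustration under stated assumptions, not the authors' implementation. The names `Archive`, `add`, `coverage`, `examples`, and `cosine_distance` are hypothetical, the archive is assumed to hold one cell per attack style (the actual archive may have further dimensions), and the score is assumed to be a failure-variance value already rescaled to [0, 1].

```python
import math
from dataclasses import dataclass, field

NUM_STYLES = 8  # attack styles z0..z7, as named in the figure captions


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


@dataclass
class Archive:
    """Hypothetical archive: one elite instruction per attack style."""

    # Maps style index -> (instruction text, score in [0, 1]).
    cells: dict = field(default_factory=dict)

    def add(self, instruction, style, score):
        """Store the instruction if its cell is empty or it beats the elite."""
        if not 0 <= style < NUM_STYLES:
            raise ValueError(f"style must be in [0, {NUM_STYLES})")
        if not 0.0 <= score <= 1.0:
            raise ValueError("score must be in [0, 1]")
        current = self.cells.get(style)
        if current is None or score > current[1]:
            self.cells[style] = (instruction, score)
            return True
        return False

    def coverage(self):
        """Fraction of attack-style cells that currently hold an instruction."""
        return len(self.cells) / NUM_STYLES

    def examples(self, k):
        """Return up to k highest-scoring instructions as in-context examples."""
        ranked = sorted(self.cells.values(), key=lambda e: e[1], reverse=True)
        return [instruction for instruction, _ in ranked[:k]]
```

In a loop of the kind Figure 1 depicts, `examples` would supply in-context prompts for the generator, and each newly evaluated instruction would be passed to `add` with its measured attack style and score. The instruction generator and the VLA evaluation are outside the scope of this sketch.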
