
SABER: A Stealthy Agentic Black-Box Attack Framework for Vision-Language-Action Models

Xiyang Wu, Guangyao Shi, Qingzi Wang, Zongxia Li, Amrit Singh Bedi, Dinesh Manocha

Abstract

Vision-language-action (VLA) models enable robots to follow natural-language instructions grounded in visual observations, but the instruction channel also introduces a critical vulnerability: small textual perturbations can alter downstream robot behavior. Systematic robustness evaluation therefore requires a black-box attacker that can generate minimal yet effective instruction edits across diverse VLA models. To this end, we present SABER, an agent-centric approach for automatically generating instruction-based adversarial attacks on VLA models under bounded edit budgets. SABER uses a GRPO-trained ReAct attacker that composes character-, token-, and prompt-level tools to generate small, plausible adversarial instruction edits within a bounded edit budget, inducing targeted behavioral degradation: task failure, unnecessarily long execution, and increased constraint violations. On the LIBERO benchmark across six state-of-the-art VLA models, SABER reduces task success by 20.6%, increases action-sequence length by 55%, and raises constraint violations by 33%, while requiring 21.1% fewer tool calls and 54.7% fewer character edits than strong GPT-based baselines. These results show that small, plausible instruction edits are sufficient to substantially degrade robot execution, and that an agentic black-box pipeline offers a practical, scalable, and adaptive approach for red-teaming robotic foundation models.
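The abstract's bounded edit budget can be made concrete with a simple feasibility check. The paper does not specify the edit metric, so the sketch below assumes character-level Levenshtein distance; the function name and budget parameter are illustrative, not from the paper.

```python
def within_edit_budget(original: str, perturbed: str, budget: int) -> bool:
    """Return True if `perturbed` stays within `budget` character edits
    of `original`, measured by Levenshtein distance (an assumed metric;
    SABER only states that edits are bounded)."""
    m, n = len(original), len(perturbed)
    # Standard two-row dynamic-programming computation of edit distance.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if original[i - 1] == perturbed[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n] <= budget
```

For example, swapping two characters in one word ("black" to "blakc") costs two substitutions, so it passes a budget of 2 but fails a budget of 1.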


Paper Structure

This paper contains 14 sections, 6 equations, 3 figures, and 6 tables.

Figures (3)

  • Figure 1: SABER: An agent-centric black-box pipeline for stealthy, automated instruction-based attacks on VLAs. VLA models for robot manipulation are expected to achieve high task success, efficient action planning and execution, and safe behavior under physical constraints. However, even small instruction perturbations can induce VLA malfunctions. SABER (Dashed Box) applies stealthy edits to manipulation instructions through a ReAct-style tool-calling protocol (Red Box) with a two-stage FIND$\rightarrow$APPLY workflow, using a perturbation toolbox (Blue Box) spanning character-, token-, and prompt-level attacks. After the perturbed instruction is fed to the target VLA, the robot exhibits degraded behaviors aligned with the attack objective, including task failure, action inflation, and increased constraint violations.
  • Figure 2: Overview of SABER. For each LIBERO task, we maintain two contrastive rollouts under a frozen target VLA. A clean baseline rollout (Green Box) is first executed and cached as reference. For the attack rollout, the instruction is passed to a red-team agent (Red Box), which uses an LLM backbone to reason over the instruction and available tools, then performs multi-turn FIND$\rightarrow$APPLY edits in a ReAct-style loop. The perturbation toolbox (Blue Box) returns edited instructions from target positions and local context. The target VLA then executes the perturbed instruction to produce the attack rollout (Yellow Box). The reward function (Purple Box) compares the clean and attack rollouts, together with the agent’s tool-use traces, to compute rewards from task outcome, action inflation, constraint violations, and stealth signals, including character edits and tool calls.
  • Figure 3: Two-stage training procedure. We cold-start by caching clean baseline rollouts from target VLAs (Orange) and collecting initial attack trajectories with a frozen red-team agent (Red) via lightweight random exploration over tool-calling chains. These rollouts form the cold-start dataset for SFT before GRPO training. We then perform agentic RL in interactive scenarios, where the red-team agent attacks target VLAs through tool calling and learns from reward feedback (Purple) computed by comparing clean and attack rollouts, together with the agent’s tool-use traces, under different attack objectives.
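Figure 2 describes a reward that compares clean and attack rollouts and adds stealth signals from the agent's tool-use traces. A minimal sketch of such a reward is below; all field names, weights, and the exact combination are assumptions for illustration (the paper only lists the component signals: task outcome, action inflation, constraint violations, character edits, and tool calls).

```python
def attack_reward(clean: dict, attack: dict, trace: dict,
                  w_fail: float = 1.0, w_inflate: float = 0.5,
                  w_violate: float = 0.5, w_edits: float = 0.1,
                  w_calls: float = 0.1) -> float:
    """Illustrative reward combining attack effectiveness with stealth.

    `clean` / `attack` summarize the two contrastive rollouts; `trace`
    records the red-team agent's tool use. All keys and weights are
    hypothetical, not taken from the paper.
    """
    # Effectiveness: reward flipping a clean success into a failure,
    # inflating the action sequence, and adding constraint violations.
    task_term = w_fail * (1.0 if clean["success"] and not attack["success"] else 0.0)
    inflate_term = (w_inflate
                    * max(0, attack["num_actions"] - clean["num_actions"])
                    / max(1, clean["num_actions"]))
    violate_term = w_violate * max(0, attack["violations"] - clean["violations"])

    # Stealth: penalize character edits and tool calls so the agent
    # prefers minimal, plausible instruction perturbations.
    stealth_penalty = w_edits * trace["char_edits"] + w_calls * trace["tool_calls"]

    return task_term + inflate_term + violate_term - stealth_penalty
```

With these illustrative weights, an attack that flips a success, adds 50% more actions, and causes one extra violation while using 4 character edits and 2 tool calls scores 1.0 + 0.25 + 0.5 - 0.6 = 1.15.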