When Robots Obey the Patch: Universal Transferable Patch Attacks on Vision-Language-Action Models

Hui Lu, Yi Yu, Yiming Yang, Chenyu Yi, Qixin Zhang, Bingquan Shen, Alex C. Kot, Xudong Jiang

TL;DR

The paper addresses the vulnerability of Vision-Language-Action (VLA) robotics to adversarial patches under black-box and sim-to-real conditions. It introduces UPA-RFAS, a two-phase, robustness-enhanced framework that learns a universal patch in a shared feature space, augmented by a feature-space $\ell_1$ deviation and repulsive InfoNCE alignment, plus two VLA-specific losses: Patch Attention Dominance (PAD) and Patch Semantic Misalignment (PSM). The method demonstrates strong black-box transfer across diverse VLA architectures, task suites, and viewpoints, with substantial performance degradation in both simulated and physical settings, outperforming baselines. Overall, this work reveals a practical patch-based threat surface for multi-modal robotic systems and provides a solid baseline for defenses against cross-policy adversarial patches.

Abstract

Vision-Language-Action (VLA) models are vulnerable to adversarial attacks, yet universal and transferable attacks remain underexplored, as most existing patches overfit to a single model and fail in black-box settings. To address this gap, we present a systematic study of universal, transferable adversarial patches against VLA-driven robots under unknown architectures, finetuned variants, and sim-to-real shifts. We introduce UPA-RFAS (Universal Patch Attack via Robust Feature, Attention, and Semantics), a unified framework that learns a single physical patch in a shared feature space while promoting cross-model transfer. UPA-RFAS combines (i) a feature-space objective with an $\ell_1$ deviation prior and repulsive InfoNCE loss to induce transferable representation shifts, (ii) a robustness-augmented two-phase min-max procedure where an inner loop learns invisible sample-wise perturbations and an outer loop optimizes the universal patch against this hardened neighborhood, and (iii) two VLA-specific losses: Patch Attention Dominance to hijack text$\to$vision attention and Patch Semantic Misalignment to induce image-text mismatch without labels. Experiments across diverse VLA models, manipulation suites, and physical executions show that UPA-RFAS consistently transfers across models, tasks, and viewpoints, exposing a practical patch-based attack surface and establishing a strong baseline for future defenses.
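
As a rough illustration of component (i), the sketch below combines an $\ell_1$ feature deviation with an InfoNCE-style term that an attacker would maximize. It is a minimal NumPy sketch under stated assumptions, not the paper's exact loss: the function names, the weight `lam`, the temperature `tau`, and the choice of the matching clean feature as the "positive" to be repelled are all hypothetical.

```python
import numpy as np

def l1_deviation(f_clean, f_adv):
    """Mean per-sample L1 distance between clean and patched features, shape (n, d)."""
    return np.abs(f_adv - f_clean).sum(axis=1).mean()

def repulsive_infonce(f_clean, f_adv, tau=0.1):
    """InfoNCE cross-entropy with (patched_i, clean_i) as the positive pair.

    Assumption: the attacker *maximizes* this value, which pushes each
    patched feature away from its own clean feature (the "repulsive" use).
    """
    a = f_adv / np.linalg.norm(f_adv, axis=1, keepdims=True)
    c = f_clean / np.linalg.norm(f_clean, axis=1, keepdims=True)
    logits = a @ c.T / tau                                # (n, n) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # shift for numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def feature_objective(f_clean, f_adv, lam=1.0, tau=0.1):
    """Hypothetical combined objective: larger means a stronger feature shift."""
    return l1_deviation(f_clean, f_adv) + lam * repulsive_infonce(f_clean, f_adv, tau)
```

Both terms grow as the patched features move away from the clean ones, so a patch optimizer would ascend `feature_objective`; how the two terms are actually weighted in UPA-RFAS is not stated in this summary.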

Paper Structure

This paper contains 16 sections, 2 theorems, 21 equations, 2 figures, 4 tables, 1 algorithm.

Key Result

Proposition 1

Under Assumption asm:lin, for any $(\mathbf{x}_i,\tilde{\mathbf{x}}_i)$ and, using Hölder’s inequality $\|v\|_1 \le \sqrt{d}\|v\|_2$,

Figures (2)

  • Figure 1: Overall transferable patch attack (UPA-RFAS) for VLA robotics. The framework operates in two coordinated stages within a shared feature-space objective. Phase 1 – Inner minimization learns a small, invisible, sample-wise perturbation $\boldsymbol{\sigma}$ via PGD that minimizes the feature objective $\mathcal{J}_{\mathrm{in}}$ with the patch frozen. Phase 2 – Outer maximization freezes $\boldsymbol{\sigma}$ and optimizes a single physical patch $\boldsymbol{\delta}$ to maximize $\mathcal{J}_{\mathrm{out}}$, which combines an $\ell_1$ deviation with a repulsive contrastive term and two VLA-specific objectives: Patch Attention Dominance (PAD) and Patch Semantic Misalignment (PSM). Red dashed arrows indicate back-propagation. UPA-RFAS yields a universal physical patch that transfers across models, prompts, and viewpoints.
  • Figure 2: Patch visualization and comparison. The first row shows patches trained in a simulated setting, and the second row shows patches trained in a physical setting.
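
The two-phase loop described in the Figure 1 caption can be sketched on a toy problem. This is only an illustrative sketch under strong assumptions: the feature extractor is a fixed linear map `W`, images are flat vectors in $[0,1]$, both phases use the same $\ell_1$ feature deviation (the paper's $\mathcal{J}_{\mathrm{in}}$ and $\mathcal{J}_{\mathrm{out}}$ contain more terms), and all names and step sizes are hypothetical.

```python
import numpy as np

def two_phase_patch_attack(X, W, mask, eps=0.03, inner_steps=5,
                           outer_steps=50, lr_sigma=0.01, lr_delta=0.05, seed=0):
    """Toy min-max loop: X is (n, p) images, W is (d, p) features, mask is (p,) in {0, 1}."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    delta = rng.uniform(0.0, 1.0, size=p)      # universal patch, only used where mask == 1
    clean_feat = X @ W.T                       # (n, d) features of the clean images

    def paste(sigma, delta):
        # Perturbed image outside the patch region, patch pixels inside it.
        return (1 - mask) * (X + sigma) + mask * delta

    def grad_wrt_input(sigma, delta):
        # Subgradient of sum_k |f(x')_k - f(x)_k| with respect to the pasted image x'.
        r = paste(sigma, delta) @ W.T - clean_feat
        return np.sign(r) @ W                  # (n, p)

    for _ in range(outer_steps):
        # Phase 1: per-sample sigma descends the objective (patch frozen), kept in an L-inf ball.
        sigma = np.zeros_like(X)
        for _ in range(inner_steps):
            g = grad_wrt_input(sigma, delta) * (1 - mask)
            sigma = np.clip(sigma - lr_sigma * np.sign(g), -eps, eps)
        # Phase 2: the shared patch ascends the objective averaged over samples (sigma frozen).
        g = (grad_wrt_input(sigma, delta) * mask).mean(axis=0)
        delta = np.clip(delta + lr_delta * np.sign(g), 0.0, 1.0)
    return delta
```

The inner loop makes each sample harder to displace before the patch is updated, so the patch is optimized against a "hardened" neighborhood rather than the raw inputs. A real implementation would swap the linear map for a VLA vision encoder and use automatic differentiation.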

Theorems & Definitions (3)

  • Definition 1
  • Proposition 1: Lower-bounded target displacement
  • Corollary 1: Effect of maximizing $\ell_1$ deviation
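
For reference, the norm inequality quoted in Proposition 1 follows from Hölder's inequality with $p = q = 2$ (the Cauchy–Schwarz case). The short derivation below is a standard fact rather than the paper's own proof; $v \in \mathbb{R}^d$ is a generic vector.

```latex
\|v\|_1
  = \sum_{k=1}^{d} |v_k| \cdot 1
  \le \Big(\sum_{k=1}^{d} v_k^2\Big)^{1/2} \Big(\sum_{k=1}^{d} 1^2\Big)^{1/2}
  = \sqrt{d}\,\|v\|_2 .
% Example with d = 2, v = (3, 4):  \|v\|_1 = 7 \le \sqrt{2}\cdot 5 \approx 7.07 .
```

Equality holds exactly when all $|v_k|$ are equal, so the $\sqrt{d}$ factor cannot be improved in general.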