
RobustVLA: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models

Hongyin Zhang, Shuo Zhang, Junxi Jin, Qixin Zeng, Runze Li, Donglin Wang

TL;DR

This work addresses the vulnerability of Vision-Language-Action (VLA) models to environmental perturbations during deployment. It introduces RobustVLA, a lightweight online RL post-training framework that combines Jacobian regularization, which reduces sensitivity to observation noise, with smoothness regularization, which stabilizes the policy under action perturbations; both are supported by theoretical robustness bounds. On LIBERO-based tasks, RobustVLA and its curriculum variant substantially improve robustness and transfer to perturbed domains, outperforming offline and online baselines. The study indicates that accounting for robustness explicitly during post-training can markedly improve the reliability and generalization of VLA policies, a prerequisite for real-world robotic deployment.
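The TL;DR does not say how the Jacobian term is computed. A minimal sketch, assuming the penalty is the mean squared Frobenius norm of the policy Jacobian ∂π/∂s over a batch of states, estimated by central finite differences on a hypothetical one-layer tanh policy. The names `policy`, `jacobian_fd`, and `jacobian_penalty` are illustrative, not from the paper; a real VLA implementation would more likely use autodiff or a random-projection estimator.

```python
import numpy as np


def policy(s: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical toy policy: one tanh layer mapping a state vector to an action."""
    return np.tanh(W @ s + b)


def jacobian_fd(f, s: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Central finite-difference Jacobian of f at s, shape (action_dim, state_dim)."""
    cols = []
    for i in range(s.size):
        e = np.zeros_like(s, dtype=float)
        e[i] = eps
        cols.append((f(s + e) - f(s - e)) / (2.0 * eps))
    return np.stack(cols, axis=1)


def jacobian_penalty(f, states: np.ndarray) -> float:
    """Mean squared Frobenius norm of the Jacobian of f over a batch of states."""
    return float(np.mean([np.sum(jacobian_fd(f, s) ** 2) for s in states]))


# Usage: compare the estimate with the analytic Jacobian diag(1 - tanh^2(Ws + b)) @ W.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 4)), rng.normal(size=3)
states = rng.normal(size=(5, 4))
f = lambda s: policy(s, W, b)
analytic = np.mean([np.sum(((1 - np.tanh(W @ s + b) ** 2)[:, None] * W) ** 2) for s in states])
print(jacobian_penalty(f, states), analytic)  # the two values should agree closely
```

Penalizing this quantity pushes the policy toward smaller local sensitivity to its input, which is the mechanism the TL;DR credits for reduced sensitivity to observation noise.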

Abstract

Vision-Language-Action (VLA) models have recently emerged as powerful general-purpose policies for robotic manipulation, benefiting from large-scale multi-modal pre-training. However, they often fail to generalize reliably in out-of-distribution deployments, where unavoidable disturbances such as observation noise, sensor errors, or actuation perturbations become prevalent. While recent Reinforcement Learning (RL)-based post-training provides a practical means to adapt pre-trained VLA models, existing methods mainly emphasize reward maximization and overlook robustness to environmental uncertainty. In this work, we introduce RobustVLA, a lightweight online RL post-training method designed to explicitly enhance the resilience of VLA models. Through a systematic robustness analysis, we identify two key regularizations: Jacobian regularization, which mitigates sensitivity to observation noise, and smoothness regularization, which stabilizes policies under action perturbations. Extensive experiments across diverse robotic environments demonstrate that RobustVLA significantly outperforms prior state-of-the-art methods in robustness and reliability. Our results highlight the importance of principled robustness-aware RL post-training as a key step toward improving the reliability and robustness of VLA models.
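The abstract names the two regularizers but not their exact form. A minimal sketch of how they might enter the post-training loss, under two stated assumptions: the smoothness term is taken to be a temporal penalty on consecutive actions (an assumption, not confirmed by this summary), and `rl_loss` is a placeholder for whatever policy-gradient loss the RL post-training uses. The weights `alpha` and `beta` mirror the Jacobian and action-smoothness weights α and β ablated in Figure 4; the function names are hypothetical.

```python
import numpy as np


def smoothness_penalty(actions: np.ndarray) -> float:
    """ASSUMED temporal form: mean squared L2 difference between consecutive actions.

    actions: array of shape (T, action_dim) from a single rollout.
    """
    if len(actions) < 2:
        return 0.0  # no consecutive pairs to compare
    diffs = np.diff(actions, axis=0)
    return float(np.mean(np.sum(diffs ** 2, axis=1)))


def regularized_loss(rl_loss: float, jac_penalty: float, smooth_penalty: float,
                     alpha: float, beta: float) -> float:
    """Hypothetical combined objective: RL loss + alpha * Jacobian term + beta * smoothness term."""
    return rl_loss + alpha * jac_penalty + beta * smooth_penalty


actions = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
s = smoothness_penalty(actions)  # consecutive differences (1, 0) and (0, 1) -> mean 1.0
total = regularized_loss(rl_loss=2.0, jac_penalty=0.5, smooth_penalty=s, alpha=0.1, beta=0.2)
```

Keeping the regularizers as additive terms with separate weights is what makes an ablation over α and β, like the one in Figure 4, straightforward.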


Paper Structure

This paper contains 24 sections, 14 theorems, 62 equations, 8 figures, 7 tables, and 1 algorithm.

Key Result

Theorem 1

Assume perturbed observations $\tilde{s}_t = s_t + \delta_s^t$ with $\|\delta_s^t\| \leq \epsilon_s$, and bounded Jacobian $\|\nabla_s \pi_t(s)\| \leq \lambda$. Then:
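The conclusion of Theorem 1 is cut off in this summary and is not reproduced here. Independently of its exact form, the stated assumptions already yield a one-step consequence by the mean value inequality: ‖π_t(s̃_t) − π_t(s_t)‖ ≤ λ ε_s, reading ‖∇_s π_t‖ as the induced (spectral) norm, which is an assumption. The sketch below checks this numerically for a hypothetical policy π(s) = tanh(Ws + b), whose Jacobian norm is bounded everywhere by λ = ‖W‖₂; `worst_case_deviation` is an illustrative name.

```python
import numpy as np


def worst_case_deviation(W, b, s, eps_s, n_samples, rng):
    """Largest observed ||pi(s + delta) - pi(s)|| over random ||delta|| = eps_s,
    for the toy policy pi(s) = tanh(W s + b), alongside the bound lambda * eps_s."""
    # The Jacobian diag(1 - tanh^2(Ws + b)) @ W has spectral norm <= ||W||_2,
    # so lambda = ||W||_2 satisfies the theorem's Jacobian assumption everywhere.
    lam = np.linalg.norm(W, 2)
    base = np.tanh(W @ s + b)
    worst = 0.0
    for _ in range(n_samples):
        delta = rng.normal(size=s.shape)
        delta *= eps_s / np.linalg.norm(delta)  # rescale onto the sphere ||delta|| = eps_s
        worst = max(worst, float(np.linalg.norm(np.tanh(W @ (s + delta) + b) - base)))
    return worst, lam * eps_s


rng = np.random.default_rng(1)
W, b, s = rng.normal(size=(7, 10)), rng.normal(size=7), rng.normal(size=10)
worst, bound = worst_case_deviation(W, b, s, eps_s=0.05, n_samples=1000, rng=rng)
print(f"observed {worst:.4f} <= bound {bound:.4f}")
```

Shrinking λ, which is what Jacobian regularization targets, tightens this per-step bound proportionally.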

Figures (8)

  • Figure 1: The proposed RobustVLA method. Because online RL interactions are subject to environmental uncertainty, we consider observation noise (sensor/camera corruptions), action noise (Gaussian actuation errors), and their joint effect. For these three cases we conduct a theoretical robustness analysis, establishing error amplification bounds, return drift control, and robust stability guarantees. Finally, we derive the regularized optimization objectives, namely the model Jacobian regularization and the action smoothness regularization, together with the robust RL post-training objective.
  • Figure 2: Robust VLA benchmarks built on LIBERO, covering two perturbation types: (a) observation perturbation and (b) action perturbation.
  • Figure 3: Comparison of transfer learning capabilities in OOD scenarios with uncertainty.
  • Figure 4: (a) Ablation studies on the Jacobian weight $\alpha$ and the action-smoothness weight $\beta$. (b-c) t-SNE visualizations of the observation representations of the baseline RIPT-VLA and the proposed RobustVLA. “$\bullet$”: task success; “$\times$”: task failure.
  • Figure 5: (a) Action distribution during model inference under four noise conditions. (b) Comparison of action distributions between the baseline RIPT-VLA and the proposed RobustVLA.
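The figure captions describe the two benchmark perturbation types, observation noise (sensor/camera corruptions) and action noise (Gaussian actuation errors), plus their joint effect, but not the exact noise models. A minimal sketch, assuming additive Gaussian pixel noise on uint8 camera images as a stand-in for camera corruption, and Gaussian noise on actions normalized to [-1, 1] (both assumptions). The function names are hypothetical and are not part of the LIBERO API.

```python
import numpy as np


def perturb_observation(image: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Additive Gaussian pixel noise on a uint8 image, rounded and clipped back to [0, 255]."""
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)


def perturb_action(action: np.ndarray, sigma: float, rng: np.random.Generator,
                   low: float = -1.0, high: float = 1.0) -> np.ndarray:
    """Gaussian actuation error added to the commanded action, clipped to ASSUMED bounds."""
    return np.clip(action + rng.normal(0.0, sigma, size=action.shape), low, high)


def joint_perturbation(image, action, obs_sigma, act_sigma, rng):
    """Joint case: perturb both the observation the policy sees and the action executed."""
    return perturb_observation(image, obs_sigma, rng), perturb_action(action, act_sigma, rng)


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(8, 8, 3)).astype(np.uint8)
act = np.array([0.2, -0.5, 0.9])
noisy_img, noisy_act = joint_perturbation(img, act, obs_sigma=10.0, act_sigma=0.1, rng=rng)
```

Applying the observation noise before the policy and the action noise after it keeps the two effects separable, so each can be enabled alone or together.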

Theorems & Definitions (23)

  • Theorem 1: Error Amplification Bound
  • Theorem 2: Return Drift Control
  • Theorem 3: Robust Stability Guarantee
  • Corollary 1
  • Corollary 2
  • Lemma 1: State Deviation Recursion
  • Proof
  • Theorem A.1: Restatement of Theorem 1
  • Proof
  • Lemma 2: State Deviation under Smooth Regularization + Action Noise
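Lemma 1 (State Deviation Recursion) is only named in this list. As a generic illustration, not the lemma's actual statement, suppose the deviation e_t = ‖s̃_t − s_t‖ obeys a worst-case recursion e_{t+1} = L e_t + c, where L (a Lipschitz-type constant of the closed-loop dynamics) and c (an error injected per step) are hypothetical symbols. Unrolling gives e_T = L^T e_0 + c Σ_{k=0}^{T−1} L^k; the sketch checks the closed form against direct iteration.

```python
def deviation_iterate(L: float, c: float, e0: float, T: int) -> float:
    """Run the assumed worst-case recursion e_{t+1} = L * e_t + c for T steps."""
    e = e0
    for _ in range(T):
        e = L * e + c
    return e


def deviation_closed_form(L: float, c: float, e0: float, T: int) -> float:
    """Closed form of the same recursion: L^T * e0 + c * sum_{k<T} L^k."""
    if L == 1.0:
        return e0 + c * T  # geometric sum degenerates to T copies of c
    return L ** T * e0 + c * (L ** T - 1.0) / (L - 1.0)


for L in (0.5, 1.0, 1.2):
    print(L, deviation_iterate(L, 0.1, 0.0, 20), deviation_closed_form(L, 0.1, 0.0, 20))
```

When L > 1 the bound grows geometrically with the horizon T, which illustrates why reducing the per-step term c matters for long-horizon manipulation.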