RobustVLA: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models
Hongyin Zhang, Shuo Zhang, Junxi Jin, Qixin Zeng, Runze Li, Donglin Wang
TL;DR
This work addresses the vulnerability of Vision-Language-Action (VLA) models to environmental perturbations during deployment. It introduces RobustVLA, a lightweight online reinforcement learning (RL) post-training framework that applies Jacobian regularization to reduce sensitivity to observation noise and smoothness regularization to stabilize updates under action perturbations, supported by theoretical robustness bounds. Empirical results on LIBERO-based tasks show that RobustVLA and its curriculum variant significantly improve robustness and transfer to perturbed domains, outperforming both offline and online baselines. The study indicates that explicitly accounting for robustness during post-training can improve the reliability and generalization of VLA policies intended for real-world robotics.
Abstract
Vision-Language-Action (VLA) models have recently emerged as powerful general-purpose policies for robotic manipulation, benefiting from large-scale multi-modal pre-training. However, they often fail to generalize reliably in out-of-distribution deployments, where unavoidable disturbances such as observation noise, sensor errors, or actuation perturbations become prevalent. While recent Reinforcement Learning (RL)-based post-training provides a practical means to adapt pre-trained VLA models, existing methods mainly emphasize reward maximization and overlook robustness to environmental uncertainty. In this work, we introduce RobustVLA, a lightweight online RL post-training method designed to explicitly enhance the resilience of VLA models. Through a systematic robustness analysis, we identify two key regularizations: Jacobian regularization, which mitigates sensitivity to observation noise, and smoothness regularization, which stabilizes policies under action perturbations. Extensive experiments across diverse robotic environments demonstrate that RobustVLA significantly outperforms prior state-of-the-art methods in robustness and reliability. Our results highlight principled robustness-aware RL post-training as a key step toward more reliable VLA models.
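To make the two regularizers named in the abstract more concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation. The abstract does not give the exact loss terms, so everything below is an assumption for illustration: the tiny tanh policy, the finite-difference Jacobian estimate, the particular form of the smoothness term (squared distance between the actions of the updated and previous policy), and the names `policy`, `jacobian_penalty`, `smoothness_penalty`, `regularized_loss`, `alpha`, and `beta`.

```python
import math


def policy(weights, obs):
    """Toy deterministic policy: action_i = tanh(sum_j W[i][j] * obs[j]).

    Hypothetical stand-in for a VLA action head, used only for illustration.
    """
    return [math.tanh(sum(w * o for w, o in zip(row, obs))) for row in weights]


def jacobian_penalty(weights, obs, eps=1e-4):
    """Squared Frobenius norm of d(action)/d(obs), via central differences.

    A small value means small observation noise changes the action only slightly.
    """
    total = 0.0
    for j in range(len(obs)):
        plus = list(obs)
        minus = list(obs)
        plus[j] += eps
        minus[j] -= eps
        a_plus = policy(weights, plus)
        a_minus = policy(weights, minus)
        # Column j of the Jacobian, one entry per action dimension.
        total += sum(((p - m) / (2 * eps)) ** 2 for p, m in zip(a_plus, a_minus))
    return total


def smoothness_penalty(new_weights, old_weights, obs):
    """Assumed smoothness term: squared distance between the actions of the
    updated policy and the previous policy on the same observation."""
    a_new = policy(new_weights, obs)
    a_old = policy(old_weights, obs)
    return sum((n - o) ** 2 for n, o in zip(a_new, a_old))


def regularized_loss(task_loss, new_weights, old_weights, obs, alpha=0.1, beta=0.1):
    """Task loss plus the two weighted penalties (alpha, beta are hypothetical)."""
    return (
        task_loss
        + alpha * jacobian_penalty(new_weights, obs)
        + beta * smoothness_penalty(new_weights, old_weights, obs)
    )


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 2.0]]
    obs = [0.0, 0.0]
    print(jacobian_penalty(W, obs))
    print(regularized_loss(1.0, W, W, obs))
```

At the zero observation the derivative of tanh is 1, so the Jacobian of this toy policy equals the weight matrix and the penalty is approximately 1 + 4 = 5. In a real VLA model the Jacobian would be taken with respect to high-dimensional observations and computed with automatic differentiation rather than finite differences.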
