Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning

Jiaheng Hu, Jay Shim, Chen Tang, Yoonchang Sung, Bo Liu, Peter Stone, Roberto Martin-Martin

Abstract

Continual Reinforcement Learning (CRL) for Vision-Language-Action (VLA) models is a promising direction toward self-improving embodied agents that can adapt in open-ended, evolving environments. However, conventional wisdom from continual learning suggests that naive Sequential Fine-Tuning (Seq. FT) leads to catastrophic forgetting, necessitating complex CRL strategies. In this work, we take a step back and conduct a systematic study of CRL for large pretrained VLAs across three models and five challenging lifelong RL benchmarks. We find that, contrary to established belief, simple Seq. FT with low-rank adaptation (LoRA) is remarkably strong: it achieves high plasticity, exhibits little to no forgetting, and retains strong zero-shot generalization, frequently outperforming more sophisticated CRL methods. Through detailed analysis, we show that this robustness arises from a synergy between the large pretrained model, parameter-efficient adaptation, and on-policy RL. Together, these components reshape the stability-plasticity trade-off, making continual adaptation both stable and scalable. Our results position Sequential Fine-Tuning as a powerful method for continual RL with VLAs and provide new insights into lifelong learning in the large-model era. Code is available at github.com/UT-Austin-RobIn/continual-vla-rl.
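
As a rough illustration of the parameter-efficient adaptation mentioned in the abstract, the sketch below shows the standard LoRA idea: the pretrained weight matrix stays frozen and only a low-rank update is trained on top of it. This is a generic, dependency-free sketch and not the authors' implementation; the function names, the tiny matrix sizes, and the alpha/rank scaling convention are assumptions made for illustration.

```python
# Minimal LoRA sketch (hypothetical helper names, plain Python lists).
# Effective weight: W_eff = W_frozen + (alpha / rank) * B @ A
#   W_frozen: d_out x d_in  (pretrained, never updated)
#   B:        d_out x rank  (trainable, commonly initialized to zeros)
#   A:        rank  x d_in  (trainable)

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def lora_effective_weight(w_frozen, lora_b, lora_a, alpha, rank):
    """Return the frozen weight plus the scaled low-rank update."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w_frozen, delta)]

# Illustrative values only: a 2x3 frozen weight and a rank-1 adapter.
W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
A = [[0.5, 0.0, -0.5]]

# With B at zero, the adapted model equals the pretrained model.
B_init = [[0.0], [0.0]]
w_at_init = lora_effective_weight(W, B_init, A, alpha=2.0, rank=1)

# After (hypothetical) training, only B and A have changed; W is untouched.
B_trained = [[1.0], [0.0]]
w_adapted = lora_effective_weight(W, B_trained, A, alpha=2.0, rank=1)
```

Because the update has rank at most `rank`, each new task can move the model only within a small subspace of the full weight space. That restriction is one plausible intuition for why such adapters interfere less with pretrained knowledge.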

Paper Structure (52 sections, 16 equations, 9 figures, 7 tables)

Figures (9)

  • Figure 1: Large VLAs as Natural Continual Learners. We show that the synergy between a pretrained VLA, on-policy RL, and LoRA is enough to overcome catastrophic forgetting while maintaining plasticity, enabling simple Sequential Fine-Tuning to achieve surprisingly good performance.
  • Figure 2: Our evaluation spans diverse tasks and benchmarks. Here we show one task from each benchmark. For visualization and description of all the tasks, see Appendix \ref{app:env_desc}.
  • Figure 3: Each line tracks a single training task's success rate, normalized to 100$\%$ at the point it was first learned. Subsequent x-values show how that task's performance changes as additional tasks are learned. Sequential Fine-Tuning shows little forgetting throughout the entire training.
  • Figure 4: Averaged across three benchmarks, Seq. FT achieves strong results in both performance (AVG) and generalization (ZS).
  • Figure 5: Ablation shows that VLA, on-policy RL, and LoRA are all crucial to avoid forgetting. Here, we show the retention curve for SFT to visualize the catastrophic forgetting that can occur.
  • ...and 4 more figures
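
The normalization described in the Figure 3 caption can be sketched as follows: each task's success rate is rescaled so that its value at the stage where it was first learned equals 100%, and later values show how much of that performance is retained. This is a minimal sketch under stated assumptions, not the paper's plotting code; the function name, the input layout (one success rate per training stage), and the example numbers are hypothetical.

```python
# Retention-curve normalization sketch (hypothetical names and data).

def retention_curve(success_rates, learned_at):
    """Normalize one task's success rates to 100% at the stage it was learned.

    success_rates: success rate of a single task, evaluated after each
                   training stage (values in [0, 1]).
    learned_at:    index of the stage at which this task was trained.
    Returns percentages from the learning stage onward.
    """
    baseline = success_rates[learned_at]
    if baseline <= 0:
        raise ValueError("cannot normalize a task that was never solved")
    return [100.0 * rate / baseline for rate in success_rates[learned_at:]]

# Made-up numbers for a task trained at stage 1 and re-evaluated
# after three further tasks were learned.
rates = [0.10, 0.80, 0.76, 0.80, 0.72]
curve = retention_curve(rates, learned_at=1)
print([round(value, 1) for value in curve])
```

A curve that stays near 100 indicates little forgetting, while a curve that falls sharply after the first point indicates catastrophic forgetting.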