MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models
Han Zhao, Wenxuan Song, Donglin Wang, Xinyang Tong, Pengxiang Ding, Xuelian Cheng, Zongyuan Ge
TL;DR
MoRE is a scalable approach to learning quadruped vision-language-action (VLA) controllers. It embeds a sparse mixture of LoRA experts within a dense multimodal transformer and trains the model as a Q-function with an offline RL objective. By learning from mixed-quality data (expert and sub-optimal trajectories), it achieves data-efficient, multi-task policy learning. In simulation and real-world experiments, MoRE outperforms baselines across six skills and generalizes robustly to unseen scenarios. The work advances multi-task learning in quadruped robotics by combining MoE-based adaptation with RL fine-tuning of VLA models on mixed data.
Abstract
Developing versatile quadruped robots that can smoothly perform various actions and tasks in real-world environments remains a significant challenge. This paper introduces a novel vision-language-action (VLA) model for quadruped robots, mixture of robotic experts (MoRE), which uses reinforcement learning (RL) to fine-tune large-scale VLA models on a large amount of mixed-quality data. MoRE integrates multiple low-rank adaptation (LoRA) modules as distinct experts within a dense multi-modal large language model (MLLM), forming a sparsely activated mixture-of-experts model. This design enables the model to adapt effectively to a wide array of downstream tasks. Moreover, after closely examining the structural properties of our tasks, we employ a reinforcement learning-based objective that trains the model as a Q-function. Learning effectively from automatically collected mixed-quality data improves both data efficiency and model performance. Extensive experiments demonstrate that MoRE outperforms all baselines across six different skills and exhibits superior generalization in out-of-distribution scenarios. We further validate our method in real-world scenarios, confirming the practicality of our approach and laying a solid foundation for future research on multi-task learning in quadruped robots.
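To make the idea of LoRA modules acting as sparsely activated experts more concrete, the following is a minimal sketch of a single such layer in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the class name `SparseLoRAMoELayer`, the linear router, top-k gating with a softmax over the selected experts, and all sizes (`num_experts`, `rank`, `top_k`) are hypothetical choices, since the abstract does not specify them. The zero initialization of each expert's up-projection follows the common LoRA convention, so the layer initially reproduces the frozen dense weight.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(zs):
    """Numerically stable softmax over a list of floats."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]


class SparseLoRAMoELayer:
    """Frozen dense weight W plus a top-k mixture of low-rank (LoRA) experts.

    Hypothetical sketch: router form, gating rule, and sizes are assumptions.
    """

    def __init__(self, d_in, d_out, num_experts=4, rank=2, top_k=2, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.W = rand(d_out, d_in)            # frozen dense weight
        self.router = rand(num_experts, d_in)  # one routing logit per expert
        # Expert i contributes B[i] @ A[i] @ x, a rank-`rank` update.
        self.A = [rand(rank, d_in) for _ in range(num_experts)]
        # Up-projections start at zero (usual LoRA convention).
        self.B = [[[0.0] * rank for _ in range(d_out)] for _ in range(num_experts)]
        self.top_k = top_k

    def forward(self, x):
        base = matvec(self.W, x)
        logits = matvec(self.router, x)
        # Sparse activation: keep only the top-k experts for this input.
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.top_k]
        gates = softmax([logits[i] for i in top])
        out = list(base)
        for g, i in zip(gates, top):
            delta = matvec(self.B[i], matvec(self.A[i], x))
            out = [o + g * d for o, d in zip(out, delta)]
        return out, top, gates


layer = SparseLoRAMoELayer(d_in=4, d_out=3)
x = [1.0, 0.5, -0.5, 2.0]
out, top, gates = layer.forward(x)
```

Only `top_k` of the `num_experts` low-rank updates are evaluated per input, which is what keeps the added computation small relative to the dense backbone. In a trainable version, the router and the expert matrices would be the learned parameters while `W` stays fixed.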
