MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning
Suning Huang, Zheyu Zhang, Tianhai Liang, Yihan Xu, Zhehao Kou, Chenhao Lu, Guowei Xu, Zhengrong Xue, Huazhe Xu
TL;DR
MENTOR replaces the MLP backbone of a visual reinforcement learning agent with a Mixture-of-Experts (MoE) network to reduce gradient conflicts, and pairs it with a task-oriented perturbation strategy that samples from top-performing agents to guide exploration. The approach improves sample efficiency and outperforms state-of-the-art methods on three simulation benchmarks. On three challenging real-world robotic manipulation tasks, it achieves an average success rate of 83%, versus 32% for the strongest existing model-free visual RL baseline. The paper also demonstrates MoE's advantages in multi-task and multi-stage settings and shows robustness to disturbances in real-world manipulation. Together, these contributions move visual RL toward practical, data-efficient use in real-world robotics.
Abstract
Visual deep reinforcement learning (RL) enables robots to acquire skills from visual input for unstructured tasks. However, current algorithms suffer from low sample efficiency, limiting their practical applicability. In this work, we present MENTOR, a method that improves both the architecture and optimization of RL agents. Specifically, MENTOR replaces the standard multi-layer perceptron (MLP) with a mixture-of-experts (MoE) backbone and introduces a task-oriented perturbation mechanism. MENTOR outperforms state-of-the-art methods across three simulation benchmarks and achieves an average success rate of 83% on three challenging real-world robotic manipulation tasks, significantly surpassing the 32% success rate of the strongest existing model-free visual RL algorithm. These results underscore the importance of sample efficiency in advancing visual RL for real-world robotics. Experimental videos are available at https://suninghuang19.github.io/mentor_page/.
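To make the idea of swapping an MLP for an MoE backbone concrete, the following is a minimal sketch of a generic top-k mixture-of-experts layer in NumPy. It is not MENTOR's implementation: the class name TinyMoE, the layer sizes, the number of experts, the top-k routing rule, and the ReLU experts are all assumptions chosen for illustration, and the abstract does not specify any of them.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class TinyMoE:
    """Hypothetical top-k mixture-of-experts layer (illustration only)."""

    def __init__(self, in_dim, hidden_dim, out_dim, num_experts=4, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.num_experts = num_experts
        self.top_k = top_k
        self.out_dim = out_dim
        # Router: one logit per expert for each input vector.
        self.router = rng.normal(0.0, 0.1, (in_dim, num_experts))
        # Each expert is a small two-layer MLP with its own weights.
        self.w1 = rng.normal(0.0, 0.1, (num_experts, in_dim, hidden_dim))
        self.w2 = rng.normal(0.0, 0.1, (num_experts, hidden_dim, out_dim))

    def forward(self, x):
        """x has shape (batch, in_dim); returns (output, expert indices, gates)."""
        logits = x @ self.router                                  # (batch, E)
        top_idx = np.argsort(logits, axis=-1)[:, -self.top_k:]    # (batch, k)
        top_logits = np.take_along_axis(logits, top_idx, axis=-1)
        gates = softmax(top_logits, axis=-1)                      # sums to 1 over k

        out = np.zeros((x.shape[0], self.out_dim))
        for slot in range(self.top_k):
            for e in range(self.num_experts):
                mask = top_idx[:, slot] == e
                if not mask.any():
                    continue
                hidden = np.maximum(x[mask] @ self.w1[e], 0.0)    # ReLU expert
                out[mask] += gates[mask, slot][:, None] * (hidden @ self.w2[e])
        return out, top_idx, gates


# Usage: route a batch of 5 feature vectors through 4 experts, 2 active per input.
moe = TinyMoE(in_dim=8, hidden_dim=16, out_dim=4)
features = np.random.default_rng(1).normal(size=(5, 8))
output, chosen_experts, gate_weights = moe.forward(features)
```

Because each input only updates the experts it is routed to, different inputs can exercise different parameters, which is the intuition behind using MoE to reduce gradient conflicts. How MENTOR actually configures its router and experts is described in the paper itself, not in this sketch.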
