RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation
Zelei Cheng, Xian Wu, Jiahao Yu, Sabrina Yang, Gang Wang, Xinyu Xing
TL;DR
RICE addresses training bottlenecks in reinforcement learning with sparse rewards by using explanation-derived frontiers to refine a pre-trained policy. It constructs a mixed initial state distribution combining default states with critical states identified by an improved StateMask-based mask network and applies exploration via Random Network Distillation within PPO, yielding a tighter sub-optimality bound compared to prior refinement methods. The approach demonstrates strong improvements across MuJoCo benchmarks and real-world tasks, reduces training time for the explanation component, and mitigates overfitting risks through mixed initialization and exploration. This has practical impact for refining pre-trained policies in simulable environments, enabling more efficient and robust policy improvement without full retraining from scratch.
Abstract
Deep reinforcement learning (DRL) is playing an increasingly important role in real-world applications. However, obtaining an optimally performing DRL agent for complex tasks, especially with sparse rewards, remains a significant challenge. The training of a DRL agent can often become trapped in a bottleneck and make no further progress. In this paper, we propose RICE, an innovative refining scheme for reinforcement learning that incorporates explanation methods to break through the training bottlenecks. The high-level idea of RICE is to construct a new initial state distribution that combines both the default initial states and critical states identified through explanation methods, thereby encouraging the agent to explore from the mixed initial states. Through careful design, we can theoretically guarantee that our refining scheme has a tighter sub-optimality bound. We evaluate RICE in various popular RL environments and real-world applications. The results demonstrate that RICE significantly outperforms existing refining schemes in enhancing agent performance.
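The mixed initial state distribution described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the mixing probability p, and the rule of taking the highest-scoring state in a trajectory as the critical state are all assumptions made for illustration. The importance scores stand in for the output of an explanation method such as the mask network, and the exploration bonus and the PPO update are left out.

```python
import random
from typing import Callable, Sequence, TypeVar

State = TypeVar("State")


def pick_critical_state(trajectory: Sequence[State],
                        importance: Sequence[float]) -> State:
    """Return the state with the highest importance score in one trajectory.

    `importance` is a hypothetical per-step score from an explanation method.
    """
    if not trajectory or len(trajectory) != len(importance):
        raise ValueError("trajectory and importance must be non-empty and equal length")
    best = max(range(len(trajectory)), key=lambda i: importance[i])
    return trajectory[best]


def sample_initial_state(default_reset: Callable[[], State],
                         critical_states: Sequence[State],
                         p: float,
                         rng: random.Random) -> State:
    """Draw a start state from the mixture of critical and default states.

    With probability p (an assumed hyperparameter) the episode starts from a
    stored critical state; otherwise it starts from the default distribution.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if critical_states and rng.random() < p:
        return rng.choice(critical_states)
    return default_reset()


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy trajectory with made-up explanation scores.
    critical = [pick_critical_state(["s0", "s1", "s2"], [0.1, 0.9, 0.3])]
    starts = [sample_initial_state(lambda: "default", critical, 0.5, rng)
              for _ in range(10)]
    print(starts)
```

Setting p to 0 recovers ordinary training from the default initial states, while p equal to 1 starts every episode from a critical state; keeping default states in the mixture is the part the TL;DR credits with mitigating overfitting. Resetting to a stored state assumes a simulator that can restore arbitrary states.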
