Landmark Guided Active Exploration with State-specific Balance Coefficient
Fei Cui, Jiaojiao Fang, Mengke Yang, Guizhong Liu
TL;DR
This work tackles exploration in goal-conditioned hierarchical reinforcement learning by introducing landmark-guided exploration. It defines a measure of prospect for sub-goals via landmark-based planning in the goal space and combines it with a novelty signal. A state-specific balance coefficient $\alpha$ weights the two, trading off exploration against guidance toward the final goal. The resulting LESC framework improves sample efficiency and performance over baseline methods on challenging MuJoCo tasks, with ablations confirming the complementary roles of prospect, novelty, and dynamic balancing. The approach offers a principled way to use task-directed structure for more effective exploration in long-horizon RL problems.
Abstract
Goal-conditioned hierarchical reinforcement learning (GCHRL) decomposes long-horizon tasks into sub-tasks through a hierarchical framework, and it has demonstrated promising results across a variety of domains. However, the high-level policy's action space is often excessively large, which makes effective exploration difficult and can lead to inefficient training. In this paper, we design a measure of prospect for sub-goals by planning in the goal space based on the goal-conditioned value function. Building on this measure, we propose a landmark-guided exploration strategy that integrates the measures of prospect and novelty to guide the agent toward efficient exploration and improve sample efficiency. Because the relative importance of prospect and novelty varies across states, we introduce a state-specific balance coefficient that weights the two dynamically. Experimental results demonstrate that the proposed exploration strategy significantly outperforms the baseline methods across multiple tasks.
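To make the interplay of prospect, novelty, and the balance coefficient concrete, the following is a minimal sketch rather than the paper's implementation. The abstract does not give the exact combination rule, so the sketch assumes a convex mix, $\alpha \cdot \text{prospect} + (1 - \alpha) \cdot \text{novelty}$, with per-candidate prospect and novelty scores already computed and $\alpha \in [0, 1]$ supplied for the current state. The function names `combined_score` and `select_subgoal` are hypothetical.

```python
import numpy as np


def combined_score(prospect, novelty, alpha):
    """Mix prospect and novelty scores for candidate sub-goals.

    ASSUMPTION: a convex combination weighted by alpha; the abstract only
    states that a state-specific coefficient balances the two measures.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    prospect = np.asarray(prospect, dtype=float)
    novelty = np.asarray(novelty, dtype=float)
    return alpha * prospect + (1.0 - alpha) * novelty


def select_subgoal(candidates, prospect, novelty, alpha):
    """Return the candidate sub-goal with the highest combined score.

    ASSUMPTION: greedy selection over a finite candidate set, used here
    purely for illustration.
    """
    scores = combined_score(prospect, novelty, alpha)
    best = int(np.argmax(scores))
    return candidates[best], scores


# Toy example: three candidate sub-goals in a 2-D goal space.
candidates = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
prospect = [0.1, 0.9, 0.5]  # hypothetical guidance toward the final goal
novelty = [0.8, 0.1, 0.5]   # hypothetical novelty of each candidate

# alpha near 1 favors prospect; alpha near 0 favors novelty.
goal_guided, _ = select_subgoal(candidates, prospect, novelty, alpha=1.0)
goal_novel, _ = select_subgoal(candidates, prospect, novelty, alpha=0.0)
```

In this toy setup, a state with a large $\alpha$ selects the sub-goal that planning considers most promising, while a state with a small $\alpha$ selects the least-visited one, which is the trade-off the state-specific coefficient is meant to control.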
