Goal Exploration via Adaptive Skill Distribution for Goal-Conditioned Reinforcement Learning
Lisheng Wu, Ke Chen
TL;DR
This work tackles exploration in sparse-reward, long-horizon goal-conditioned RL (GCRL) by introducing GEASD, a framework that adaptively distributes a predefined set of skills to exploit environmental structure. GEASD builds a structural representation from skill-value functions and constructs a Boltzmann skill distribution whose dynamic temperature modulates exploration according to the local entropy of achieved goals within a contextual horizon. This guides deep exploration via a Skill-based Local Entropy-Maximization Pattern (SLEMP). Intrinsic rewards quantify changes in local entropy, so the learned skill-value functions estimate entropy gains, while a two-stage Goal Exploration Strategy combines sub-goal novelty with adaptive skill-driven exploration. Theoretical analysis supports the Boltzmann form of the optimal skill distribution under reasonable assumptions. Experiments on PointMaze-Spiral and AntMaze demonstrate faster and more robust exploration, as well as transfer to unseen mazes, compared with OMEGA and GEAPS, and ablations highlight the benefits of the dynamic temperature and of including action history in the context. Overall, GEASD advances deep exploration in GCRL by aligning exploration objectives with environmental structure through learned skill distributions, improving efficiency and generalization in sparse, long-horizon tasks.
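The central mechanism described above, sampling skills from a Boltzmann distribution over skill values with a temperature that depends on local entropy, can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `boltzmann_skill_distribution` and `dynamic_temperature` are hypothetical, and the linear entropy-to-temperature schedule (including its direction) is an illustrative assumption, since the text here does not specify the actual rule.

```python
import numpy as np


def boltzmann_skill_distribution(skill_values, temperature):
    """Softmax over skill values: p(k) is proportional to exp(Q_k / T)."""
    q = np.asarray(skill_values, dtype=float)
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    logits = q / temperature
    logits = logits - logits.max()  # subtract the max logit for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def dynamic_temperature(local_entropy, max_entropy, t_min=0.1, t_max=2.0):
    """HYPOTHETICAL schedule (not from the paper): map normalized local entropy
    linearly to a temperature, low entropy -> t_max, high entropy -> t_min."""
    if max_entropy <= 0:
        raise ValueError("max_entropy must be positive")
    ratio = np.clip(local_entropy / max_entropy, 0.0, 1.0)
    return float(t_max - (t_max - t_min) * ratio)


# Usage with hypothetical values for four skills.
skill_values = [1.0, 2.0, 0.5, 0.0]
temperature = dynamic_temperature(local_entropy=0.5, max_entropy=np.log(4))
probs = boltzmann_skill_distribution(skill_values, temperature)
rng = np.random.default_rng(0)
skill = int(rng.choice(len(skill_values), p=probs))  # index of the sampled skill
```

Subtracting the max logit keeps `np.exp` from overflowing at low temperatures; as the temperature grows, the distribution approaches the uniform skill distribution that the abstract uses as its baseline.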
Abstract
Exploration efficiency poses a significant challenge in goal-conditioned reinforcement learning (GCRL) tasks, particularly those with long horizons and sparse rewards. A primary limitation on exploration efficiency is the agent's inability to leverage environmental structural patterns. In this study, we introduce a novel framework, GEASD, designed to capture these patterns through an adaptive skill distribution learned during training. This distribution optimizes the local entropy of achieved goals within a contextual horizon, enhancing goal-spreading behaviors and facilitating deep exploration in states containing familiar structural patterns. Our experiments reveal marked improvements in exploration efficiency with the adaptive skill distribution compared to a uniform skill distribution. Additionally, the learned skill distribution generalizes robustly, achieving substantial exploration progress in unseen tasks containing similar local structures.
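To make "local entropy of achieved goals within a contextual horizon" concrete, the sketch below keeps a sliding window of recent achieved goals, estimates the window's entropy by discretizing goals onto a grid, and returns the change in that entropy as an intrinsic reward, in the spirit of the TL;DR's description of entropy-change rewards. The grid-histogram estimator, the window length `horizon`, the `cell_size` parameter, and the names `grid_entropy` and `LocalEntropyTracker` are all hypothetical choices for illustration; the paper's actual entropy estimator and context definition may differ.

```python
import math
from collections import Counter, deque


def grid_entropy(goals, cell_size):
    """Shannon entropy (nats) of goals discretized onto a uniform grid.
    The histogram estimator is an ASSUMPTION, not the paper's estimator."""
    if not goals:
        return 0.0
    cells = Counter(tuple(math.floor(c / cell_size) for c in g) for g in goals)
    n = len(goals)
    return -sum((k / n) * math.log(k / n) for k in cells.values())


class LocalEntropyTracker:
    """HYPOTHETICAL helper: holds the last `horizon` achieved goals as the context."""

    def __init__(self, horizon, cell_size):
        self.window = deque(maxlen=horizon)  # oldest goal is evicted when full
        self.cell_size = cell_size

    def step(self, achieved_goal):
        """Append a goal and return the change in windowed entropy.
        Once the window is full, the change also reflects the evicted goal."""
        before = grid_entropy(list(self.window), self.cell_size)
        self.window.append(tuple(achieved_goal))
        after = grid_entropy(list(self.window), self.cell_size)
        return after - before  # intrinsic reward: local entropy gain


tracker = LocalEntropyTracker(horizon=4, cell_size=1.0)
goals = [(0.2, 0.3), (1.5, 0.1), (1.6, 0.4), (2.7, 0.2)]
rewards = [tracker.step(g) for g in goals]
```

With `horizon=4` and `cell_size=1.0`, these four goals yield rewards of 0, ln 2, a small negative value (the third goal revisits an occupied cell), and a positive value (the fourth goal reaches a new cell), which illustrates how an entropy-change reward favors goal-spreading behavior.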
