Autonomous Option Invention for Continual Hierarchical Reinforcement Learning and Planning
Rashmeet Kaur Nayyar, Siddharth Srivastava
TL;DR
The paper addresses the sample inefficiency and scalability challenges of continual reinforcement learning by introducing CHiRP, a framework that autonomously invents symbolic abstract options and plans forward over them. It uses state abstractions based on conditional abstraction trees (CATs) to identify context-dependent refinements and option endpoints, yielding composable, reusable, and mutually independent options that transfer across task streams. Empirical results across diverse domains show that CHiRP is more sample-efficient than state-of-the-art baselines and that its symbolic components are interpretable and support planning and transfer. The work advances symbolic abstraction and planning in continual RL and points to future extensions to continuous actions and to optimality objectives in more complex settings.
Abstract
Abstraction is key to scaling up reinforcement learning (RL). However, autonomously learning abstract state and action representations to enable transfer and generalization remains a challenging open problem. This paper presents a novel approach for inventing, representing, and utilizing options, which represent temporally extended behaviors, in continual RL settings. Our approach addresses streams of stochastic problems characterized by long horizons, sparse rewards, and unknown transition and reward functions. It continually learns and maintains an interpretable state abstraction, and uses it to invent high-level options with abstract symbolic representations. These options meet three key desiderata: (1) composability for solving tasks effectively with lookahead planning, (2) reusability across problem instances for minimizing the need for relearning, and (3) mutual independence for reducing interference among options. Our main contributions are approaches for continually learning transferable, generalizable options with symbolic representations, and for integrating search techniques with RL to efficiently plan over these learned options to solve new problems. Empirical results demonstrate that the resulting approach effectively learns and transfers abstract knowledge across problem instances, achieving superior sample efficiency compared to state-of-the-art methods.
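To make the composability desideratum concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes each option is summarized symbolically by a set of abstract states where it may start and a single abstract state where it is expected to end, and it assumes deterministic outcomes, whereas the paper targets stochastic problems. The names `Option`, `plan_over_options`, and the abstract state labels are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

AbstractState = str  # hypothetical label for one abstract state


@dataclass(frozen=True)
class Option:
    """Symbolic summary of an option (assumed form, for illustration only)."""
    name: str
    initiation: FrozenSet[AbstractState]  # abstract states where the option may start
    termination: AbstractState            # abstract state where it is expected to end


def plan_over_options(
    start: AbstractState, goal: AbstractState, options: List[Option]
) -> Optional[List[str]]:
    """Breadth-first forward search that chains options end to start.

    Returns a list of option names leading from start to goal,
    or None if no chain of options connects them.
    """
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if state == goal:
            return plan
        for opt in options:
            # An option composes with the plan so far if the current
            # abstract state lies in its initiation set.
            if state in opt.initiation and opt.termination not in visited:
                visited.add(opt.termination)
                frontier.append((opt.termination, plan + [opt.name]))
    return None


# Hypothetical option library over made-up abstract states.
options = [
    Option("to_door", frozenset({"room_a"}), "door"),
    Option("to_room_b", frozenset({"door"}), "room_b"),
    Option("to_goal", frozenset({"room_b"}), "goal"),
]
plan = plan_over_options("room_a", "goal", options)
```

In this simplified setting, reusability corresponds to keeping the same `options` list across problem instances and only changing `start` and `goal`; a new problem needs new options only when the search returns `None`.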
