Autonomous Option Invention for Continual Hierarchical Reinforcement Learning and Planning

Rashmeet Kaur Nayyar, Siddharth Srivastava

TL;DR

The paper addresses the sample inefficiency and scalability challenges of continual reinforcement learning by introducing CHiRP (Continual Hierarchical Reinforcement Learning and Planning), a framework that autonomously invents options with abstract symbolic representations and plans forward over them. It uses state abstractions based on Conditional Abstraction Trees (CATs) to identify context-dependent refinements and option endpoints, yielding composable, reusable, and mutually independent options that transfer across streams of tasks. Empirical results across diverse domains show that CHiRP is more sample-efficient than state-of-the-art baselines, and its symbolic components are interpretable and support planning and transfer. The authors point to continuous actions and optimality objectives in more complex settings as future extensions.

Abstract

Abstraction is key to scaling up reinforcement learning (RL). However, autonomously learning abstract state and action representations to enable transfer and generalization remains a challenging open problem. This paper presents a novel approach for inventing, representing, and utilizing options, which represent temporally extended behaviors, in continual RL settings. Our approach addresses streams of stochastic problems characterized by long horizons, sparse rewards, and unknown transition and reward functions. Our approach continually learns and maintains an interpretable state abstraction, and uses it to invent high-level options with abstract symbolic representations. These options meet three key desiderata: (1) composability for solving tasks effectively with lookahead planning, (2) reusability across problem instances for minimizing the need for relearning, and (3) mutual independence for reducing interference among options. Our main contributions are approaches for continually learning transferable, generalizable options with symbolic representations, and for integrating search techniques with RL to efficiently plan over these learned options to solve new problems. Empirical results demonstrate that the resulting approach effectively learns and transfers abstract knowledge across problem instances, achieving superior sample efficiency compared to state-of-the-art methods.
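
To make the notions of an option and of composability concrete, the following is a minimal sketch of the generic options formalism (initiation condition, policy, termination condition), not the paper's symbolic representation. All names here (`Option`, `run_option`, `run_plan`, the corridor example) are hypothetical and chosen only for illustration; the environment is assumed to be deterministic for brevity, whereas the paper targets stochastic problems.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, List, Tuple

State = Hashable
Action = Hashable


@dataclass
class Option:
    """A temporally extended behavior (illustrative, not the paper's data structure)."""
    name: str
    can_start: Callable[[State], bool]   # initiation condition
    policy: Callable[[State], Action]    # maps a state to a primitive action
    is_done: Callable[[State], bool]     # termination condition


def run_option(option: Option, state: State,
               step: Callable[[State, Action], State],
               max_steps: int = 100) -> Tuple[State, int]:
    """Follow the option's policy until it terminates or the step budget runs out."""
    if not option.can_start(state):
        raise ValueError(f"option {option.name!r} cannot start in state {state!r}")
    steps = 0
    while not option.is_done(state) and steps < max_steps:
        state = step(state, option.policy(state))
        steps += 1
    return state, steps


def run_plan(plan: List[Option], state: State,
             step: Callable[[State, Action], State]) -> Tuple[State, int]:
    """Execute options in sequence; each must be startable where the previous one ended."""
    total = 0
    for option in plan:
        state, used = run_option(option, state, step)
        total += used
    return state, total


# Toy 1-D corridor (an assumption for this sketch): states are integer
# positions and a primitive action adds +1 or -1 to the position.
def corridor_step(position: int, action: int) -> int:
    return position + action


to_door = Option("to_door", lambda s: s < 3, lambda s: +1, lambda s: s >= 3)
to_goal = Option("to_goal", lambda s: 3 <= s < 6, lambda s: +1, lambda s: s >= 6)

final_state, primitive_steps = run_plan([to_door, to_goal], 0, corridor_step)
```

In this toy setting the two options compose because `to_door` terminates in a state where `to_goal` can start, which is the property that lets a planner chain options with lookahead instead of searching over primitive actions.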

Paper Structure

This paper contains 24 sections, 4 figures, 2 algorithms.

Figures (4)

  • Figure 1: Overall approach for Continual Hierarchical Reinforcement Learning and Planning (CHiRP).
  • Figure 2: Illustration of a Conditional Abstraction Tree (CAT) (left) and Abstract Options (right) for a small instance in taxi world. Left: Nodes show values of refined variables; other variables inherit values from parent nodes. Right: Arrows denote option policies. Abstract states are highlighted with solid red lines in both figures.
  • Figure 3: Illustration of two Context-Specific CATs (C-CATs) highlighting different active abstractions (represented by leaves) in the CAT from Fig. 2. The left C-CAT corresponds to $p=0$, while the right C-CAT corresponds to $p=1$.
  • Figure 4: Fraction of tasks solved vs training steps, averaged over 10 independent trials. Each approach was evaluated on a sequence of 20 randomly sampled tasks in a continual learning setting, with a fixed budget of timesteps to solve each task. The timesteps include all environment interactions used for learning both abstractions and policies.

Theorems & Definitions (6)

  • Definition 2.1: Continual Reinforcement Learning (CRL)
  • Definition 2.2: Abstract State
  • Definition 2.3: Conditional Abstraction Trees (CATs)
  • Definition 3.1: Context-specific CATs (C-CATs)
  • Definition 3.2: Context-specific distance between C-CATs
  • Definition 3.3: Context-independent distance between abstract states