Efficient Planning in Reinforcement Learning via Model Introspection

Gabriel Stella

TL;DR

The paper addresses the gap between reinforcement learning and classical planning by introducing model introspection, a program-analytic approach in which the agent analyzes its learned models to guide planning. It defines milestones as reward-maximizing future states and develops a bilevel planner, introspector, that uses state mutations to generate milestones and a domain-agnostic heuristic to search efficiently in relational domains. Empirical results in relational domains (Blocks-World, drawers, bins) show sub-exponential runtime growth and substantial speedups over baseline planners. The work also discusses representations, interpretability, and future directions, including extending introspection to other model classes and deriving optimal solvers. Overall, the framework provides a principled bridge between RL and classical planning through introspective analysis of internal models.

Abstract

Reinforcement learning and classical planning are typically seen as two distinct problems, with differing formulations necessitating different solutions. Yet, when humans are given a task, regardless of the way it is specified, they can often derive the additional information needed to solve the problem efficiently. The key to this ability is introspection: by reasoning about their internal models of the problem, humans directly synthesize additional task-relevant information. In this paper, we propose that this introspection can be thought of as program analysis. We discuss examples of how this approach can be applied to various kinds of models used in reinforcement learning. We then describe an algorithm that enables efficient goal-oriented planning over the class of models used in relational reinforcement learning, demonstrating a novel link between reinforcement learning and classical planning.
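To make the idea of reward-derived milestones concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a small grid world, a hypothetical `reward_model` function standing in for a learned reward model, and a brute-force `find_milestones` helper that enumerates states instead of performing program analysis. All names, the grid layout, and the use of A* with a Manhattan-distance heuristic are illustrative assumptions.

```python
import heapq

def find_milestones(states, reward_model):
    """Return the states with the highest modeled reward.

    Assumption: brute-force enumeration is a crude stand-in for
    analyzing the reward model's program text.
    """
    best = max(reward_model(s) for s in states)
    return [s for s in states if reward_model(s) == best]

def astar(start, milestones, walls, width, height):
    """A* on a 4-connected grid, guided by distance to the nearest milestone."""
    def h(s):
        # Manhattan distance to the closest milestone (admissible on this grid).
        return min(abs(s[0] - m[0]) + abs(s[1] - m[1]) for m in milestones)

    goals = set(milestones)
    frontier = [(h(start), 0, start)]
    parent = {start: None}
    cost = {start: 0}
    while frontier:
        _, g, s = heapq.heappop(frontier)
        if s in goals:
            # Walk parent links back to the start to recover the path.
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1]
        if g > cost[s]:
            continue  # stale queue entry
        x, y = s
        for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= n[0] < width and 0 <= n[1] < height) or n in walls:
                continue
            ng = g + 1
            if ng < cost.get(n, float("inf")):
                cost[n] = ng
                parent[n] = s
                heapq.heappush(frontier, (ng + h(n), ng, n))
    return None

# Hypothetical 5x5 maze; wall positions are chosen only for illustration.
WIDTH, HEIGHT = 5, 5
WALLS = {(1, 0), (1, 1), (1, 2), (3, 2), (3, 3), (3, 4)}
STATES = [(x, y) for x in range(WIDTH) for y in range(HEIGHT)
          if (x, y) not in WALLS]

def reward_model(state):
    # Hypothetical learned reward model: reward only at the far corner.
    return 1.0 if state == (4, 4) else 0.0

MILESTONES = find_milestones(STATES, reward_model)
PATH = astar((0, 0), MILESTONES, WALLS, WIDTH, HEIGHT)
```

In this sketch the milestone is not supplied as domain knowledge: it is derived from the reward model and then used as the target of an informed search, which is the role the abstract and TL;DR attribute to milestones.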

Paper Structure (33 sections, 7 equations, 18 figures)

Introduction
Background
Reinforcement Learning
Classical Planning
Planning in Reinforcement Learning
Goal-Conditioned Reinforcement Learning
Reward Machines
Generalization
Milestones
State and Model Representations
Example: Search on a Grid
Planning in Relational Domains
Example Environment
Milestone Computation
Literal
...and 18 more sections

Figures (18)

Figure 1: Example maze with milestone states marked in the corresponding state space graph. While existing approaches to model-based reinforcement learning search the state space without long-term guidance, our reward-based milestones allow the agent to plan more efficiently using informed search algorithms. Note that these milestones are not given as domain knowledge; instead, they are computed by the agent based on information contained within its learned models.
Figure 2: Planning results in the grid search domain
Figure 3: Definition of the Blocks-World domain for relational reinforcement learning
Figure 4: Example Blocks-World state and milestone
Figure 5: Planning results in relational domains
...and 13 more figures
