Efficient Planning in Reinforcement Learning via Model Introspection

Gabriel Stella

TL;DR

The paper addresses the gap between reinforcement learning and classical planning by introducing model introspection, a program-analytic approach that uses an agent's learned models to guide planning. It defines milestones as reward-maximizing future states and develops a bilevel planner, called introspector, that uses state mutations to generate milestones and a domain-agnostic heuristic to search efficiently in relational domains. Empirical results in relational domains (Blocks-World, drawers, bins) show sub-exponential runtime growth, a substantial speedup over baseline planners. The work also discusses representations, interpretability, and future directions for extending introspection to other model classes and for deriving optimal solvers. Overall, the framework provides a principled bridge between RL and classical planning via introspective analysis of internal models.
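
To make the idea of milestone-guided search concrete, here is a minimal Python sketch. It is not the paper's introspector algorithm: the milestones are hard-coded by hand, whereas the paper computes them from the agent's learned models. The grid layout and the names `astar` and `plan_via_milestones` are assumptions chosen for illustration. The sketch only shows how an ordered list of milestones lets an informed search (here A* with a Manhattan-distance heuristic) solve a long task as a chain of short searches.

```python
import heapq

def astar(walls, size, start, goal):
    """A* on a 4-connected size x size grid; returns the cell path or None."""
    def h(cell):
        # Manhattan distance: admissible and consistent for unit-cost moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]
    parent = {start: None}
    cost = {start: 0}
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        if g > cost[cell]:
            continue  # stale queue entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if not inside or nxt in walls:
                continue
            if nxt not in cost or g + 1 < cost[nxt]:
                cost[nxt] = g + 1
                parent[nxt] = cell
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt))
    return None

def plan_via_milestones(walls, size, start, milestones):
    """Chain one A* search per milestone, visiting them in the given order."""
    path = [start]
    for milestone in milestones:
        segment = astar(walls, size, path[-1], milestone)
        if segment is None:
            return None
        path.extend(segment[1:])  # drop the repeated junction cell
    return path

# Hypothetical 5x5 maze: a wall across row 1 leaves a single gap at column 4.
WALLS = {(1, 0), (1, 1), (1, 2), (1, 3)}
# Hand-picked milestones (assumption): reach (0, 4) first, then (4, 0).
PLAN = plan_via_milestones(WALLS, 5, (0, 0), [(0, 4), (4, 0)])
```

Each segment search only needs to reach the next milestone, so the heuristic stays informative even when the final goal is far away or reachable only after an intermediate state has been visited.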

Abstract

Reinforcement learning and classical planning are typically seen as two distinct problems, with differing formulations necessitating different solutions. Yet, when humans are given a task, regardless of the way it is specified, they can often derive the additional information needed to solve the problem efficiently. The key to this ability is introspection: by reasoning about their internal models of the problem, humans directly synthesize additional task-relevant information. In this paper, we propose that this introspection can be thought of as program analysis. We discuss examples of how this approach can be applied to various kinds of models used in reinforcement learning. We then describe an algorithm that enables efficient goal-oriented planning over the class of models used in relational reinforcement learning, demonstrating a novel link between reinforcement learning and classical planning.

Paper Structure (33 sections, 7 equations, 18 figures)

Figures (18)

  • Figure 1: Example maze with milestone states marked in the corresponding state space graph. While existing approaches to model-based reinforcement learning search the state space without long-term guidance, our reward-based milestones allow the agent to plan more efficiently using informed search algorithms. Note that these milestones are not given as domain knowledge; instead, they are computed by the agent based on information contained within its learned models.
  • Figure 2: Planning results in the grid search domain
  • Figure 3: Definition of the Blocks-World domain for relational reinforcement learning
  • Figure 4: Example Blocks-World state and milestone
  • Figure 5: Planning results in relational domains
  • ...and 13 more figures