Efficient Planning in Reinforcement Learning via Model Introspection
Gabriel Stella
TL;DR
The paper addresses the gap between reinforcement learning and classical planning by introducing model introspection, a program-analytic approach that leverages learned models to guide planning. It defines milestones as reward-maximizing future states and develops a bilevel planner, introspector, that uses state mutations to generate milestones and a domain-agnostic heuristic to enable efficient search in relational domains. Empirical results in relational domains (Blocks-World, drawers, bins) show sub-exponential runtime growth and substantial speedups over baseline planners. The work also discusses representations, interpretability, and future directions, including extending introspection to other model classes and deriving optimal solvers. Overall, the framework provides a principled bridge between RL and classical planning via introspective analysis of internal models.
Abstract
Reinforcement learning and classical planning are typically seen as two distinct problems, with differing formulations necessitating different solutions. Yet, when humans are given a task, regardless of the way it is specified, they can often derive the additional information needed to solve the problem efficiently. The key to this ability is introspection: by reasoning about their internal models of the problem, humans directly synthesize additional task-relevant information. In this paper, we propose that this introspection can be thought of as program analysis. We discuss examples of how this approach can be applied to various kinds of models used in reinforcement learning. We then describe an algorithm that enables efficient goal-oriented planning over the class of models used in relational reinforcement learning, demonstrating a novel link between reinforcement learning and classical planning.
