Learning Abstract World Model for Value-preserving Planning with Options

Rafael Rodriguez-Sanchez, George Konidaris

TL;DR

The paper addresses the challenge of enabling general-purpose agents to plan effectively using rich sensorimotor data by learning dynamics-preserving abstract MDPs from a given set of temporally-extended actions. It introduces a theory-grounded framework that defines ground and abstract MDPs, grounding, and a dynamics-preserving abstraction $\phi$ to ensure trajectory simulations in the abstract model yield the same value as the ground MDP. The authors propose an information-maximization objective and a contrastive learning approach (InfoNCE) to learn $\phi$ and the abstract model, along with explicit losses for initiation, transition, reward, and duration; planning with the abstract model is applied to goal-based tasks. Empirical results in Pinball and Antmaze show the learned abstract state space captures task-relevant information, improves planning efficiency, and achieves competitive or superior performance with fewer real-environment samples compared to ground-model baselines and some Dreamer variants. Overall, the work provides a principled, reusable pathway to build continuous, high-level world models that enable efficient planning with temporally-extended skills in complex observation spaces.
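As a rough illustration of the contrastive objective mentioned above, here is a minimal NumPy sketch of a generic InfoNCE loss over a batch of embedding pairs. The function name `info_nce_loss`, the cosine-similarity critic, and the `temperature` value are assumptions made for this example; the summary does not specify the critic or architecture the authors use.

```python
import numpy as np

def info_nce_loss(z_pred, z_next, temperature=0.1):
    """Generic InfoNCE: row i of z_pred should match row i of z_next.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    # Cosine-similarity logits between every prediction and every target.
    z_pred = z_pred / np.linalg.norm(z_pred, axis=1, keepdims=True)
    z_next = z_next / np.linalg.norm(z_next, axis=1, keepdims=True)
    logits = z_pred @ z_next.T / temperature
    # Numerically stable log-softmax across each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; other batch entries act as negatives.
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
targets = rng.normal(size=(8, 4))                    # stand-in for embedded next states
aligned = targets + 0.01 * rng.normal(size=(8, 4))   # predictions close to their targets
mismatched = np.roll(targets, 1, axis=0)             # predictions paired with the wrong target

loss_aligned = info_nce_loss(aligned, targets)
loss_mismatched = info_nce_loss(mismatched, targets)
print(loss_aligned, loss_mismatched)
```

Predictions that line up with their own targets should score a lower loss than predictions paired with the wrong target, which is the signal such an objective uses to shape the abstract state space.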

Abstract

General-purpose agents require fine-grained controls and rich sensory inputs to perform a wide range of tasks. However, this complexity often leads to intractable decision-making. Traditionally, agents are provided with task-specific action and observation spaces to mitigate this challenge, but this reduces autonomy. Instead, agents must be capable of building state-action spaces at the correct abstraction level from their sensorimotor experiences. We leverage the structure of a given set of temporally-extended actions to learn abstract Markov decision processes (MDPs) that operate at a higher level of temporal and state granularity. We characterize state abstractions necessary to ensure that planning with these skills, by simulating trajectories in the abstract MDP, results in policies with bounded value loss in the original MDP. We evaluate our approach in goal-based navigation environments that require continuous abstract states to plan successfully and show that abstract model learning improves the sample efficiency of planning and learning.
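To make "simulating trajectories in the abstract MDP" concrete, the sketch below computes the discounted return of a sequence of temporally-extended actions under the standard semi-MDP convention, in which each option contributes its accumulated reward and advances the discount by its duration. The function `abstract_return` and its arguments are hypothetical names for this example, and the numbers are made up; in the paper these quantities would come from the learned reward and duration models.

```python
def abstract_return(option_rewards, option_durations, gamma=0.99):
    """Discounted return of an option sequence (semi-MDP convention).

    option_rewards[k]   : discounted reward accumulated while option k runs
    option_durations[k] : number of ground steps option k takes
    Hypothetical helper for illustration only.
    """
    total, discount = 0.0, 1.0
    for reward, duration in zip(option_rewards, option_durations):
        total += discount * reward
        # The next option starts `duration` ground steps later.
        discount *= gamma ** duration
    return total

# Two options: the first lasts 2 ground steps, so the second is discounted by gamma**2.
value = abstract_return([1.0, 1.0], [2, 3], gamma=0.5)
print(value)  # 1.0 + 0.5**2 * 1.0 = 1.25
```

A planner working in the abstract model only needs per-option predictions of next abstract state, reward, and duration to evaluate such returns, without stepping the ground environment.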

Paper Structure

This paper contains 32 sections, 8 theorems, 35 equations, 10 figures, 6 tables, 1 algorithm.

Key Result

Theorem 3.6

Let the tuple $(M, \bar{M}, G)$ be a grounded abstract model and let $\phi: \mathcal{S}\rightarrow \mathcal{Z}\subseteq\mathbb{R}^{d_z}$ be a function. Then $B_t(\cdot \mid o_0, \ldots, o_{t-1}) = \bar{B}_t(\cdot \mid o_0, \ldots, o_{t-1})$ if and only if $\phi$ is dynamics-preserving.

Figures (10)

  • Figure 1: An agent needs to solve a task using its actuators and sensors (on the right). However, it requires an abstract model of the task (on the left) to reason at long time scales. This can be constructed by combining temporally-extended actions $\bar{a}$ with a compatible abstract state representation $\bar{s}$ that contains the minimal information necessary for planning with those actions.
  • Figure 2: Medium Antmaze. 2D MDS projection of the learned $\phi$: it learns to represent the position in the maze. The average grounding shows possible configurations of the ant joints when it is in the represented position.
  • Figure 3: MI matrix: ground features $s$ are on the vertical axis and abstract features $z$ are on the horizontal axis. High MI (first two rows) corresponds to the position of the ball or the ant.
  • Figure 4: Planning with an abstract model. Success rate vs. environment steps, averaged over goals and $5$ seeds. The gray area represents the offset for the steps needed to pre-train the model.
  • Figure 5: Pinball from pixels. Ground baseline vs. abstract planning. Each goal's learning curve is averaged over $5$ seeds, with the shaded area of each curve showing $1$ standard deviation. The gray area is the offset for the samples used to pre-train the model. Although this offset is shown in every plot, the pre-training is common to all goals.
  • ...and 5 more figures

Theorems & Definitions (20)

  • Definition 3.1: Ground MDP
  • Definition 3.2: Abstract MDP
  • Definition 3.3: Grounding function
  • Definition 3.4: Future State Distribution
  • Definition 3.5: Dynamics Preserving Abstraction
  • Theorem 3.6
  • Corollary 3.7
  • proof
  • Corollary 3.8
  • Theorem 4.1: Value Loss Bound
  • ...and 10 more