Laplacian Representations for Decision-Time Planning

Dikshant Shehmar; Matthew Schlegel; Matthew E. Taylor; Marlos C. Machado

TL;DR

The paper tackles planning with learned models in offline goal-conditioned RL by using the Laplacian representation as a multi-time-scale latent space. It presents ALPS (Augmented Laplacian Planning with Subgoals), a hierarchical decision-time planner that uses the Laplacian-based $\psi$-space both to identify subgoals via spectral clustering and to estimate state distances for planning, with a behavior-prior-guided cross-entropy method (CEM) as the low-level planner. The method is evaluated on Maze2D and large-scale OGBench tasks, where ALPS outperforms model-free baselines. Ablations highlight the importance of the high-level planner, the behavior prior, and subgoal partitioning. The work also establishes a CTD-based theoretical link for planning in $\psi$-space and demonstrates that spectral geometry can enable scalable, long-horizon planning with learned dynamics in complex environments.
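To make the low-level planner more concrete, here is a minimal sketch of CEM whose sampling distribution is seeded from a prior. It is not the paper's implementation: the function `cem_plan`, the toy additive dynamics, the Euclidean cost, and the choice to inject the prior only through the initial mean are all assumptions made for illustration.

```python
import numpy as np

def cem_plan(state, goal, prior_mean, horizon=5, pop=64, n_elite=8, iters=5, seed=0):
    """Return a (horizon, action_dim) plan found by CEM (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Assumption: the prior only sets the initial sampling mean.
    mean = np.tile(np.asarray(prior_mean, dtype=float), (horizon, 1))
    std = np.full_like(mean, 0.5)
    for _ in range(iters):
        # Sample candidate action sequences around the current mean.
        noise = rng.standard_normal((pop,) + mean.shape)
        actions = np.clip(mean + std * noise, -1.0, 1.0)
        # Toy stand-in for a learned model: each action is added to the state.
        final_states = state + actions.sum(axis=1)
        cost = np.linalg.norm(final_states - goal, axis=-1)
        # Refit the sampling distribution to the lowest-cost candidates.
        elite = actions[np.argsort(cost)[:n_elite]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6
    return mean

state = np.array([0.0, 0.0])
goal = np.array([3.0, 0.0])
plan = cem_plan(state, goal, prior_mean=[0.5, 0.0])
first_action = plan[0]  # in receding-horizon control only this action is executed
```

In a setting like the one summarized above, the cost would presumably be a distance to the current subgoal measured in $\psi$-space and the rollout would use the learned forward model; both are replaced here by toy stand-ins.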

Abstract

Planning with a learned model remains a key challenge in model-based reinforcement learning (RL). In decision-time planning, state representations are critical as they must support local cost computation while preserving long-horizon structure. In this paper, we show that the Laplacian representation provides an effective latent space for planning by capturing state-space distances at multiple time scales. This representation preserves meaningful distances and naturally decomposes long-horizon problems into subgoals, also mitigating the compounding errors that arise over long prediction horizons. Building on these properties, we introduce ALPS, a hierarchical planning algorithm, and demonstrate that it outperforms commonly used baselines on a selection of offline goal-conditioned RL tasks from OGBench, a benchmark previously dominated by model-free methods.
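As a rough illustration of how a Laplacian-based embedding can encode state-space distances, the sketch below builds the graph Laplacian of a small chain of states and embeds each state with its eigenvectors. The helper `laplacian_embedding`, the tabular chain graph, and the $1/\sqrt{\lambda}$ scaling are assumptions chosen for this example; the paper learns its representation from data rather than from an explicit eigendecomposition.

```python
import numpy as np

def laplacian_embedding(adjacency, dim):
    """Embed graph nodes with the first `dim` non-trivial Laplacian eigenvectors.

    Each eigenvector is scaled by 1/sqrt(eigenvalue) (an assumption here), so
    low-frequency, long-range directions carry more weight in the distance.
    Assumes a connected graph, so only the first eigenvalue is zero.
    """
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    vals = eigvals[1:dim + 1]                     # skip the constant eigenvector
    vecs = eigvecs[:, 1:dim + 1]
    return vecs / np.sqrt(vals)

# Chain of 8 states: state i is connected to state i + 1.
n = 8
adjacency = np.zeros((n, n))
idx = np.arange(n - 1)
adjacency[idx, idx + 1] = 1.0
adjacency[idx + 1, idx] = 1.0

psi = laplacian_embedding(adjacency, dim=n - 1)

# Pairwise squared Euclidean distances in the embedding.
diff = psi[:, None, :] - psi[None, :, :]
sq_dist = (diff ** 2).sum(axis=-1)
```

With all non-trivial eigenvectors kept, the squared distance between two states under this scaling equals the effective resistance between them, which on a unit-weight chain is simply the number of steps separating them. Keeping fewer eigenvectors gives a smoother, coarser approximation of the same geometry.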

Paper Structure (34 sections, 12 equations, 7 figures, 12 tables, 2 algorithms)

Introduction
Preliminaries
Problem Setting
Laplacian Representation in RL
Planning with a learned model
The Laplacian for Long-Horizon Planning
Augmented Laplacian Planning with Subgoals
Experiments
Methodology
Comparison with PcLast
Scaling to large state and action spaces
Ablation Studies
Components of ALPS
Number of Clusters
Related Work
...and 19 more sections

Figures (7)

Figure 1: Visualization of the $\psi$-space properties in the pointmaze-medium maze environment from OGBench. (left) Heatmap of $c(s^\star, s_i)$ distance from a reference state ($s^\star$ denoted by $\star$ in the figure) to each state in the dataset. (right) Cluster labels assigned to each state in the dataset via clustering in $\psi$-space.
Figure 2: ALPS at the pre-training and decision-time planning phases. In pre-training, ALPS (1) learns the Laplacian representation using ALLO, (2) learns a one-step forward model on top of the original state space $\mathscr{S}$, (3) learns the behavior prior $\pi_\text{prior}$ using the scaled Laplacian representation, and (4) clusters the dataset using $k$-means in the scaled Laplacian space to generate the cluster graph. In planning, (5) ALPS takes the current state from the environment and uses the high-level planner to determine the next subgoal, and then determines the next action towards this subgoal using the low-level planner.
Figure 3: Success rate vs. number of clusters on the OGBench large maze for the ant task across types (navigate, stitch, explore). Error bars indicate standard error.
Figure 4: (a)-(c) 2-D Maze environments with continuous state and action spaces, (d) Input observation
Figure 5: OGBench Mazes
...and 2 more figures
