Laplacian Representations for Decision-Time Planning

Dikshant Shehmar, Matthew Schlegel, Matthew E. Taylor, Marlos C. Machado

TL;DR

The paper tackles planning with learned models in offline goal-conditioned RL by using the Laplacian representation as a multi-time-scale latent space. It presents ALPS, a hierarchical decision-time planner that uses the Laplacian-based $\psi$-space both to identify subgoals via spectral clustering and to estimate state distances for planning, with a behavior-prior-guided cross-entropy method (CEM) at the low level. The method is evaluated on Maze2D and large-scale OGBench tasks, where ALPS outperforms model-free baselines; ablations highlight the importance of the high-level planner, the behavior prior, and subgoal partitioning. The work links planning in $\psi$-space to commute-time distance (CTD) and demonstrates that spectral geometry can enable scalable, long-horizon planning with learned dynamics in complex environments.

Abstract

Planning with a learned model remains a key challenge in model-based reinforcement learning (RL). In decision-time planning, state representations are critical as they must support local cost computation while preserving long-horizon structure. In this paper, we show that the Laplacian representation provides an effective latent space for planning by capturing state-space distances at multiple time scales. This representation preserves meaningful distances and naturally decomposes long-horizon problems into subgoals, also mitigating the compounding errors that arise over long prediction horizons. Building on these properties, we introduce ALPS, a hierarchical planning algorithm, and demonstrate that it outperforms commonly used baselines on a selection of offline goal-conditioned RL tasks from OGBench, a benchmark previously dominated by model-free methods.
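The claim that the Laplacian representation preserves meaningful distances can be illustrated on a small graph. The sketch below is a tabular toy, not the paper's method: ALPS learns the representation from data, whereas here the eigenvectors of the graph Laplacian are computed exactly. The helper names (`path_graph`, `scaled_laplacian_embedding`) and the $1/\sqrt{\lambda}$ scaling are assumptions chosen so that squared Euclidean distance in the embedding equals the commute-time distance divided by the graph volume.

```python
import numpy as np

def path_graph(n):
    """Adjacency matrix of an unweighted path 0 - 1 - ... - (n-1)."""
    W = np.zeros((n, n))
    for i in range(n - 1):
        W[i, i + 1] = W[i + 1, i] = 1.0
    return W

def scaled_laplacian_embedding(W):
    """Rows are states; columns are Laplacian eigenvectors scaled by 1/sqrt(eigenvalue).

    Assumes a connected graph, so exactly one eigenvalue is zero.
    """
    L = np.diag(W.sum(axis=1)) - W          # unnormalized graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)    # eigenvalues in ascending order
    eigvals, eigvecs = eigvals[1:], eigvecs[:, 1:]  # drop the constant eigenvector
    return eigvecs / np.sqrt(eigvals)

W = path_graph(5)
psi = scaled_laplacian_embedding(W)

# Squared embedding distance from state 0 to every other state.
sq_dist = np.sum((psi - psi[0]) ** 2, axis=1)

# Commute time between the two ends of the path: graph volume times squared distance.
volume = W.sum()                            # sum of degrees = 2 * number of edges
commute_time_0_4 = volume * sq_dist[4]
```

On this path graph the squared distance from state 0 grows by one per edge, so states that are farther apart under a random walk are also farther apart in the embedding.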

Paper Structure

This paper contains 34 sections, 12 equations, 7 figures, 12 tables, and 2 algorithms.

Figures (7)

  • Figure 1: Visualization of the $\psi$-space properties in the pointmaze-medium maze environment from OGBench. (left) Heatmap of $c(s^\star, s_i)$ distance from a reference state ($s^\star$ denoted by $\star$ in the figure) to each state in the dataset. (right) Cluster labels assigned to each state in the dataset via clustering in $\psi$-space.
  • Figure 2: ALPS at the pre-training and decision-time planning phases. In pre-training, ALPS (1) learns the Laplacian representation using ALLO, (2) learns a one-step forward model on top of the original state space $\mathscr{S}$, (3) learns the behavior prior $\pi_\text{prior}$ using the scaled Laplacian representation, and (4) clusters the dataset using $k$-means in the scaled Laplacian space to generate the cluster graph. In planning, (5) ALPS takes the current state from the environment, uses the high-level planner to determine the next subgoal, and then uses the low-level planner to determine the next action towards this subgoal.
  • Figure 3: Success rate vs. number of clusters on the OGBench large maze for the ant task across dataset types (navigate, stitch, explore). Error bars indicate standard error.
  • Figure 4: (a)-(c) 2-D Maze environments with continuous state and action spaces, (d) Input observation
  • Figure 5: OGBench Mazes
  • ...and 2 more figures
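
Steps (4) and (5) of the Figure 2 caption can be sketched on toy data. This is a minimal illustration under stated assumptions, not the authors' implementation: a hand-made 1-D embedding stands in for the scaled Laplacian representation, the function names (`kmeans`, `cluster_graph`, `cluster_path`) are hypothetical, the cluster graph is treated as undirected, and breadth-first search stands in for whatever high-level planner ALPS actually uses.

```python
from collections import deque
import numpy as np

def kmeans(x, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [x[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(x - c, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):          # keep the old center if a cluster is empty
                centers[j] = x[labels == j].mean(axis=0)
    dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    return np.argmin(dists, axis=1), centers

def cluster_graph(labels, transitions, k):
    """Connect two clusters if some dataset transition crosses between them.

    Assumption: edges are undirected.
    """
    adj = {j: set() for j in range(k)}
    for s, s_next in transitions:
        a, b = int(labels[s]), int(labels[s_next])
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def cluster_path(adj, start, goal):
    """Breadth-first search over clusters; returns the cluster sequence or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

# Toy stand-in for the embedding: nine states in three well-separated groups.
psi = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2], [10.0], [10.1], [10.2]])
transitions = [(i, i + 1) for i in range(8)]   # dataset transitions along a chain

labels, centers = kmeans(psi, k=3)
adj = cluster_graph(labels, transitions, k=3)
path = cluster_path(adj, int(labels[0]), int(labels[8]))
next_subgoal = centers[path[1]]                # center of the next cluster on the path
```

Here the planner starts in the cluster containing state 0, targets the cluster containing state 8, and hands the center of the intermediate cluster to the low-level planner as the next subgoal.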