PcLast: Discovering Plannable Continuous Latent States

Anurag Koul; Shivakanth Sujit; Shaoru Chen; Ben Evans; Lili Wu; Byron Xu; Rajan Chari; Riashat Islam; Raihan Seraj; Yonathan Efroni; Lekan Molu; Miro Dudik; John Langford; Alex Lamb

TL;DR

PcLast addresses planning from high-dimensional observations by learning a plannable latent space that preserves reachability. It combines an ACRO-based endogenous-state extractor with a PCLaSt map $\psi$ that enforces local neighborhood structure via a contrastive objective, complemented by a latent forward model $\delta$ for dynamics. A hierarchical planning pipeline uses a graph of clustered latent states and Dijkstra search, with a low-level planner (CEM) to realize subgoals, enabling multi-level abstractions with $n$ levels. Across Maze2D, Sawyer-Reach, and exogenous-noise offline RL tasks, PcLast improves sampling efficiency and planning speed, and shows robustness to exogenous noise and nonlinear dynamics.
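The low-level planning step mentioned above can be sketched with a generic cross-entropy method (CEM) loop. This is a minimal illustration, not the paper's implementation: the forward model `delta` here is a toy stand-in (an additive displacement) for the learned latent model, and the function names, the action bound `max_step`, and all hyperparameters are assumptions chosen for the example.

```python
import random

def delta(state, action):
    """Toy latent forward model (assumption): the action is a displacement."""
    return [s + a for s, a in zip(state, action)]

def rollout_cost(state, actions, goal):
    """Squared l2 distance to the subgoal after applying an action sequence."""
    for action in actions:
        state = delta(state, action)
    return sum((s - g) ** 2 for s, g in zip(state, goal))

def cem_plan(start, goal, horizon=5, pop=64, elites=8, iters=20,
             max_step=0.2, seed=0):
    """Generic CEM: sample action sequences, keep the lowest-cost elites,
    refit a per-step Gaussian to them, and repeat."""
    rng = random.Random(seed)
    dim = len(start)
    mean = [[0.0] * dim for _ in range(horizon)]
    std = [[0.5] * dim for _ in range(horizon)]
    for _ in range(iters):
        samples = []
        for _ in range(pop):
            # Sample one action sequence and clip each action to the bound.
            seq = [[max(-max_step, min(max_step, rng.gauss(mean[t][d], std[t][d])))
                    for d in range(dim)]
                   for t in range(horizon)]
            samples.append((rollout_cost(start, seq, goal), seq))
        samples.sort(key=lambda pair: pair[0])
        best = [seq for _, seq in samples[:elites]]
        # Refit the sampling distribution to the elite sequences.
        for t in range(horizon):
            for d in range(dim):
                vals = [seq[t][d] for seq in best]
                m = sum(vals) / elites
                var = sum((v - m) ** 2 for v in vals) / elites
                mean[t][d] = m
                std[t][d] = var ** 0.5 + 1e-3  # small floor keeps sampling alive
    return mean  # the final mean sequence is returned as the plan

start, subgoal = [0.0, 0.0], [0.5, -0.3]
plan = cem_plan(start, subgoal)
final_cost = rollout_cost(start, plan, subgoal)
```

In a hierarchical setup, `subgoal` would be supplied by the high-level graph search, and only the first action or first few actions of `plan` would typically be executed before replanning.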

Abstract

Goal-conditioned planning benefits from learned low-dimensional representations of rich observations. While compact latent representations typically learned from variational autoencoders or inverse dynamics enable goal-conditioned decision making, they ignore state reachability, hampering their performance. In this paper, we learn a representation that associates reachable states together for effective planning and goal-conditioned policy learning. We first learn a latent representation with multi-step inverse dynamics (to remove distracting information), and then transform this representation to associate reachable states together in $\ell_2$ space. Our proposals are rigorously tested in various simulation testbeds. Numerical results in reward-based settings show significant improvements in sampling efficiency. Further, in reward-free settings this approach yields layered state abstractions that enable computationally efficient hierarchical planning for reaching ad hoc goals with zero additional samples.

Paper Structure (27 sections, 2 theorems, 9 equations, 14 figures, 2 tables, 3 algorithms)

Introduction
Related Work
PCLaSt: Discovery, Representation, and Planning
Notations and Preliminaries.
ACRO: Learning Endogenous State
Learning the PCLaSt map
Learning a latent forward model
Planning
Experiments
Environments
Impact of representation learning on goal-conditioned RL
Impact of PCLaSt on state abstraction
Multi-Level Abstraction and Hierarchical Planning
Exogenous-Noise Offline RL Experiments
PCLaSt Ablations
...and 12 more sections

Key Result

Proposition 0

Assume the tuple $(y,\bar{s},\bar{s}')$ is sampled via the CL generating process described above. Then $\mathbb{P}_k(y=1 \mid \bar{s},\bar{s}') = \mathrm{sigmoid}(c - b\,\|\bar{s}-\bar{s}'\|^2)$, where $\mathrm{sigmoid}(x)=\exp(x)/(\exp(x)+1)$.
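As a numerical illustration of the proposition, the sketch below evaluates $\mathrm{sigmoid}(c - b\,\|\bar{s}-\bar{s}'\|^2)$ for a few latent pairs. The function names and the values $b=1$, $c=0$ are assumptions made for the example. With a positive $b$, as assumed here, the probability of a positive label falls as the two latent states move apart.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function, exp(x) / (exp(x) + 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def positive_pair_probability(s, s_next, b=1.0, c=0.0):
    """P(y = 1 | s, s') = sigmoid(c - b * ||s - s'||^2).

    b and c are illustrative constants (assumed values, not from the paper).
    """
    sq_dist = sum((u - v) ** 2 for u, v in zip(s, s_next))
    return sigmoid(c - b * sq_dist)

p_same = positive_pair_probability([0.0, 0.0], [0.0, 0.0])  # zero distance
p_near = positive_pair_probability([0.0, 0.0], [0.1, 0.0])  # small distance
p_far = positive_pair_probability([0.0, 0.0], [2.0, 0.0])   # large distance
```

With these assumed constants, `p_same` equals 0.5 and the three values are strictly decreasing, which matches the intuition that nearby latent states are more likely to be labeled as a positive pair.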

Figures (14)

Figure 1: Comparative view of clustering representations learned for a 2D maze environment with spiral walls (a). The agent's location is marked by a black dot in the maze image. The clusterings of representations learned via ACRO (b) and PCLaSt (c) are overlaid on the maze image.
Figure 2: (a) Overview of the proposed method. The encoder $\phi$, which maps observations $x$ to continuous latent states $\hat{s}$, is learned with a multi-step inverse model $f_{\mathrm{AC}}$ (left). A temporal contrastive objective ($\mathcal{L}_{m_-}$ and $\mathcal{L}_{m_+}$) is used to learn a metric space $\bar{s}$ (middle), and a forward model ($\delta$) is learned in the latent space $\hat{s}$ (right). (b) High-level and low-level planners. The high-level planner generates coarse goals ($\hat{s}_y$) that serve as targets for the low-level continuous planner. The dashed line indicates the expected trajectory after $\hat{s}_y$ is reached.
Figure 3: Environments: (a), (b), and (c) show different wall configurations of the Maze2D environment for the point-mass navigation task, and (d) shows a top-down view of the robot-arm environment, where the task is to reach various goal positions in 2D planar space.
Figure 4: Illustration of an observation for Cheetah-Run, where the controllable environment image (1) is placed along with exogenous noise images (2-4) in a $4 \times 4$ grid. Numbers on the images are for reference only. This grid of images is given as input to the agent.
Figure 5: Clustering, Abstract-MDP, and Planning are shown for the Maze-Hallway environment. In (a) and (b), we show $k$-means ($k=16$) clustering of latent states learned by PCLaSt and ACRO, respectively. In (c), we show the abstract transition model of the discrete states learned by PCLaSt (b), which captures the environment's topology. Finally, in (d), we show the maze configuration and the executed trajectories of the agent from the initial location (black) to the target location (red) using the $n$-level ($n=2$) planner (blue) with PCLaSt, and using only the low-level planner with the ACRO (orange) and PCLaSt (green) representations for cost minimization.
...and 9 more figures
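The clustering, abstract transition model, and high-level planning steps described in the Figure 5 caption can be sketched as follows. This is a simplified illustration under stated assumptions: the cluster labels are taken as given (for example, from $k$-means over latent states), every observed cluster-to-cluster transition gets a unit edge cost, and the names `build_abstract_graph` and `dijkstra` are hypothetical helpers rather than functions from the paper's code.

```python
import heapq

def build_abstract_graph(cluster_trajectories):
    """Add a directed unit-cost edge i -> j whenever two consecutive latent
    states in the data fall in different clusters i and j."""
    graph = {}
    for labels in cluster_trajectories:
        for i, j in zip(labels, labels[1:]):
            graph.setdefault(i, {})
            graph.setdefault(j, {})
            if i != j:
                graph[i][j] = 1.0
    return graph

def dijkstra(graph, source, target):
    """Return the lowest-cost cluster sequence from source to target,
    or None if the target is unreachable."""
    dist = {source: 0.0}
    parent = {}
    frontier = [(0.0, source)]
    visited = set()
    while frontier:
        d, node = heapq.heappop(frontier)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for nxt, weight in graph.get(node, {}).items():
            new_dist = d + weight
            if new_dist < dist.get(nxt, float("inf")):
                dist[nxt] = new_dist
                parent[nxt] = node
                heapq.heappush(frontier, (new_dist, nxt))
    if target not in dist:
        return None
    # Walk parent pointers back from the target to recover the path.
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return path[::-1]

# Toy cluster-label sequences standing in for clustered latent trajectories.
trajectories = [[0, 0, 1, 1, 2, 3], [3, 4, 4, 5], [1, 5]]
graph = build_abstract_graph(trajectories)
subgoals = dijkstra(graph, 0, 5)
unreachable = dijkstra(graph, 5, 0)
```

Each cluster in `subgoals` would then be handed, in order, to a low-level planner as an intermediate target. Edges are directed here, so a cluster with no observed outgoing transitions cannot reach the others.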

Theorems & Definitions (2)

Proposition 0
Proposition 0
