PcLast: Discovering Plannable Continuous Latent States
Anurag Koul, Shivakanth Sujit, Shaoru Chen, Ben Evans, Lili Wu, Byron Xu, Rajan Chari, Riashat Islam, Raihan Seraj, Yonathan Efroni, Lekan Molu, Miro Dudik, John Langford, Alex Lamb
TL;DR
PcLast addresses planning from high-dimensional observations by learning a plannable latent space that preserves reachability. It combines an ACRO-based endogenous-state extractor with a PcLast map $\psi$ that enforces local neighborhood structure via a contrastive objective, complemented by a latent forward model $\delta$ for dynamics. A hierarchical planning pipeline builds a graph over clustered latent states and searches it with Dijkstra's algorithm, while a low-level planner based on the cross-entropy method (CEM) realizes the resulting subgoals; this enables multi-level abstractions with $n$ levels. Across Maze2D, Sawyer-Reach, and exogenous-noise offline RL tasks, PcLast improves sampling efficiency and planning speed, and shows robustness to exogenous noise and nonlinear dynamics.
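The high-level part of this pipeline (cluster latent states, connect clusters that are visited consecutively, then run Dijkstra to get a chain of subgoals) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not taken from the paper: the latent states are a synthetic 2-D random walk standing in for outputs of $\psi$, plain k-means stands in for whatever clustering the authors use, edge weights are $\ell_2$ distances between centroids, and the function names `kmeans`, `build_cluster_graph`, and `dijkstra` are hypothetical. Training $\psi$ and $\delta$ and the low-level CEM planner are omitted.

```python
import heapq

import numpy as np


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (a stand-in clustering); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign every latent state to its nearest centroid in l2 distance.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = points[labels == c]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    # Recompute labels so they match the final centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=-1)
    return centroids, dists.argmin(axis=1)


def build_cluster_graph(labels, centroids):
    """Add a directed edge i -> j when consecutive states move from cluster i to j."""
    graph = {c: {} for c in range(len(centroids))}
    for a, b in zip(labels[:-1], labels[1:]):
        a, b = int(a), int(b)
        if a != b:
            # Assumed edge cost: l2 distance between the two centroids.
            graph[a][b] = float(np.linalg.norm(centroids[a] - centroids[b]))
    return graph


def dijkstra(graph, source, target):
    """Return the lowest-cost list of cluster ids from source to target, or None."""
    best = {source: 0.0}
    parent = {}
    frontier = [(0.0, source)]
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node == target:
            path = [node]
            while node in parent:
                node = parent[node]
                path.append(node)
            return path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, weight in graph[node].items():
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(frontier, (new_cost, nxt))
    return None


if __name__ == "__main__":
    # Synthetic latent trajectory (assumption): a 2-D random walk.
    rng = np.random.default_rng(0)
    latents = np.cumsum(rng.normal(scale=0.1, size=(2000, 2)), axis=0)
    centroids, labels = kmeans(latents, k=8)
    graph = build_cluster_graph(labels, centroids)
    path = dijkstra(graph, int(labels[0]), int(labels[-1]))
    print("subgoal clusters:", path)
```

In this sketch each cluster id on the returned path would act as a subgoal, with its centroid handed to a low-level planner. Building edges only from observed transitions means the graph reflects which clusters were actually reached from one another in the data, not merely which are close in the latent space.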
Abstract
Goal-conditioned planning benefits from learned low-dimensional representations of rich observations. While compact latent representations typically learned from variational autoencoders or inverse dynamics enable goal-conditioned decision making, they ignore state reachability, hampering their performance. In this paper, we learn a representation that associates reachable states together for effective planning and goal-conditioned policy learning. We first learn a latent representation with multi-step inverse dynamics (to remove distracting information), and then transform this representation to associate reachable states together in $\ell_2$ space. Our proposals are rigorously tested in various simulation testbeds. Numerical results in reward-based settings show significant improvements in sampling efficiency. Further, in reward-free settings this approach yields layered state abstractions that enable computationally efficient hierarchical planning for reaching ad hoc goals with zero additional samples.
