The Laplacian in RL: Learning Representations with Efficient Approximations
Yifan Wu, George Tucker, Ofir Nachum
TL;DR
The paper addresses the problem of learning geometry-aware state representations in reinforcement learning. It proposes a scalable, model-free method that approximates Laplacian eigenfunctions by minimizing a stochastic graph-drawing objective. The method defines an RL-compatible Laplacian from a stationary state distribution and a symmetric two-way transition density, and learns embeddings under a soft orthonormality penalty. Empirically, the approach approximates the eigenvectors more accurately than previous proposals even in tabular settings, and the learned representation improves reward shaping in both gridworld and continuous-control tasks. The results suggest that Laplacian-based representations can improve learning speed and policy performance in complex RL environments.
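The objective described above can be illustrated with a minimal sketch. The summary gives no formula, so the exact form below is an assumption: an attractive term that pulls together the embeddings of states connected by a transition, plus a penalty that pushes the embedding covariance toward the identity. The function name `graph_drawing_loss`, the weight `beta`, and the batch layout are hypothetical and are not taken from the paper.

```python
import numpy as np


def graph_drawing_loss(f_u, f_v, f_a, f_b, beta=1.0):
    """Sample-based graph-drawing loss with a soft orthonormality penalty.

    f_u, f_v: embeddings of state pairs linked by a transition, shape (batch, d).
    f_a, f_b: embeddings of two independently sampled state batches, shape (batch, d).
    beta:     penalty weight (hypothetical hyperparameter).
    """
    d = f_u.shape[1]
    # Attractive term: connected states should have nearby embeddings.
    attractive = 0.5 * np.mean(np.sum((f_u - f_v) ** 2, axis=1))
    # Penalty term: estimates || E[f f^T] - I ||_F^2. Using two independent
    # batches makes the product of the two factors an unbiased estimate.
    eye = np.eye(d)
    dev_a = f_a[:, :, None] * f_a[:, None, :] - eye  # shape (batch, d, d)
    dev_b = f_b[:, :, None] * f_b[:, None, :] - eye
    penalty = np.mean(np.sum(dev_a * dev_b, axis=(1, 2)))
    return attractive + beta * penalty


# Usage with random stand-in embeddings (no real environment involved).
rng = np.random.default_rng(0)
batch, d = 32, 4
loss = graph_drawing_loss(
    rng.normal(size=(batch, d)),
    rng.normal(size=(batch, d)),
    rng.normal(size=(batch, d)),
    rng.normal(size=(batch, d)),
)
```

In practice the embeddings would come from a differentiable function approximator and the loss would be minimized by stochastic gradient descent. The sketch only evaluates the loss on fixed arrays.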
Abstract
The smallest eigenvectors of the graph Laplacian are well-known to provide a succinct representation of the geometry of a weighted graph. In reinforcement learning (RL), where the weighted graph may be interpreted as the state transition process induced by a behavior policy acting on the environment, approximating the eigenvectors of the Laplacian provides a promising approach to state representation learning. However, existing methods for performing this approximation are ill-suited in general RL settings for two main reasons: First, they are computationally expensive, often requiring operations on large matrices. Second, these methods lack adequate justification beyond simple, tabular, finite-state settings. In this paper, we present a fully general and scalable method for approximating the eigenvectors of the Laplacian in a model-free RL context. We systematically evaluate our approach and empirically show that it generalizes beyond the tabular, finite-state setting. Even in tabular, finite-state settings, its ability to approximate the eigenvectors outperforms previous proposals. Finally, we show the potential benefits of using a Laplacian representation learned using our method in goal-achieving RL tasks, providing evidence that our technique can be used to significantly improve the performance of an RL agent.
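As a point of reference for the first claim in the abstract, the sketch below computes the smallest eigenvectors of a graph Laplacian exactly for a small path graph. It assumes the unnormalized Laplacian L = D - W, which may differ from the definition used in the paper, and the helper name `smallest_laplacian_eigenvectors` is hypothetical. The dense eigendecomposition it relies on is the kind of large-matrix operation that becomes impractical as the number of states grows.

```python
import numpy as np


def smallest_laplacian_eigenvectors(W, d):
    """Return the d smallest eigenvalues and eigenvectors of L = D - W.

    W: symmetric weight matrix of an undirected graph, shape (n, n).
    """
    degree = np.diag(W.sum(axis=1))
    laplacian = degree - W
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    return eigvals[:d], eigvecs[:, :d]


# Path graph with 6 nodes: 0 - 1 - 2 - 3 - 4 - 5.
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

vals, vecs = smallest_laplacian_eigenvectors(W, d=2)
# vecs[:, 0] is constant across nodes (eigenvalue 0 for a connected graph).
# vecs[:, 1] varies monotonically along the path, so it orders the nodes
# by their position in the graph, up to an overall sign.
```

The second eigenvector gives each node a coordinate that reflects where it sits along the path, which is the sense in which the smallest eigenvectors summarize the geometry of the graph.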
