The Laplacian in RL: Learning Representations with Efficient Approximations

Yifan Wu, George Tucker, Ofir Nachum

TL;DR

The paper tackles the challenge of learning geometry-aware state representations in reinforcement learning by proposing a scalable, model-free method to approximate Laplacian eigenfunctions via a stochastic graph-drawing objective. It defines an RL-compatible Laplacian using a stationary state distribution and a symmetric two-way transition density, and learns embeddings with a soft orthonormality penalty. Empirically, the approach outperforms prior tabular-based methods in representation quality and provides tangible benefits for reward shaping in both gridworld and continuous-control tasks. The work demonstrates the practical potential of Laplacian-based representations to improve learning speed and policy performance in complex RL environments.
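
The objective summarized above can be illustrated with a small Python sketch, written under stated assumptions rather than taken from the authors' code. It assumes an attractive term equal to one half of the mean of $\sum_k (f_k(u) - f_k(v))^2$ over consecutive states $(u, v)$ visited by the behavior policy, plus a soft penalty on $\sum_{j,k} (\mathbb{E}_{s \sim \rho}[f_j(s) f_k(s)] - \delta_{jk})^2$ that pushes the $d$ embedding dimensions toward orthonormality under the state distribution $\rho$. The names `graph_drawing_loss` and `beta`, and the use of a dictionary as a tabular embedding, are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

State = Hashable
Pair = Tuple[State, State]


def graph_drawing_loss(
    embed: Dict[State, Sequence[float]],
    transitions: List[Pair],
    state_pairs: List[Pair],
    beta: float,
) -> float:
    """Minibatch estimate of a graph-drawing loss plus a soft orthonormality penalty.

    embed:       hypothetical tabular embedding, state -> d-dimensional vector f(s).
    transitions: (u, v) consecutive states observed under the behavior policy.
    state_pairs: (u, v) states drawn independently from the state distribution
                 (assumed to approximate the stationary distribution).
    beta:        penalty weight (hypothetical hyperparameter).
    """
    d = len(next(iter(embed.values())))

    # Attractive term: 1/2 * mean over transitions of sum_k (f_k(u) - f_k(v))^2.
    attract = sum(
        sum((embed[u][k] - embed[v][k]) ** 2 for k in range(d))
        for u, v in transitions
    ) / (2 * len(transitions))

    # Penalty: estimate of sum_{j,k} (E[f_j f_k] - delta_jk)^2. Multiplying factors
    # from two independent states u, v gives an unbiased estimate of the square.
    penalty = 0.0
    for u, v in state_pairs:
        fu, fv = embed[u], embed[v]
        for j in range(d):
            for k in range(d):
                delta = 1.0 if j == k else 0.0
                penalty += (fu[j] * fu[k] - delta) * (fv[j] * fv[k] - delta)
    penalty /= len(state_pairs)

    return attract + beta * penalty


# Usage on a 4-state chain 0-1-2-3: a smooth 1-D embedding pays a smaller
# attractive term than a scrambled one (penalty switched off with beta=0).
chain = [(0, 1), (1, 2), (2, 3)]
smooth = {0: [-1.5], 1: [-0.5], 2: [0.5], 3: [1.5]}
scrambled = {0: [-1.5], 1: [1.5], 2: [-0.5], 3: [0.5]}
print(graph_drawing_loss(smooth, chain, [(0, 3)], beta=0.0))     # 0.5
print(graph_drawing_loss(scrambled, chain, [(0, 3)], beta=0.0))  # about 2.333
```

Each penalty term multiplies factors computed on two independently sampled states. Squaring a single-sample factor instead would add that factor's variance to the estimate, so the product form is what keeps the minibatch penalty unbiased; as a consequence, individual minibatch values of the penalty can be negative.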

Abstract

The smallest eigenvectors of the graph Laplacian are well known to provide a succinct representation of the geometry of a weighted graph. In reinforcement learning (RL), where the weighted graph may be interpreted as the state transition process induced by a behavior policy acting on the environment, approximating the eigenvectors of the Laplacian provides a promising approach to state representation learning. However, existing methods for performing this approximation are ill-suited in general RL settings for two main reasons: First, they are computationally expensive, often requiring operations on large matrices. Second, these methods lack adequate justification beyond simple, tabular, finite-state settings. In this paper, we present a fully general and scalable method for approximating the eigenvectors of the Laplacian in a model-free RL context. We systematically evaluate our approach and empirically show that it generalizes beyond the tabular, finite-state setting. Even in tabular, finite-state settings, its ability to approximate the eigenvectors outperforms previous proposals. Finally, we show the potential benefits of using a Laplacian representation learned using our method in goal-achieving RL tasks, providing evidence that our technique can be used to significantly improve the performance of an RL agent.
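
To make the claim about the smallest eigenvectors concrete, consider a toy case that is not the paper's RL-specific Laplacian: the unnormalized Laplacian $L = D - W$ of an undirected chain of $n$ states with unit edge weights. Its eigenvectors have a known closed form, $v_k(i) = \cos(\pi k (i + 1/2) / n)$ with eigenvalue $2 - 2\cos(\pi k / n)$, so the smallest eigenvalues belong to the smoothest vectors. The Python sketch below (all function names are illustrative) builds $L$, checks each closed-form pair numerically, and shows that the second eigenvector ($k = 1$) is monotone along the chain, i.e. it acts as a one-dimensional coordinate that recovers the chain's geometry.

```python
import math
from typing import List, Tuple


def chain_laplacian(n: int) -> List[List[float]]:
    """Unnormalized Laplacian L = D - W of an n-state chain with unit edge weights."""
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        lap[i][i] += 1.0
        lap[i + 1][i + 1] += 1.0
        lap[i][i + 1] -= 1.0
        lap[i + 1][i] -= 1.0
    return lap


def chain_eigenpair(n: int, k: int) -> Tuple[float, List[float]]:
    """Closed-form k-th eigenpair of the chain Laplacian (k = 0 is the constant vector)."""
    theta = math.pi * k / n
    vec = [math.cos(theta * (i + 0.5)) for i in range(n)]
    return 2.0 - 2.0 * math.cos(theta), vec


def max_residual(lap: List[List[float]], lam: float, vec: List[float]) -> float:
    """Largest |(L v - lam v)_i|; close to 0 when (lam, v) is an eigenpair."""
    n = len(vec)
    return max(
        abs(sum(lap[i][j] * vec[j] for j in range(n)) - lam * vec[i]) for i in range(n)
    )


n = 8
lap = chain_laplacian(n)
for k in range(3):
    lam, vec = chain_eigenpair(n, k)
    print(f"k={k} eigenvalue={lam:.4f} residual={max_residual(lap, lam, vec):.1e}")

# The k = 1 eigenvector (the Fiedler vector) decreases strictly along the chain.
_, fiedler = chain_eigenpair(n, 1)
print(all(a > b for a, b in zip(fiedler, fiedler[1:])))  # True
```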

Paper Structure

This paper contains 23 sections, 17 equations, and 8 figures.

Figures (8)

  • Figure 1: Visualization of the shaped reward defined by the L2 distance from the red cell on an $(x,y)$ representation (left) and Laplacian representation (right).
  • Figure 2: The FourRoom environment.
  • Figure 3: Evaluation of learned representations. The x-axis shows the number of transitions used for training, and the y-axis shows the gap between the graph-drawing objective of the learned representations and that of the optimal Laplacian-based representations (lower is better). We find that our method (graph drawing) approximates the desired representations more accurately than previous methods. See the appendix for details and additional results.
  • Figure 4: Results of reward shaping with a learned Laplacian embedding in GridWorld environments. The top row shows the L2 distance in the learned embedding space. The bottom row shows empirical performance. Our method (mix) reaches optimal performance faster than the baselines, especially in harder mazes. Policies are trained with DQN.
  • Figure 5: Results of reward shaping with a learned Laplacian embedding in continuous control environments. Our learned representations are used by the "mix" and "fullmix" variants (see text for details), whose performance dominates that of all other methods. Policies are trained with DDPG.
  • ...and 3 more figures
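
Figure 1 and the reward-shaping experiments rely on a shaped reward defined by the L2 distance to a goal in some representation space. A minimal Python sketch, assuming the shaped reward is the negative Euclidean distance between the embeddings of the current state and the goal, is given below. Here `phi` stands for any state embedding (raw $(x,y)$ coordinates or a learned Laplacian embedding); the function name and the toy embedding values are hypothetical. The exact "mix" and "fullmix" variants are defined in the paper's text and are not reproduced here.

```python
import math
from typing import Callable, Hashable, Sequence

Embedding = Callable[[Hashable], Sequence[float]]


def l2_shaped_reward(phi: Embedding, state: Hashable, goal: Hashable) -> float:
    """Negative L2 distance between phi(state) and phi(goal).

    The reward rises toward 0 as the state approaches the goal in embedding space,
    so its usefulness depends on how well phi reflects the environment's geometry.
    """
    return -math.dist(phi(state), phi(goal))


# (x, y) representation: the state is its own coordinate pair.
def xy(state):
    return state


# Hypothetical learned embedding stored as a lookup table (values are made up).
table = {"A": (0.0, 0.5), "B": (0.0, -1.0), "goal": (0.0, 1.0)}
laplacian_phi = table.__getitem__

print(l2_shaped_reward(xy, (0, 0), (3, 4)))           # -5.0
print(l2_shaped_reward(laplacian_phi, "A", "goal"))  # -0.5
print(l2_shaped_reward(laplacian_phi, "B", "goal"))  # -2.0
```

Because the reward depends only on distances in the chosen embedding, swapping the $(x,y)$ representation for a Laplacian one changes which states look "close" to the goal without changing the shaping code itself.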