Proper Laplacian Representation Learning

Diego Gomez; Michael Bowling; Marlos C. Machado

TL;DR

The paper tackles learning robust Laplacian-based state representations for large-scale reinforcement learning. It introduces ALLO, a max–min objective with stop-gradient and augmented Lagrangian dynamics that yields a unique, stable equilibrium corresponding to the true Laplacian eigenvectors and eigenvalues, removing hyperparameter sensitivity seen in prior GDO/GGDO methods. Theoretical results show equilibria align with ordered eigenpairs and stability is ensured for barrier values $b>2$, while empirical results across grid environments demonstrate accurate eigenvector recovery and improved eigenvalue estimates compared to baselines. This work enables reliable intrinsic rewards and temporally-extended action discovery in complex environments, with practical impact for exploration, generalization, and transfer in RL.
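The following is a minimal sketch of the dynamics described above. It assumes the abstract setting, in which the Laplacian matrix $\mathbf{L}$ is known, and computes the gradients of an augmented Lagrangian of the form

$\sum_i \langle \mathbf{u}_i, \mathbf{L}\mathbf{u}_i\rangle + \sum_j\sum_{k\le j} \beta_{jk}(\langle \mathbf{u}_j, [\![\mathbf{u}_k]\!]\rangle - \delta_{jk}) + b\sum_j\sum_{k\le j}(\langle \mathbf{u}_j, [\![\mathbf{u}_k]\!]\rangle - \delta_{jk})^2$,

where $[\![\cdot]\!]$ denotes a stop-gradient. This exact form, its sign conventions, and the helper names `allo_gradients` and `descent_ascent_step` are assumptions for illustration, not the authors' implementation. Under this convention, the $d$ smallest eigenvectors together with $\beta_{jj} = -2\lambda_j$ and $\beta_{jk} = 0$ for $k<j$ make both gradients vanish. The value $b = 2.5$ is chosen only to satisfy the $b>2$ condition quoted above.

```python
import numpy as np


def allo_gradients(U, beta, L, b):
    """Gradients of a hypothetical ALLO-style augmented Lagrangian.

    U    : (n, d) array whose columns are the vectors u_1, ..., u_d.
    beta : (d, d) array of dual variables; only the lower triangle (k <= j) is used.
    L    : (n, n) symmetric graph Laplacian (abstract setting).
    b    : barrier coefficient.

    The stop-gradient on u_k means that the term <u_j, [[u_k]]> only produces
    a gradient with respect to u_j, never with respect to u_k.
    """
    d = U.shape[1]
    gram = U.T @ U                              # gram[j, k] = <u_j, u_k>
    violation = np.tril(gram - np.eye(d))       # <u_j, u_k> - delta_jk for k <= j
    # d/du_j of the dual and barrier terms: sum_{k<=j} (beta_jk + 2 b v_jk) u_k
    coef = np.tril(beta) + 2.0 * b * violation
    grad_U = 2.0 * L @ U + U @ coef.T           # column j is the gradient w.r.t. u_j
    grad_beta = violation                       # gradient w.r.t. beta_jk (ascent direction)
    return grad_U, grad_beta


def descent_ascent_step(U, beta, L, b, lr_u=1e-2, lr_beta=1e-2):
    """One step of gradient descent on U and gradient ascent on beta."""
    grad_U, grad_beta = allo_gradients(U, beta, L, b)
    return U - lr_u * grad_U, beta + lr_beta * grad_beta


# Usage: a hypothetical 5-state path graph with Laplacian L = D - A.
n, d, b = 5, 3, 2.5
A = np.diag(np.ones(n - 1), k=1)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A

eigvals, eigvecs = np.linalg.eigh(L)            # ascending eigenvalues
U_star = eigvecs[:, :d]                         # d smallest eigenvectors
beta_star = np.diag(-2.0 * eigvals[:d])         # stationary duals under this convention

grad_U, grad_beta = allo_gradients(U_star, beta_star, L, b)
print(np.abs(grad_U).max(), np.abs(grad_beta).max())   # both ~0 up to rounding

# Rotating u_1 and u_2 keeps them orthonormal, but u_1 is no longer an
# eigenvector, so no choice of beta_11 can zero its gradient.
c, s = np.cos(0.7), np.sin(0.7)
U_rot = U_star.copy()
U_rot[:, 0] = c * U_star[:, 0] + s * U_star[:, 1]
U_rot[:, 1] = -s * U_star[:, 0] + c * U_star[:, 1]
beta_rot = beta_star.copy()
beta_rot[0, 0] = -2.0 * U_rot[:, 0] @ L @ U_rot[:, 0]   # best scalar for column 1
g_rot, _ = allo_gradients(U_rot, beta_rot, L, b)
print(np.linalg.norm(g_rot[:, 0]))                     # clearly nonzero
```

The stop-gradient blocks gradients through $\mathbf{u}_k$. As a result, each $\mathbf{u}_j$ is only constrained relative to $\mathbf{u}_1,\dots,\mathbf{u}_j$. In this sketch, for distinct eigenvalues, that asymmetry is why rotated eigenvectors are not stationary points. Plain orthonormality constraints would accept those rotations.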

Abstract

The ability to learn good representations of states is essential for solving large reinforcement learning problems, where exploration, generalization, and transfer are particularly challenging. The Laplacian representation is a promising approach to address these problems by inducing informative state encoding and intrinsic rewards for temporally-extended action discovery and reward shaping. To obtain the Laplacian representation one needs to compute the eigensystem of the graph Laplacian, which is often approximated through optimization objectives compatible with deep learning approaches. These approximations, however, depend on hyperparameters that are impossible to tune efficiently, converge to arbitrary rotations of the desired eigenvectors, and are unable to accurately recover the corresponding eigenvalues. In this paper we introduce a theoretically sound objective and corresponding optimization algorithm for approximating the Laplacian representation. Our approach naturally recovers both the true eigenvectors and eigenvalues while eliminating the hyperparameter dependence of previous approximations. We provide theoretical guarantees for our method and we show that those results translate empirically into robust learning across multiple environments.
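The rotation problem mentioned above can be checked directly in the abstract setting. The sketch below assumes that the graph drawing objective has the form $\sum_{i=1}^d \langle \mathbf{u}_i, \mathbf{L}\mathbf{u}_i\rangle$ subject to $\langle \mathbf{u}_j, \mathbf{u}_k\rangle = \delta_{jk}$. It uses a hypothetical 6-state path graph, and the helper names are illustrative only. Rotating the two smallest eigenvectors by any $2\times 2$ rotation keeps the constraints satisfied and leaves the objective unchanged. Minimizing this objective alone therefore cannot distinguish the eigenvectors from their rotations.

```python
import numpy as np


def gdo_value(U, L):
    """Graph drawing objective sum_i <u_i, L u_i> = trace(U^T L U)."""
    return float(np.trace(U.T @ L @ U))


def rotation(theta):
    """2x2 rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


# Laplacian L = D - A of a hypothetical 6-state path graph.
n, d = 6, 2
A = np.diag(np.ones(n - 1), k=1)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A

eigvals, eigvecs = np.linalg.eigh(L)     # ascending eigenvalues
E = eigvecs[:, :d]                       # true e_1, e_2
U = E @ rotation(0.7)                    # rotated, still orthonormal

print(np.allclose(U.T @ U, np.eye(d)))   # constraints still satisfied
print(gdo_value(E, L), gdo_value(U, L))  # equal up to rounding
print(abs(U[:, 0] @ E[:, 0]))            # cosine similarity with e_1 is only cos(0.7)
```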

Paper Structure (32 sections, 5 theorems, 38 equations, 10 figures, 2 tables)

Introduction
Background
Reinforcement Learning.
Laplacian Representation.
The Graph Drawing Objective.
The Generalized Graph Drawing Objective.
The Abstract and Approximate Settings.
Augmented Lagrangian Laplacian Objective
Asymmetric Constraints as a Generalized Graph Drawing Alternative.
Augmented Lagrangian Dynamics for Exact Learning.
Barrier Dynamics.
Theoretical results
Experiments
Eigenvector Accuracy.
Eigenvalue Accuracy.

Key Result

Lemma 1

Consider a symmetric matrix $\mathbf{L}\in\mathbb{R}^{|\mathcal{S}|\times|\mathcal{S}|}$ with increasing, and possibly repeated, eigenvalues $\lambda_1\le\cdots\le\lambda_{|\mathcal{S}|}$, and a corresponding sequence of eigenvectors $(\mathbf{e}_i)_{i=1}^{|\mathcal{S}|}$. Then, given a number of co
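The statement of Lemma 1 is cut off above, so the following is only background on the objects it refers to, not a reconstruction of the lemma. This minimal sketch builds the combinatorial graph Laplacian $\mathbf{L} = \mathbf{D} - \mathbf{A}$ of a small open 3×4 grid. The choice of Laplacian is an assumption, since other normalizations are also used. The grid is a hypothetical stand-in for the grid environments of Figure 2, which also contain walls. The code then calls `numpy.linalg.eigh`, which returns the eigenvalues of a symmetric matrix in ascending order together with orthonormal eigenvectors. This matches the ordering $\lambda_1\le\cdots\le\lambda_{|\mathcal{S}|}$ in Lemma 1. This grid also has a repeated eigenvalue, and its second smallest eigenvector is the kind of vector visualized in Figure 2.

```python
import numpy as np


def grid_laplacian(rows, cols):
    """Combinatorial Laplacian L = D - A of a 4-connected rows x cols grid."""
    n = rows * cols
    A = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            s = r * cols + c
            if c + 1 < cols:               # edge to the right neighbour
                A[s, s + 1] = A[s + 1, s] = 1.0
            if r + 1 < rows:               # edge to the neighbour below
                A[s, s + cols] = A[s + cols, s] = 1.0
    return np.diag(A.sum(axis=1)) - A


L = grid_laplacian(3, 4)
eigvals, eigvecs = np.linalg.eigh(L)       # ascending eigenvalues, orthonormal columns

print(np.round(eigvals, 3))                # note the repeated eigenvalue 3.0
second = eigvecs[:, 1].reshape(3, 4)       # 2nd smallest eigenvector laid out on the grid
print(np.round(second, 2))
```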

Figures (10)

Figure 1: Average cosine similarity between the true Laplacian representation and GGDO for different values of the barrier penalty coefficient, averaged over 60 seeds, with the best coefficient highlighted. The shaded region corresponds to a 95% confidence interval.
Figure 2: Grid environments. Color is the 2nd smallest Laplacian eigenvector learned by ALLO.
Figure 3: Average cosine similarity between the true Laplacian representation and the one learned by ALLO for different initial values of the barrier coefficient $b$, averaged over 60 seeds, with the best coefficient highlighted. The shaded region corresponds to a 95% confidence interval.
Figure 4: Difference of cosine similarities when approximating eigenvectors (A), and of relative errors for eigenvalues (B). Error bars show the standard deviation of the differences. GR and GM stand for GridRoom and GridMaze. Black bars correspond to p-values below $0.01$.
Figure 5: Average cosine similarity for different objectives in the environment GridMaze-19, for an initial barrier coefficient $b=0.1$ and different barrier increase rates $\alpha_{\text{barrier}}$.
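The figure captions above report the average cosine similarity between learned and true eigenvectors. Figure 4 also reports relative errors of eigenvalue estimates. The sketch below shows one way to compute such metrics under two assumptions. First, the sign ambiguity of each eigenvector is handled by taking the absolute value of the cosine similarity. Second, the relative error is $|\hat\lambda_i - \lambda_i|/|\lambda_i|$, with (near-)zero true eigenvalues skipped. The paper's exact evaluation protocol may differ, and the function names are hypothetical.

```python
import numpy as np


def average_cosine_similarity(U_hat, E):
    """Mean |cos| between matching columns of U_hat (learned) and E (true).

    The absolute value makes the metric invariant to the sign of each
    eigenvector, which is only defined up to +/-1.
    """
    num = np.sum(U_hat * E, axis=0)
    den = np.linalg.norm(U_hat, axis=0) * np.linalg.norm(E, axis=0)
    return float(np.mean(np.abs(num) / den))


def relative_eigenvalue_errors(lam_hat, lam, tol=1e-8):
    """|lam_hat - lam| / |lam|, skipping (near-)zero true eigenvalues."""
    lam_hat, lam = np.asarray(lam_hat, float), np.asarray(lam, float)
    mask = np.abs(lam) > tol
    return np.abs(lam_hat[mask] - lam[mask]) / np.abs(lam[mask])


# Toy example: a sign flip is not penalized; a 45-degree mix is.
E = np.eye(4)[:, :3]                      # toy "true" orthonormal vectors
U_hat = np.column_stack([-E[:, 0], E[:, 1], (E[:, 1] + E[:, 2]) / np.sqrt(2)])
print(average_cosine_similarity(U_hat, E))
print(relative_eigenvalue_errors([0.01, 0.55, 1.8], [0.0, 0.5, 2.0]))
```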

Theorems & Definitions (11)

Lemma 1
Lemma 2
Proof
Corollary 1
Theorem 1
Proof sketch
Proof
Proposition 1
Proof
Proof
