Proper Laplacian Representation Learning
Diego Gomez, Michael Bowling, Marlos C. Machado
TL;DR
The paper tackles learning robust Laplacian-based state representations for large-scale reinforcement learning. It introduces ALLO, a max–min objective with stop-gradient and augmented Lagrangian dynamics that yields a unique, stable equilibrium corresponding to the true Laplacian eigenvectors and eigenvalues, removing hyperparameter sensitivity seen in prior GDO/GGDO methods. Theoretical results show equilibria align with ordered eigenpairs and stability is ensured for barrier values $b>2$, while empirical results across grid environments demonstrate accurate eigenvector recovery and improved eigenvalue estimates compared to baselines. This work enables reliable intrinsic rewards and temporally-extended action discovery in complex environments, with practical impact for exploration, generalization, and transfer in RL.
Abstract
The ability to learn good representations of states is essential for solving large reinforcement learning problems, where exploration, generalization, and transfer are particularly challenging. The Laplacian representation is a promising approach to address these problems by inducing informative state encoding and intrinsic rewards for temporally-extended action discovery and reward shaping. To obtain the Laplacian representation one needs to compute the eigensystem of the graph Laplacian, which is often approximated through optimization objectives compatible with deep learning approaches. These approximations, however, depend on hyperparameters that are impossible to tune efficiently, converge to arbitrary rotations of the desired eigenvectors, and are unable to accurately recover the corresponding eigenvalues. In this paper we introduce a theoretically sound objective and corresponding optimization algorithm for approximating the Laplacian representation. Our approach naturally recovers both the true eigenvectors and eigenvalues while eliminating the hyperparameter dependence of previous approximations. We provide theoretical guarantees for our method and we show that those results translate empirically into robust learning across multiple environments.
