The Laplacian Keyboard: Beyond the Linear Span
Siddarth Chandrasekar, Marlos C. Machado
TL;DR
The paper introduces the Laplacian Keyboard (LK), a hierarchical RL framework that leverages graph Laplacian eigenvectors as a task-agnostic, low-frequency basis for reward approximation and behavior generation. By pre-training a Laplacian encoder and a Universal Successor Feature Approximator on reward-free data, LK constructs a continuous library of options, and a meta-policy then stitches these options to solve downstream tasks. The option library yields zero-shot optimality for rewards in the basis span, and the meta-policy improves sample efficiency for rewards beyond it. Theoretical bounds relate reward smoothness and basis dimensionality to value-function approximation error. Empirical results on DeepMind Control tasks show that LK matches strong zero-shot baselines, surpasses flat RL in sample efficiency, and approaches privileged baselines like OKB without handcrafted features. Overall, LK provides a scalable behavioral foundation that integrates representation- and behavior-based approaches, with potential for online extensions and more flexible termination strategies in large, complex environments.
Abstract
Across scientific disciplines, Laplacian eigenvectors serve as a fundamental basis for simplifying complex systems, from signal processing to quantum mechanics. In reinforcement learning (RL), these eigenvectors provide a natural basis for approximating reward functions; however, their use is typically limited to their linear span, which restricts expressivity in complex environments. We introduce the Laplacian Keyboard (LK), a hierarchical framework that goes beyond the linear span. LK constructs a task-agnostic library of options from these eigenvectors, forming a behavior basis guaranteed to contain the optimal policy for any reward within the linear span. A meta-policy learns to stitch these options dynamically, enabling efficient learning of policies outside the original linear constraints. We establish theoretical bounds on zero-shot approximation error and demonstrate empirically that LK surpasses zero-shot solutions while achieving improved sample efficiency compared to standard RL methods.
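To make the "linear span" idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes a toy tabular setting (a path graph with 50 states) in place of the learned Laplacian encoder, and the helper names (path_graph_laplacian, project_reward, relative_error) are hypothetical. It projects a reward vector onto the k lowest-frequency Laplacian eigenvectors and measures what is left outside the span: a reward built from the first few eigenvectors is recovered exactly, a smooth reward is approximated well with a small basis, and a sparse goal reward leaves a large residual, which is the regime where stitching options with a meta-policy is meant to help.

```python
import numpy as np


def path_graph_laplacian(n):
    """Combinatorial Laplacian L = D - A of a path graph with n states."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A


def project_reward(reward, eigvecs, k):
    """Orthogonal projection of `reward` onto the k lowest-frequency eigenvectors.

    `eigvecs` must have orthonormal columns sorted by ascending eigenvalue,
    which is what np.linalg.eigh returns for a symmetric matrix.
    """
    basis = eigvecs[:, :k]
    weights = basis.T @ reward  # coefficients of the reward in the basis
    return basis @ weights, weights


def relative_error(reward, eigvecs, k):
    """Norm of the part of `reward` outside the span, relative to its norm."""
    approx, _ = project_reward(reward, eigvecs, k)
    return np.linalg.norm(reward - approx) / np.linalg.norm(reward)


n = 50  # number of states in the toy environment (an assumption)
eigvals, eigvecs = np.linalg.eigh(path_graph_laplacian(n))

# Three illustrative rewards (all assumptions, not taken from the paper):
in_span_reward = eigvecs[:, :3] @ np.array([1.0, -2.0, 0.5])  # inside the span for k >= 3
smooth_reward = np.arange(n) / (n - 1)                        # linear ramp over states
sparse_reward = np.zeros(n)
sparse_reward[-1] = 1.0                                       # reward only at the last state

for k in (1, 3, 5, 20, n):
    print(
        f"k={k:2d}  in-span={relative_error(in_span_reward, eigvecs, k):.3e}  "
        f"smooth={relative_error(smooth_reward, eigvecs, k):.3e}  "
        f"sparse={relative_error(sparse_reward, eigvecs, k):.3e}"
    )
```

The residual can only shrink as k grows because the spans are nested, and it vanishes at k = n, where the eigenvectors form a complete basis. How this reward-level residual translates into value-function error is the subject of the paper's bounds and is not reproduced here.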
