Spectral Bellman Method: Unifying Representation and Exploration in RL
Ofir Nabati, Bo Dai, Shie Mannor, Guy Tennenholtz
TL;DR
The Spectral Bellman Method (SBM) addresses the joint challenge of representation learning and exploration in value-based RL by exploiting a spectral relationship that emerges under the zero Inherent Bellman Error (IBE) condition. By tying Bellman updates to feature covariance through an augmented operator, SBM derives a practical, power-iteration–style objective that enforces Bellman-closure across a distribution of value functions and enables Thompson Sampling–based exploration. The method is instantiated in Q-learning with a dedicated representation learner, extended to multi-step operators, and demonstrated to improve performance on hard-exploration Atari games while preserving stable optimization. This approach provides a principled, scalable way to learn Bellman-aligned features that jointly improve value approximation and data-efficient exploration, with potential applicability to broader RL settings beyond Atari.
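To make the link between feature covariance and Thompson Sampling concrete, the sketch below shows one standard way to sample a linear value function from a Gaussian posterior and act greedily on it. It is an illustration under stated assumptions, not the authors' procedure: the Bayesian linear-regression posterior, the names `posterior` and `thompson_action`, and the parameters `prior_precision` and `noise_scale` are all hypothetical choices made for this example.

```python
import numpy as np


def posterior(features, targets, prior_precision=1.0):
    """Gaussian posterior over weights for Q(s, a) = phi(s, a)^T theta.

    Assumes a zero-mean isotropic prior and unit observation noise
    (both are assumptions of this sketch).
    """
    d = features.shape[1]
    # The posterior precision is the regularized feature covariance.
    precision = prior_precision * np.eye(d) + features.T @ features
    cov = np.linalg.inv(precision)
    cov = 0.5 * (cov + cov.T)  # symmetrize against round-off
    mean = cov @ features.T @ targets
    return mean, cov


def thompson_action(action_features, mean, cov, rng, noise_scale=1.0):
    """Sample one value function and act greedily with respect to it."""
    theta = rng.multivariate_normal(mean, noise_scale**2 * cov)
    q_values = action_features @ theta  # one Q-value per action
    return int(np.argmax(q_values)), theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(50, 4))           # observed phi(s, a) rows
    targets = feats @ np.array([1.0, -0.5, 0.2, 0.0])
    mean, cov = posterior(feats, targets)
    action, _ = thompson_action(rng.normal(size=(3, 4)), mean, cov, rng)
    print("sampled greedy action:", action)
```

Directions in feature space with few observations have large posterior variance, so sampled value functions disagree most there. This is why the quality of the feature covariance matters for exploration.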
Abstract
Representation learning is critical to the empirical and theoretical success of reinforcement learning. However, many existing methods are derived from model-learning objectives, which misaligns them with the RL task at hand. This work introduces the Spectral Bellman Method, a novel framework derived from the Inherent Bellman Error (IBE) condition. It aligns representation learning with the fundamental structure of Bellman updates across a space of possible value functions, making it directly suited for value-based RL. Our key insight is a fundamental spectral relationship: under the zero-IBE condition, the transformation of a distribution of value functions by the Bellman operator is intrinsically linked to the feature covariance structure. This connection yields a new, theoretically grounded objective for learning state-action features that capture this Bellman-aligned covariance, requiring only a simple modification to existing algorithms. We demonstrate that our learned representations enable structured exploration by aligning feature covariance with Bellman dynamics, improving performance in hard-exploration and long-horizon tasks. Our framework naturally extends to multi-step Bellman operators, offering a principled path toward learning more powerful and structurally sound representations for value-based RL.
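The zero-IBE condition says that the Bellman backup of any linear value function is again linear in the same features. A minimal numerical check of this closure property is sketched below on a small synthetic linear MDP, a standard setting in which the inherent Bellman error is zero. The MDP construction, the sizes, and the function names `make_linear_mdp`, `bellman_backup`, and `closure_residual` are assumptions of this illustration and do not come from the paper.

```python
import numpy as np


def make_linear_mdp(n_states=6, n_actions=3, d=4, seed=0):
    """Random linear MDP: P(s'|s,a) = phi(s,a)^T mu(s'), r(s,a) = phi(s,a)^T w."""
    rng = np.random.default_rng(seed)
    # Each phi(s, a) lies on the simplex, so mixing the rows of mu
    # always yields a valid transition distribution.
    phi = rng.dirichlet(np.ones(d), size=(n_states, n_actions))  # (S, A, d)
    mu = rng.dirichlet(np.ones(n_states), size=d)                # (d, S)
    w = rng.uniform(size=d)                                      # (d,)
    return phi, mu, w


def bellman_backup(phi, mu, w, theta, gamma=0.9):
    """Apply the Bellman optimality operator to Q_theta = phi^T theta."""
    q = phi @ theta            # (S, A)
    v = q.max(axis=1)          # greedy state values, (S,)
    transitions = phi @ mu     # (S, A, S)
    return phi @ w + gamma * transitions @ v  # (S, A)


def closure_residual(features, target):
    """Max error of the best linear fit of `target` in `features`."""
    x = features.reshape(-1, features.shape[-1])
    y = target.reshape(-1)
    theta_next, *_ = np.linalg.lstsq(x, y, rcond=None)
    return float(np.max(np.abs(x @ theta_next - y))), theta_next


if __name__ == "__main__":
    phi, mu, w = make_linear_mdp()
    rng = np.random.default_rng(1)
    # Check closure over a distribution of value functions, not just one.
    worst = 0.0
    for _ in range(100):
        theta = rng.normal(size=phi.shape[-1])
        residual, _ = closure_residual(phi, bellman_backup(phi, mu, w, theta))
        worst = max(worst, residual)
    print("worst residual with aligned features:", worst)

    # Features unrelated to the dynamics generally do not close.
    psi = rng.normal(size=phi.shape)
    bad, _ = closure_residual(psi, bellman_backup(phi, mu, w, theta))
    print("residual with unrelated features:", bad)
```

With the aligned features the residual sits at floating-point round-off for every sampled weight vector, whereas features unrelated to the dynamics leave a visible residual.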
