Spectral Bellman Method: Unifying Representation and Exploration in RL

Ofir Nabati, Bo Dai, Shie Mannor, Guy Tennenholtz

TL;DR

The Spectral Bellman Method (SBM) addresses the joint challenge of representation learning and exploration in value-based RL by exploiting a spectral relationship that emerges under the zero Inherent Bellman Error (IBE) condition. By tying Bellman updates to feature covariance through an augmented operator, SBM derives a practical, power-iteration–style objective that enforces Bellman-closure across a distribution of value functions and enables Thompson Sampling–based exploration. The method is instantiated in Q-learning with a dedicated representation learner, extended to multi-step operators, and demonstrated to improve performance on hard-exploration Atari games while preserving stable optimization. This approach provides a principled, scalable way to learn Bellman-aligned features that jointly improve value approximation and data-efficient exploration, with potential applicability to broader RL settings beyond Atari.

Abstract

Representation learning is critical to the empirical and theoretical success of reinforcement learning. However, many existing methods are derived from a model-learning perspective, which misaligns them with the RL task at hand. This work introduces the Spectral Bellman Method, a novel framework derived from the Inherent Bellman Error (IBE) condition. It aligns representation learning with the fundamental structure of Bellman updates across a space of possible value functions, making it directly suited for value-based RL. Our key insight is a fundamental spectral relationship: under the zero-IBE condition, the transformation of a distribution of value functions by the Bellman operator is intrinsically linked to the feature covariance structure. This connection yields a new, theoretically grounded objective for learning state-action features that capture this Bellman-aligned covariance, requiring only a simple modification to existing algorithms. We demonstrate that our learned representations enable structured exploration by aligning feature covariance with Bellman dynamics, improving performance in hard-exploration and long-horizon tasks. Our framework naturally extends to multi-step Bellman operators, offering a principled path toward learning more powerful and structurally sound representations for value-based RL.
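
For readers unfamiliar with the IBE condition, the following sketch writes it out for a linear value class in the discounted setting. The notation ($\phi$, $\theta$, $\mathcal{B}$, $\mathcal{T}$, $\gamma$) is assumed here for illustration, following the usual form of the definition in Zanette et al. (2020); the paper's own Definition 2 may differ in details such as horizon indexing or the choice of norm.

```latex
% Linear value class over features \phi : S \times A \to \mathbb{R}^d (assumed notation)
Q_\theta(s,a) = \phi(s,a)^\top \theta, \qquad \theta \in \mathcal{B} \subset \mathbb{R}^d

% Optimal Bellman operator with reward r and discount \gamma
(\mathcal{T} Q)(s,a) = r(s,a) + \gamma \, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}
    \Big[ \max_{a'} Q(s',a') \Big]

% Inherent Bellman Error: worst-case distance from \mathcal{T} Q_\theta back to the class
\mathcal{I}(\phi) = \sup_{\theta \in \mathcal{B}} \; \inf_{\theta' \in \mathcal{B}} \;
    \big\| Q_{\theta'} - \mathcal{T} Q_\theta \big\|_\infty

% Zero IBE (Bellman closure): every backup stays inside the linear class
\mathcal{I}(\phi) = 0 \;\Longleftrightarrow\;
    \forall \theta \in \mathcal{B} \;\; \exists \theta' \in \mathcal{B} : \;
    \mathcal{T} Q_\theta = Q_{\theta'}
\quad \text{(assuming the infimum is attained, e.g. compact } \mathcal{B}\text{)}
```

Under this reading, "zero IBE" says that the Bellman operator maps the span of the features into itself, which is the closure property the abstract refers to when it speaks of Bellman updates across a space of value functions.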

Paper Structure

This paper contains 28 sections, 4 theorems, 39 equations, 3 figures, 4 tables, 2 algorithms.

Key Result

Proposition 1

The following identities hold:

Figures (3)

  • Figure 1: Spectral representation of the optimal Bellman operator. Under the zero-IBE condition, the linear representation is equivalent to a rank-$d$ SVD with singular value matrix $\Sigma$.
  • Figure 2: Visualization of the parameter sampling distribution $\nu_t(\theta) = {\mathcal{N}}(\hat{\theta}_t, \sigma_{rep}^2I)$ for $d=2$ over successive rounds. As Q-learning updates the mean $\hat{\theta}_t$, $\nu_t(\theta)$ shifts, focusing representation learning on parameters relevant to the current policy. Darker regions indicate higher probability density.
  • Figure 3: Average human-normalized score (HNS) over 100M steps. DQN and R2D2 are compared against their SBM counterparts with Thompson Sampling (TS) across the Atari ALE benchmark (left) and on the hard-exploration subset (right).
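
The sampling distribution in the Figure 2 caption can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a linear value function $Q_\theta(s,a) = \phi(s,a)^\top\theta$; the helper names, the toy feature matrix, and the greedy-action step are hypothetical and are not taken from the paper's algorithms.

```python
import numpy as np


def sample_parameters(theta_hat, sigma_rep, n_samples, rng):
    """Draw parameters from nu_t = N(theta_hat, sigma_rep^2 * I).

    Returns an array of shape (n_samples, d).
    """
    d = theta_hat.shape[0]
    noise = rng.standard_normal((n_samples, d))
    return theta_hat + sigma_rep * noise


def greedy_action(features, theta):
    """Greedy action under the linear value Q(s, a) = phi(s, a)^T theta.

    `features` has shape (n_actions, d): one feature row per action
    for a single (hypothetical) state.
    """
    q_values = features @ theta
    return int(np.argmax(q_values))


rng = np.random.default_rng(0)

# Current Q-learning estimate of the mean (toy values, d = 2 as in Figure 2).
theta_hat = np.array([1.0, -0.5])
sigma_rep = 0.1

# A cloud of sampled parameters; each one defines a different value function.
thetas = sample_parameters(theta_hat, sigma_rep, n_samples=10_000, rng=rng)

# Toy features for three actions in one state (assumed, for illustration).
features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [0.5, 0.5]])

# Acting greedily with respect to one sampled parameter vector.
action = greedy_action(features, thetas[0])
```

As $\hat{\theta}_t$ moves during training, re-running `sample_parameters` with the new mean shifts the whole cloud, which matches the behavior the caption describes. Acting greedily on a sampled parameter is shown only to illustrate how one draw induces a policy; the paper's Thompson Sampling procedure may use a different posterior.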

Theorems & Definitions (11)

  • Definition 1: Function Space and Parameter Bounds
  • Definition 2: Inherent Bellman Error (IBE) [zanette2020learning]
  • Definition 3: Max Uncertainty [zanette2020provably]
  • Proposition 1
  • Proposition 2
  • Theorem 1: Bellman Operator Spectral Decomposition
  • Proof
  • Proof
  • Proof
  • Proposition 3: $h$-step IBE Bound
  • ...and 1 more