A Spectral Revisit of the Distributional Bellman Operator under the Cramér Metric

Keru Wang, Yixin Deng, Yao Lyu, Stephen Redmond, Shengbo Eben Li

Abstract

Distributional reinforcement learning (DRL) studies the evolution of full return distributions under Bellman updates rather than focusing on expected values. A classical result is that the distributional Bellman operator is contractive under the Cramér metric, which corresponds to an $L^2$ geometry on differences of cumulative distribution functions (CDFs). While this contraction ensures stability of policy evaluation, existing analyses remain largely metric, focusing on contraction properties without elucidating the structural action of the Bellman update on distributions. In this work, we analyse distributional Bellman dynamics directly at the level of CDFs, treating the Cramér geometry as the intrinsic analytical setting. At this level, the Bellman update acts affinely on CDFs and linearly on differences between CDFs, and its contraction property yields a uniform bound on this linear action. Building on this intrinsic formulation, we construct a family of regularised spectral Hilbert representations that realise the CDF-level geometry by exact conjugation, without modifying the underlying Bellman dynamics. The regularisation affects only the geometry and vanishes in the zero-regularisation limit, recovering the native Cramér metric. This framework clarifies the operator structure underlying distributional Bellman updates and provides a foundation for further functional and operator-theoretic analyses in DRL.
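The contraction property mentioned in the abstract can be illustrated numerically. The sketch below is not taken from the paper: it assumes finite discrete return distributions, a single deterministic reward `r`, and a discount factor `gamma` (all hypothetical names and values), and it checks the standard scaling property that the map $x \mapsto r + \gamma x$ shrinks the Cramér distance by a factor of $\sqrt{\gamma}$.

```python
import numpy as np

def cramer_distance(x1, p1, x2, p2):
    """Cramér (L2-of-CDF-difference) distance between two finite discrete
    distributions given as (atoms, probabilities)."""
    grid = np.union1d(x1, x2)  # sorted union of both supports
    # Right-continuous step CDFs evaluated at every grid point.
    F1 = np.array([p1[x1 <= g].sum() for g in grid])
    F2 = np.array([p2[x2 <= g].sum() for g in grid])
    widths = np.diff(grid)
    # The CDF difference is constant on each interval [grid[k], grid[k+1]),
    # so the integral of its square is an exact finite sum.
    return np.sqrt(np.sum((F1[:-1] - F2[:-1]) ** 2 * widths))

# Two hypothetical return distributions (illustrative values only).
x1, p1 = np.array([0.0, 1.0, 2.0]), np.array([0.2, 0.5, 0.3])
x2, p2 = np.array([0.5, 3.0]), np.array([0.6, 0.4])

# Assumed reward and discount for a single deterministic transition.
r, gamma = 1.0, 0.9

d_before = cramer_distance(x1, p1, x2, p2)
# The update pushes each atom through x -> r + gamma * x;
# the probabilities attached to the atoms are unchanged.
d_after = cramer_distance(r + gamma * x1, p1, r + gamma * x2, p2)

print("ratio d_after / d_before:", d_after / d_before)
print("sqrt(gamma):             ", np.sqrt(gamma))
```

The reward shift `r` leaves the distance unchanged, and only the scaling by `gamma` contracts it. A full Bellman update would additionally mix over next states and rewards, which this single-transition sketch omits.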

Paper Structure (72 sections, 24 theorems, 178 equations, 1 figure)

Key Result

Lemma 1

Let $P_1,P_2$ be probability distributions with $F_{P_1},F_{P_2}\in\Gamma_F$, and define Then for every $\omega\neq0$, where $\phi_{P}(\omega) = \mathbb E_{P}\!\left[e^{i\omega X}\right]$ denotes the characteristic function of $P$.
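The relation between a CDF difference and the two characteristic functions can be checked numerically. The sketch below rests on assumptions that are not from the paper: it writes $\Delta F = F_{P_1} - F_{P_2}$ (hypothetical notation) and uses the Fourier convention $\widehat{g}(\omega) = \int g(x)\, e^{i\omega x}\, dx$. Under that convention, integration by parts gives $\widehat{\Delta F}(\omega) = -\bigl(\phi_{P_1}(\omega) - \phi_{P_2}(\omega)\bigr)/(i\omega)$ for $\omega \neq 0$; the paper's own convention, and hence its sign and normalisation, may differ. Finite discrete distributions are used so that both sides can be evaluated exactly.

```python
import numpy as np

def cdf_difference_steps(x1, p1, x2, p2):
    """Return the grid and the constant value of F1 - F2 on each
    interval [grid[k], grid[k+1]) for two finite discrete distributions."""
    grid = np.union1d(x1, x2)
    F1 = np.array([p1[x1 <= g].sum() for g in grid])
    F2 = np.array([p2[x2 <= g].sum() for g in grid])
    return grid, (F1 - F2)[:-1]

def fourier_of_cdf_difference(grid, steps, omega):
    """Exact integral of the piecewise-constant CDF difference against
    exp(i * omega * x) (assumed convention, see lead-in)."""
    left, right = grid[:-1], grid[1:]
    pieces = (np.exp(1j * omega * right) - np.exp(1j * omega * left)) / (1j * omega)
    return np.sum(steps * pieces)

def char_fn(x, p, omega):
    """Characteristic function E[exp(i * omega * X)] of a discrete law."""
    return np.sum(p * np.exp(1j * omega * x))

# Hypothetical distributions, for illustration only.
x1, p1 = np.array([0.0, 1.0, 2.0]), np.array([0.2, 0.5, 0.3])
x2, p2 = np.array([0.5, 3.0]), np.array([0.6, 0.4])

omega = 1.7  # any nonzero frequency
grid, steps = cdf_difference_steps(x1, p1, x2, p2)

lhs = fourier_of_cdf_difference(grid, steps, omega)
rhs = -(char_fn(x1, p1, omega) - char_fn(x2, p2, omega)) / (1j * omega)

print("Fourier transform of CDF difference:", lhs)
print("Characteristic-function expression: ", rhs)
```

The division by $i\omega$ is what makes the point $\omega = 0$ special, which is consistent with the lemma being stated only for $\omega \neq 0$.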

Figures (1)

  • Figure 1: Conceptual roadmap of the paper. Starting from distributional Bellman dynamics, we identify a structural obstruction arising from the singular nature of the Cramér geometry, which prevents a direct stable Hilbert space representation. Our analysis therefore proceeds in two stages. First, we study the Bellman dynamics intrinsically at the level of CDFs, where contraction and fixed-point properties can be established under the Cramér geometry. Second, we construct a regularised spectral representation that realises these dynamics within a Hilbert space framework while preserving the underlying Bellman operator. The regularisation introduces a stable family of Hilbert geometries whose singular limit recovers the intrinsic Cramér metric.

Theorems & Definitions (34)

  • Lemma 1: Fourier transform of a CDF difference
  • Proposition 2
  • Remark 3: Regularisation by geometry, not embedding
  • Proposition 4: Spectral bridge to the Cramér metric
  • Lemma 5: Membership of centered CDFs
  • Theorem 6: CDF-spectral isometric isomorphism
  • Remark 7: Real-linearity
  • Proposition 8: Canonical raw-to-spectral transport
  • Remark 9
  • Lemma 13: Invariance under reward translation
  • ...and 24 more