Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics

Runzhe Wu; Ayush Sekhari; Akshay Krishnamurthy; Wen Sun

TL;DR

This work addresses computationally efficient online reinforcement learning under linear Bellman completeness with deterministic dynamics. It introduces a randomized least-squares approach in which exploration noise is confined to the null space of the data, paired with a span-based analysis that bounds the regret; the method handles large action spaces as well as stochastic rewards and initial states. The authors prove regret bounds of the form $\tilde{O}(d^{5/2}H^{5/2} + d^2H^{3/2}\sqrt{T})$ under exact or approximate square-loss oracles, extend them to settings with low inherent Bellman error, and show how to implement the oracles via convex-set feasibility and linear optimization. The work narrows the statistical-computational gap for linear Bellman complete RL and provides practical algorithms, grounded in convex optimization, with provable guarantees; extending the results to stochastic dynamics is left as an open problem.
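To make the scaling of the stated bound concrete, the sketch below evaluates its leading terms, $d^{5/2}H^{5/2} + d^2H^{3/2}\sqrt{T}$, with all constants and logarithmic factors hidden by $\tilde{O}$ dropped, so the numbers are illustrative only. The function name `regret_leading_terms` and the problem sizes are hypothetical. Dividing by $T$ shows the average regret shrinking roughly like $1/\sqrt{T}$ once the $\sqrt{T}$ term dominates.

```python
import math


def regret_leading_terms(d: int, H: int, T: int) -> float:
    """Leading terms of the bound d^{5/2} H^{5/2} + d^2 H^{3/2} sqrt(T).

    Constants and logarithmic factors hidden by the O-tilde notation are dropped,
    so this only illustrates how the bound scales with d, H, and T.
    """
    return d ** 2.5 * H ** 2.5 + d ** 2 * H ** 1.5 * math.sqrt(T)


# Hypothetical problem sizes, chosen only for illustration.
d, H = 10, 5
for T in (10**4, 10**6, 10**8):
    avg = regret_leading_terms(d, H, T) / T
    print(f"T = {T:>9}: regret / T ~ {avg:.4f}")
```

The $T$-independent term $d^{5/2}H^{5/2}$ acts as a burn-in cost; for large $T$ the $d^2H^{3/2}\sqrt{T}$ term dominates, which is what makes the average regret vanish.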

Abstract

We study computationally and statistically efficient Reinforcement Learning algorithms for the linear Bellman Complete setting. This setting uses linear function approximation to capture value functions and unifies existing models such as linear Markov Decision Processes (MDPs) and Linear Quadratic Regulators (LQR). While prior work shows that this setting is statistically tractable, it remained open whether a computationally efficient algorithm exists. Our work provides a computationally efficient algorithm for the linear Bellman complete setting that works for MDPs with large action spaces, random initial states, and random rewards, but requires the underlying dynamics to be deterministic. Our approach is based on randomization: we inject random noise into least squares regression problems to perform optimistic value iteration. Our key technical contribution is to carefully design the noise to act only in the null space of the training data, which ensures optimism while circumventing a subtle error amplification issue.
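The null-space idea can be illustrated with a minimal sketch, assuming a single ordinary least-squares problem with features `X` (one row per observed feature vector) and targets `y`: fit the minimum-norm solution, then add Gaussian noise projected onto the null space of `X`. The helper names and the noise scale `sigma` are hypothetical, and this is not the paper's full algorithm, which embeds such regressions in optimistic value iteration with per-step noise scales $\sigma_h$ and a separate reward-noise scale $\sigma_{\mathrm{R}}$.

```python
import numpy as np


def null_space_projector(X: np.ndarray, tol: float = 1e-10) -> np.ndarray:
    """Orthogonal projector (d x d) onto the null space of the rows of X (n x d, n >= 1)."""
    d = X.shape[1]
    # Right singular vectors with non-negligible singular values span the row space of X.
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    rank = int(np.sum(s > tol * s.max()))
    v_row = vt[:rank].T  # d x rank orthonormal basis of span(rows of X)
    return np.eye(d) - v_row @ v_row.T


def perturbed_least_squares(X: np.ndarray, y: np.ndarray, sigma: float,
                            rng: np.random.Generator) -> np.ndarray:
    """Minimum-norm least-squares fit plus Gaussian noise restricted to null(X)."""
    # lstsq returns the minimum-norm solution, which lies in the row space of X.
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    xi = rng.standard_normal(X.shape[1])
    return theta_hat + sigma * null_space_projector(X) @ xi


rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))  # 3 observed feature vectors in R^5
y = rng.standard_normal(3)
theta = perturbed_least_squares(X, y, sigma=10.0, rng=rng)
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(X @ theta, X @ theta_hat))  # predictions on the data are unchanged
print(np.linalg.norm(theta - theta_hat))      # but theta moves in unexplored directions
```

Because the minimum-norm solution lies in the row space of `X` and the projected noise is orthogonal to that space, `X @ theta` equals `X @ theta_hat`: the noise perturbs predictions only along directions the data has not covered.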

Paper Structure (30 sections, 35 theorems, 141 equations, 1 table, 5 algorithms)

Introduction
Related Works
Preliminaries
Other Linear Bellman Completeness Definitions in the Literature
Other Prior Works on Linear Bellman Completeness
Algorithm
Analysis
Prelude: Learning with Exact Square Loss Minimization Oracle
Learning with Approximate Square Loss Minimization Oracle
Learning with Low Inherent Linear Bellman Error
Opening the Black-Box: Implementing Squared Loss Minimization Oracles in the Main Algorithm
Computationally Efficient Convex Set Feasibility
Computationally Efficient Estimation of Value Function (the $\theta$-Optimization Problem)
Conclusion
Table of Notation
...and 15 more sections

Key Result

Theorem 1

Under the exact square-loss oracle assumption and the deterministic dynamics assumption, executing the main algorithm with parameters $\sigma_{\mathrm{R}} = \widetilde{\Theta}(\sqrt{d H \log(HT)})$ and $\sigma_{h} = \widetilde{\Theta}\big( (d\sqrt{mH})^{H-h+1}(\sqrt{d} + \sqrt{mH}) \big)$, we have

Theorems & Definitions (67)

Definition 1: Linear Bellman Completeness
Example 1: Arbitrarily Large $\ell_2$-norm on Parameters
Example 2: Expansiveness of Bellman Backup in $\ell_2$-norm
Definition 2: D-optimal design
Theorem 1: Regret Bound with Exact Oracle
Corollary 1: Sample Complexity Bound
Theorem 2: Regret Bound with Approximate Oracle
Definition 3: Inherent Linear Bellman Error
Theorem 3: Regret Bound with Low Inherent Bellman Error
Definition 4: Separation oracle
...and 57 more
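For intuition about the parameter choice in Theorem 1, the following sketch evaluates the order of its noise scales, $\sigma_h \approx (d\sqrt{mH})^{H-h+1}(\sqrt{d} + \sqrt{mH})$ and $\sigma_{\mathrm{R}} \approx \sqrt{dH\log(HT)}$, with constants and any logarithmic factors hidden by $\widetilde{\Theta}$ dropped. The symbol $m$ is taken as given from the theorem (its definition is not part of this excerpt), and the function names are hypothetical. Consecutive scales differ by a factor of $d\sqrt{mH}$, so $\sigma_1$ is exponentially larger in $H$ than $\sigma_H$.

```python
import math


def sigma_schedule(d: int, H: int, m: int) -> dict:
    """Order of sigma_h for h = 1..H, constants and log factors dropped.

    sigma_h ~ (d * sqrt(m H))^(H - h + 1) * (sqrt(d) + sqrt(m H)).
    """
    base = d * math.sqrt(m * H)           # growth factor per step back from h = H
    tail = math.sqrt(d) + math.sqrt(m * H)
    return {h: base ** (H - h + 1) * tail for h in range(1, H + 1)}


def sigma_reward(d: int, H: int, T: int) -> float:
    """Order of sigma_R ~ sqrt(d H log(H T)), constants dropped."""
    return math.sqrt(d * H * math.log(H * T))


# Hypothetical sizes, for illustration only.
for h, s in sigma_schedule(d=10, H=5, m=1).items():
    print(f"h = {h}: sigma_h ~ {s:.3e}")
print(f"sigma_R ~ {sigma_reward(d=10, H=5, T=10**6):.2f}")
```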
