Parameter-free Optimal Rates for Nonlinear Semi-Norm Contractions with Applications to $Q$-Learning

Ankur Naskar; Gugan Thoppe; Vijay Gupta

Parameter-free Optimal Rates for Nonlinear Semi-Norm Contractions with Applications to $Q$-Learning

Ankur Naskar, Gugan Thoppe, Vijay Gupta

Abstract

Algorithms for solving \textit{nonlinear} fixed-point equations -- such as average-reward \textit{$Q$-learning} and \textit{TD-learning} -- often involve semi-norm contractions. Achieving parameter-free optimal convergence rates for these methods via Polyak--Ruppert averaging has remained elusive, largely due to the non-monotonicity of such semi-norms. We close this gap by (i.) recasting the averaged error as a linear recursion involving a nonlinear perturbation, and (ii.) taming the nonlinearity by coupling the semi-norm's contraction with the monotonicity of a suitably induced norm. Our main result yields the first parameter-free $\tilde{O}(1/\sqrt{t})$ optimal rates for $Q$-learning in both average-reward and exponentially discounted settings, where $t$ denotes the iteration index. The result applies within a broad framework that accommodates synchronous and asynchronous updates, single-agent and distributed deployments, and data streams obtained either from simulators or along Markovian trajectories.

Parameter-free Optimal Rates for Nonlinear Semi-Norm Contractions with Applications to $Q$-Learning

Abstract

Algorithms for solving \textit{nonlinear} fixed-point equations -- such as average-reward \textit{

-learning} and \textit{TD-learning} -- often involve semi-norm contractions. Achieving parameter-free optimal convergence rates for these methods via Polyak--Ruppert averaging has remained elusive, largely due to the non-monotonicity of such semi-norms. We close this gap by (i.) recasting the averaged error as a linear recursion involving a nonlinear perturbation, and (ii.) taming the nonlinearity by coupling the semi-norm's contraction with the monotonicity of a suitably induced norm. Our main result yields the first parameter-free

optimal rates for

-learning in both average-reward and exponentially discounted settings, where

denotes the iteration index. The result applies within a broad framework that accommodates synchronous and asynchronous updates, single-agent and distributed deployments, and data streams obtained either from simulators or along Markovian trajectories.

Parameter-free Optimal Rates for Nonlinear Semi-Norm Contractions with Applications to $Q$-Learning

Abstract

Parameter-free Optimal Rates for Nonlinear Semi-Norm Contractions with Applications to $Q$-Learning

Abstract

Paper Structure

Table of Contents

Key Result

Figures (1)

Theorems & Definitions (37)