Stochastic Halpern iteration in normed spaces and applications to reinforcement learning
Mario Bravo, Juan Pablo Contreras
TL;DR
The paper analyzes the stochastic Halpern iteration with minibatching for fixed-point problems in finite-dimensional normed spaces, covering nonexpansive and contractive operators accessed through a stochastic oracle. It derives explicit nonasymptotic oracle complexity bounds: for nonexpansive maps, $\tilde{O}(\varepsilon^{-5})$ oracle calls suffice to reach an expected fixed-point residual of $\varepsilon$ when the oracle has uniformly bounded variance, complemented by an $\Omega(\varepsilon^{-3})$ lower bound that covers a wide range of algorithms; for $\gamma$-contractions, the bound improves to $O(\varepsilon^{-2}(1-\gamma)^{-3})$. A key technical contribution is the minibatch variance-reduction scheme combined with a recursive bound analysis, allowing explicit constants in the non-Euclidean norm setting. The framework is then applied to model-free reinforcement learning, yielding Halpern-based Q-learning algorithms for average-reward and discounted MDPs with provable Bellman-error and policy-error guarantees; in the average-reward case, the method requires no prior knowledge of problem parameters. These results extend stochastic fixed-point theory to general normed spaces and provide model-free RL procedures with theoretical guarantees.
Abstract
We analyze the oracle complexity of the stochastic Halpern iteration with minibatching, where the aim is to approximate fixed points of nonexpansive and contractive operators in a finite-dimensional normed space. We show that if the underlying stochastic oracle has uniformly bounded variance, our method exhibits an overall oracle complexity of $\tilde{O}(\varepsilon^{-5})$ to obtain an expected fixed-point residual of $\varepsilon$ for nonexpansive operators, improving recent rates established for the stochastic Krasnoselskii-Mann iteration. We also establish a lower bound of $\Omega(\varepsilon^{-3})$, which applies to a wide range of algorithms, including all averaged iterations, even with minibatching. Using a suitable modification of our approach, we derive an $O(\varepsilon^{-2}(1-\gamma)^{-3})$ complexity bound for approximating the fixed point when the operator is a $\gamma$-contraction. As an application, we propose new model-free algorithms for average-reward and discounted-reward MDPs. For the average-reward case, our method applies to weakly communicating MDPs without requiring prior parameter knowledge.
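
To make the iteration concrete, the following is a minimal sketch of a stochastic Halpern iteration with minibatching, $x^n = (1-\beta_n)\,x^0 + \beta_n \tilde{T}_n(x^{n-1})$, where $\tilde{T}_n$ averages $k_n$ noisy oracle evaluations. Everything specific here is an assumption made for illustration and is not taken from the paper: the anchoring weights $\beta_n = n/(n+1)$, the batch sizes $k_n = n$, the Gaussian noise model, the toy two-state operator, and all function names.

```python
import random

def halpern_minibatch(oracle, x0, n_iters, beta, batch_size):
    """Stochastic Halpern iteration with minibatching (illustrative sketch).

    oracle(x)     -> one noisy evaluation of T(x), as a list of floats
    beta(n)       -> anchoring weight in (0, 1) for iteration n
    batch_size(n) -> number of oracle samples averaged at iteration n
    """
    d = len(x0)
    x = list(x0)
    for n in range(1, n_iters + 1):
        k = batch_size(n)
        # Minibatch estimate: average k independent oracle samples at x.
        avg = [0.0] * d
        for _ in range(k):
            sample = oracle(x)
            for i in range(d):
                avg[i] += sample[i] / k
        b = beta(n)
        # Halpern step: convex combination of the anchor x0 and the estimate.
        x = [(1.0 - b) * x0[i] + b * avg[i] for i in range(d)]
    return x

# Toy operator (assumed): T(x) = R + GAMMA * P x with P row-stochastic,
# which is a GAMMA-contraction in the sup norm.
GAMMA = 0.5
P = [[0.9, 0.1], [0.2, 0.8]]
R = [1.0, 0.0]

def T(x):
    return [R[i] + GAMMA * sum(P[i][j] * x[j] for j in range(2)) for i in range(2)]

def make_oracle(rng, sigma=0.1):
    """Noisy oracle with bounded variance: T(x) plus Gaussian noise (assumed model)."""
    def oracle(x):
        return [t + rng.gauss(0.0, sigma) for t in T(x)]
    return oracle

def residual(x):
    """Fixed-point residual ||x - T(x)|| in the sup norm."""
    return max(abs(a - b) for a, b in zip(x, T(x)))

rng = random.Random(0)
x0 = [0.0, 0.0]
x_final = halpern_minibatch(
    make_oracle(rng), x0, n_iters=200,
    beta=lambda n: n / (n + 1),   # assumed schedule
    batch_size=lambda n: n,       # assumed schedule
)
print("residual at x0:   ", residual(x0))
print("residual at x_200:", residual(x_final))
```

Growing the batch size with $n$ reduces the variance of each estimate as the anchoring weight approaches 1, which is the general idea behind the minibatch scheme; the schedules that achieve the stated complexity bounds are those given in the paper, not the ones assumed here.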
