The Collusion of Memory and Nonlinearity in Stochastic Approximation With Constant Stepsize

Dongyan Huo; Yixuan Zhang; Yudong Chen; Qiaomin Xie

TL;DR

This work studies constant-stepsize stochastic approximation (SA) with nonlinear updates and Markovian data, a setting in which memory and nonlinearity interact in complex ways. The authors develop a fine-grained analysis that yields the first weak convergence result for the joint process of data and iterates, together with a precise decomposition of the asymptotic bias into Markovian, nonlinear, and interaction terms, plus higher-moment and Central Limit Theorem (CLT) results for the averaged iterates. They prove weak convergence both with and without projection, under distinct conditions, and provide non-asymptotic convergence rates. These results support bias reduction via Richardson-Romberg (RR) extrapolation and statistical inference, and they apply to generalized linear models (GLMs) with Markovian data, including logistic and smooth-ReLU-type models.

Abstract

In this work, we investigate stochastic approximation (SA) with Markovian data and nonlinear updates under constant stepsize $α>0$. Existing work has primarily focused on either i.i.d. data or linear update rules. We take a new perspective and carefully examine the simultaneous presence of Markovian dependency of data and nonlinear update rules, delineating how the interplay between these two structures leads to complications that are not captured by prior techniques. By leveraging the smoothness and recurrence properties of the SA updates, we develop a fine-grained analysis of the correlation between the SA iterates $θ_k$ and Markovian data $x_k$. This enables us to overcome the obstacles in existing analyses and establish, for the first time, the weak convergence of the joint process $(x_k, θ_k)_{k\geq0}$. Furthermore, we present a precise characterization of the asymptotic bias of the SA iterates, given by $\mathbb{E}[θ_\infty]-θ^\ast=α(b_\text{m}+b_\text{n}+b_\text{c})+O(α^{3/2})$. Here, $b_\text{m}$ is associated with the Markovian noise, $b_\text{n}$ is tied to the nonlinearity, and notably, $b_\text{c}$ represents a multiplicative interaction between the Markovian noise and nonlinearity, which is absent in previous works. As a by-product of our analysis, we derive finite-time bounds on the higher moments $\mathbb{E}[\|θ_k-θ^\ast\|^{2p}]$ and present non-asymptotic geometric convergence rates for the iterates, along with a Central Limit Theorem.
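
The bias expansion stated in the abstract suggests why combining two stepsizes can cancel the leading bias term. The following is a sketch, not the paper's derivation: the superscript notation $θ_\infty^{(α)}$ for the limit under stepsize $α$ and the shorthand $b$ are introduced here for illustration, and the step assumes that $b_\text{m}$, $b_\text{n}$, $b_\text{c}$ do not depend on $α$.

```latex
\begin{align*}
\mathbb{E}[\theta_\infty^{(\alpha)}] - \theta^\ast
  &= \alpha b + O(\alpha^{3/2}),
  \qquad b := b_\text{m} + b_\text{n} + b_\text{c}, \\
\mathbb{E}[\theta_\infty^{(2\alpha)}] - \theta^\ast
  &= 2\alpha b + O(\alpha^{3/2}), \\
2\,\mathbb{E}[\theta_\infty^{(\alpha)}] - \mathbb{E}[\theta_\infty^{(2\alpha)}] - \theta^\ast
  &= O(\alpha^{3/2}).
\end{align*}
```

Under these assumptions, the linear combination $2θ_\infty^{(α)} - θ_\infty^{(2α)}$ has bias of order $α^{3/2}$ rather than $α$.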

Paper Structure (38 sections, 20 theorems, 299 equations)

Introduction
Problem Setup and Preliminaries
Analytical Challenges and Techniques
Main Results
Weak Convergence of Projected SA
Weak Convergence without Projection
Non-Asymptotic Convergence Rate and Central Limit Theorem
Bias Characterization
Algorithmic Implications
Implications for Learning GLM
Related Work
Conclusion
Additional Notations
Proof of Pilot Results (Proposition \ref{prop:2n-convergence})
Base Case
...and 23 more sections

Key Result

Theorem 4.1

Suppose that Assumptions \ref{assumption:uniform-ergodic}--\ref{assumption:noise} $(p=1)$ hold. The projected SA \eqref{eq:sa-iterate} is applied with radius parameter $2\|\theta^*\|\leq\beta<\infty$. For stepsize $\alpha>0$ that satisfies the constraint $\alpha\tau_\alpha \leq \frac{\mu}{(940+96\beta)L^2}$, the Markov c

Theorems & Definitions (38)

Definition 2.1
Theorem 4.1: Ergodicity of Projected SA
Proposition 4.2
Theorem 4.3: Ergodicity of SA -- Minorization
Corollary 4.4: Non-Asymptotic Convergence Rate
Corollary 4.5: Central Limit Theorem
Theorem 4.6: Bias Characterization
Corollary 4.7: Tail Averaging
Corollary 4.8: RR-Extrapolation
Lemma B.1
...and 28 more
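
Corollaries 4.7 and 4.8 in the list above concern tail averaging and RR-extrapolation. As a rough illustration of how these two devices fit together, here is a minimal Python sketch on a toy problem. The model is hypothetical and not taken from the paper: a two-state symmetric Markov chain drives a scalar nonlinear update $g(θ, x) = B - a(x)\tanh(θ)$, whose stationary mean field has the closed-form root $θ^\ast = \operatorname{atanh}(B)$. The names `run_sa`, `rr_extrapolate`, `A`, `B`, and `THETA_STAR` are all illustrative assumptions.

```python
import math
import random

# Hypothetical toy model (not from the paper): a two-state Markov chain
# x_k in {0, 1} with symmetric transitions, so its stationary law is uniform.
A = (0.5, 1.5)              # state-dependent gain a(x); stationary mean is 1.0
B = 0.4                     # constant offset
THETA_STAR = math.atanh(B)  # root of the mean field  B - 1.0 * tanh(theta)


def run_sa(alpha, n_steps, stay_prob=0.9, seed=0):
    """Constant-stepsize SA driven by Markovian data; returns the tail average.

    The first half of the trajectory is discarded as burn-in and the second
    half is averaged.
    """
    rng = random.Random(seed)
    state, theta = 0, 0.0
    burn_in = n_steps // 2
    tail_sum = 0.0
    for k in range(n_steps):
        # Nonlinear update whose noise depends on the current Markov state.
        theta += alpha * (B - A[state] * math.tanh(theta))
        if k >= burn_in:
            tail_sum += theta
        # Sticky chain: stay with probability stay_prob, otherwise flip.
        if rng.random() > stay_prob:
            state = 1 - state
    return tail_sum / (n_steps - burn_in)


def rr_extrapolate(alpha, n_steps, **kwargs):
    """Combine tail averages at stepsizes alpha and 2*alpha as 2*avg(alpha) - avg(2*alpha).

    Both runs share the same seed, hence the same Markov chain trajectory.
    """
    avg_small = run_sa(alpha, n_steps, **kwargs)
    avg_large = run_sa(2.0 * alpha, n_steps, **kwargs)
    return 2.0 * avg_small - avg_large


if __name__ == "__main__":
    alpha, n_steps = 0.02, 40000
    print("theta*          :", THETA_STAR)
    print("tail average    :", run_sa(alpha, n_steps))
    print("RR extrapolation:", rr_extrapolate(alpha, n_steps))
```

The sketch omits projection and uses a fixed half-trajectory burn-in; both are simplifications chosen for brevity rather than choices confirmed by the source. Whether the extrapolated estimate is closer to $θ^\ast$ in a single run depends on the sampling noise, so comparing the two estimators fairly would require averaging over many seeds.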
