Almost Sure Convergence of Nonlinear Stochastic Approximation: An Interplay of Noise and Step Size

Quang Dinh Thien Nguyen; Duc Anh Nguyen; Hoang Huy Nguyen; Siva Theja Maguluri

TL;DR

This work establishes almost sure convergence of nonlinear stochastic approximation under a general negative drift condition and a noise process with finite $p$-th moment ($p>1$). It proves that any non-summable but $p$-th power summable step size sequence suffices. For polynomial step sizes $\alpha_k=\alpha(k+K)^{-\xi}$, this gives the explicit requirement $\xi\in(1/p,1]$, and the results extend to multiplicative noise. The key innovations are a universal Lyapunov drift argument and, for $p>2$, a novel iterate projection technique that handles nonlinear and high-moment terms. The results generalize the classical $p=2$ theory, connect to the Strong Law of Large Numbers (SLLN), and apply to contractive, linear, SGD, and non-expansive settings, offering a refined noise-step-size trade-off with potential finite-time implications.
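The step size requirement above can be sketched in a few lines of Python. This is only an illustration: the function names `step_size` and `exponent_is_admissible` and the default values of `alpha`, `K`, and `xi` are assumptions made here, not notation or code from the paper.

```python
def step_size(k: int, alpha: float = 1.0, K: int = 1, xi: float = 0.7) -> float:
    """Polynomial step size alpha_k = alpha * (k + K)^(-xi).

    The defaults are illustrative choices, not values taken from the paper.
    """
    return alpha * (k + K) ** (-xi)


def exponent_is_admissible(xi: float, p: float) -> bool:
    """Check the exponent condition xi in (1/p, 1] for noise with finite p-th moment."""
    if p <= 1.0:
        raise ValueError("the condition is stated for p > 1")
    return 1.0 / p < xi <= 1.0
```

For instance, with $p = 1.6$ the check accepts $\xi = 0.7$ and rejects $\xi = 0.6$, since $1/p = 0.625$. With $p = 2$ it recovers the classical range $\xi \in (1/2, 1]$.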

Abstract

We study the almost sure convergence of the Stochastic Approximation algorithm to the fixed point $x^\star$ of a nonlinear operator under a negative drift condition and a general noise sequence with finite $p$-th moment for some $p > 1$. Classical almost sure convergence results for Stochastic Approximation mostly address the square-integrable noise setting, where any non-summable but square-summable step size sequence is known to be sufficient for almost sure convergence. However, this restriction limits wider algorithmic application. In particular, many applications in Machine Learning and Operations Research admit heavy-tailed noise with infinite variance, rendering such guarantees inapplicable. On the other hand, when a stronger condition on the noise is available, such step size guarantees are too conservative, as practitioners would like to pick a larger step size for better convergence behavior. To this end, we show that any non-summable but $p$-th power summable step size sequence is sufficient to guarantee almost sure convergence, closing this gap in the literature. Our guarantees are obtained using a universal Lyapunov drift argument. For the regime $p \in (1, 2)$, we show that using the Lyapunov function $\|x-x^\star\|^p$ and applying a Taylor-like bound suffice. For $p > 2$, such an approach is no longer applicable, and therefore we introduce a novel iterate projection technique to control the nonlinear terms produced by high-moment bounds and multiplicative noise. We believe our proof techniques and their implications could be of independent interest and pave the way for finite-time analysis of Stochastic Approximation under a general noise condition.
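As a short worked check of how the non-summable and $p$-th power summable conditions translate to polynomial step sizes $\alpha_k=\alpha(k+K)^{-\xi}$, the standard $p$-series test gives the following, assuming $\alpha > 0$ and $K \geq 1$:

```latex
\sum_{k \ge 0} \alpha_k
  = \alpha \sum_{k \ge 0} (k+K)^{-\xi} = \infty
  \iff \xi \le 1,
\qquad
\sum_{k \ge 0} \alpha_k^{p}
  = \alpha^{p} \sum_{k \ge 0} (k+K)^{-p\xi} < \infty
  \iff p\,\xi > 1 .
```

Both conditions hold together exactly when $\xi \in (1/p, 1]$. A larger $p$ (lighter-tailed noise) therefore admits smaller exponents $\xi$, that is, larger step sizes.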

Paper Structure

This paper contains 27 sections, 15 theorems, 94 equations, and 2 figures.

Introduction
Contributions
Literature overview
Problem setting and Main Results
Problem setting
Main results
Example: Contractive operators
Example: Linear operators
Example: Stochastic Gradient Descent (SGD)
Example: Strong Law of Large Numbers (SLLN) with step-sizes
Generalization to non-expansive operators
Proof outline
$p = 2$
$p \in (1,2)$
$p > 2$
...and 12 more sections

Key Result

Theorem 2.1

Suppose the general drift condition, the noise unbiasedness assumption, the noise condition with finite $p$-th moment for some $p > 1$, and the Lipschitz assumption hold. Then, for any non-increasing step size sequence $\{\alpha_k\}_{k \geq 0}$ in the main update equation that is non-summable but $p$-th power summable, that is, $\sum_{k \geq 0} \alpha_k = \infty$ and $\sum_{k \geq 0} \alpha_k^p < \infty$, the iterate $x_k$ converges almost surely to the unique solution $x^\star$ of the fixed-point equation.
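To make the setting concrete, here is a minimal simulation sketch under assumptions that this summary does not confirm. The update form $x_{k+1} = x_k + \alpha_k (F(x_k) - x_k + w_k)$, the scalar contraction $F(x) = 0.5\cos(x)$, the symmetric Pareto noise, and all names and defaults (`contraction`, `run_sa`, `p_tail`, `noise_scale`) are hypothetical choices for illustration, not the paper's update equation or experimental setup.

```python
import math
import random


def contraction(x: float) -> float:
    """Hypothetical scalar operator F; |F'(x)| <= 0.5, so F is a contraction."""
    return 0.5 * math.cos(x)


def run_sa(n_steps: int, xi: float, p_tail: float = 1.7, noise_scale: float = 1.0,
           alpha: float = 1.0, K: int = 1, x0: float = 3.0, seed: int = 0) -> float:
    """Iterate x_{k+1} = x_k + alpha_k * (F(x_k) - x_k + w_k) and return the last x.

    The update form is an assumed, commonly used one. The noise w_k is a
    symmetrized Pareto variable: zero mean for p_tail > 1, with finite moments
    only of order strictly below p_tail (so infinite variance when p_tail <= 2).
    """
    rng = random.Random(seed)
    x = x0
    for k in range(n_steps):
        alpha_k = alpha * (k + K) ** (-xi)  # polynomial step size
        sign = rng.choice((-1.0, 1.0))
        w = noise_scale * sign * rng.paretovariate(p_tail)
        x = x + alpha_k * (contraction(x) - x + w)
    return x
```

Calling `run_sa(200_000, xi=0.7)` and comparing `abs(x - contraction(x))` across several seeds and exponents gives a rough empirical feel for the role of $\xi$, though a finite run can only suggest, not verify, almost sure convergence.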

Figures (2)

Figure 1: Visual summary of the Lyapunov drift argument with the iterate projection trick.
Figure 2: Selector control with theoretical noise and multiple step sizes. Here $p^{-1} = 0.625$ is the divergence threshold for $p = 1.6$. Our experimental results confirm the tightness of the condition $\xi > p^{-1}$ for $p = 1.6$.

Theorems & Definitions (31)

Theorem 2.1
Corollary 2.1
proof
Corollary 2.2
proof
Theorem 2.2
Corollary 2.3
proof
Corollary 2.4
proof
...and 21 more
