Almost Sure Convergence of Nonlinear Stochastic Approximation: An Interplay of Noise and Step Size
Quang Dinh Thien Nguyen, Duc Anh Nguyen, Hoang Huy Nguyen, Siva Theja Maguluri
TL;DR
This work establishes almost sure convergence of nonlinear stochastic approximation under a general negative drift condition and a noise process with finite $p$-th moment ($p>1$). It proves that any non-summable but $p$-th power summable step size sequence suffices; for polynomial step sizes $\alpha_k=\alpha(k+K)^{-\xi}$, this amounts to the explicit requirement $\xi\in(1/p,1]$. The result extends to multiplicative noise. The key innovations are a universal Lyapunov drift argument and, for $p>2$, a novel iterate projection technique that handles nonlinear and high-moment terms. The results generalize the classical $p=2$ theory, connect to the strong law of large numbers (SLLN), and apply to contractive, linear, SGD, and non-expansive settings, offering a refined trade-off between noise and step size with potential finite-time implications.
Abstract
We study the almost sure convergence of the Stochastic Approximation algorithm to the fixed point $x^\star$ of a nonlinear operator under a negative drift condition and a general noise sequence with finite $p$-th moment for some $p > 1$. Classical almost sure convergence results for Stochastic Approximation mostly assume square-integrable noise and show that any non-summable but square-summable step size sequence suffices. However, this restriction prevents wider algorithmic application. In particular, many applications in Machine Learning and Operations Research admit heavy-tailed noise with infinite variance, to which such guarantees do not apply. Conversely, when a stronger moment condition on the noise is available, the square-summability requirement is too conservative, since practitioners would prefer a larger step size for better convergence behavior. To close this gap in the literature, we show that any non-summable but $p$-th power summable step size sequence suffices to guarantee almost sure convergence. Our guarantees are obtained using a universal Lyapunov drift argument. For the regime $p \in (1, 2)$, we show that the Lyapunov function $\|x-x^\star\|^p$ together with a Taylor-like bound suffices. For $p > 2$, this approach no longer applies, so we introduce a novel iterate projection technique to control the nonlinear terms produced by high-moment bounds and multiplicative noise. We believe our proof techniques and their implications could be of independent interest and pave the way for finite-time analysis of Stochastic Approximation under a general noise condition.
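To make the step-size condition concrete: for a polynomial schedule $\alpha_k=\alpha(k+K)^{-\xi}$, the sum $\sum_k \alpha_k$ diverges exactly when $\xi \le 1$, and $\sum_k \alpha_k^p$ converges exactly when $p\xi > 1$, which gives the range $\xi\in(1/p,1]$. The sketch below is a toy illustration only and is not taken from the paper. It assumes a scalar recursion $x_{k+1} = x_k + \alpha_k\,(-(x_k - x^\star) + w_k)$ with Student-t noise, which has infinite variance when the degrees of freedom are below 2. The names `step_size` and `run_sa`, the recursion, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def step_size(k, alpha=1.0, K=10.0, xi=0.8):
    """Polynomial step size alpha_k = alpha * (k + K) ** (-xi)."""
    return alpha * (k + K) ** (-xi)


def run_sa(x0, x_star, p, xi, n_steps, noise_scale=1.0, df=1.5, seed=0):
    """Run the assumed toy recursion and return |x_k - x_star| for k = 0..n_steps.

    Assumption (not from the paper): the noise w_k is Student-t with `df`
    degrees of freedom, which has zero mean for df > 1 and finite moments
    only of order strictly less than df.
    """
    if not (1.0 < p < df):
        raise ValueError("need 1 < p < df so the noise has a finite p-th moment")
    if not (1.0 / p < xi <= 1.0):
        raise ValueError("xi must lie in (1/p, 1]")

    rng = np.random.default_rng(seed)
    noise = noise_scale * rng.standard_t(df, size=n_steps)

    x = float(x0)
    errors = np.empty(n_steps + 1)
    errors[0] = abs(x - x_star)
    for k in range(n_steps):
        # Negative drift toward x_star plus a heavy-tailed perturbation.
        x += step_size(k, xi=xi) * (-(x - x_star) + noise[k])
        errors[k + 1] = abs(x - x_star)
    return errors


if __name__ == "__main__":
    # Noise with df = 1.5 has infinite variance; p = 1.4 gives xi in (1/1.4, 1].
    errs = run_sa(x0=5.0, x_star=1.0, p=1.4, xi=0.8, n_steps=50_000)
    print("final error:", errs[-1])
```

A single simulated path cannot verify almost sure convergence. The sketch only shows how the moment order $p$ of the noise constrains the admissible exponent $\xi$. In this toy setup a classical square-summable analysis would not apply, because the noise variance is infinite.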
