One-dimensional System Arising in Stochastic Gradient Descent
Konstantinos Karatapanis
TL;DR
The paper studies a one-dimensional stochastic approximation framework with time-inhomogeneous drift and diffusion, revealing a parameter-dependent threshold $\tilde{\gamma}$ (tied to the local nonlinearity near the origin) that determines whether the iterates escape the origin or converge to it. Through continuous-time SDEs and a discrete-time analogue, the author derives regime-specific results for monomial and linear drifts, using mean-flow versus remaining-variance analysis, time-changes, and stopping-time arguments to characterize when $\mathbb{P}(X_t \to 0)$ is positive versus zero. A key contribution is showing that noisy SGD-like updates, $X_{n+1}-X_n = F'(X_n)/n^{\gamma} + Y_n/n^{\gamma}$, can escape degenerate saddle points under suitable $\gamma$, with precise thresholds depending on the local geometry of $F$ (or $V$). The findings provide rigorous insight into noise-enabled saddle-point escape in one dimension and offer guidance on parameter choices that promote convergence to minima in stochastic optimization. Practically, the work quantifies how time-decaying noise interacts with polynomial drift to determine long-run behavior, informing design choices in stochastic gradient-like algorithms with degenerate saddles.
Abstract
We consider SDEs of the form $dX_t = |f(X_t)|/t^{\gamma}\, dt + 1/t^{\gamma}\, dB_t$, where $f(x)$ behaves comparably to $|x|^k$ in a neighborhood of the origin, for $k\in [1,\infty)$. We show that there exists a threshold value $\tilde{\gamma}$ for $\gamma$, depending on $k$, such that when $\gamma\in (1/2, \tilde{\gamma})$ then $\mathbb{P}(X_t\rightarrow 0) = 0$, and for the remaining permissible values of $\gamma$, $\mathbb{P}(X_t\rightarrow 0)>0$. These results extend to discrete processes that satisfy $X_{n+1}-X_n = f(X_n)/n^{\gamma}+Y_n/n^{\gamma}$. Here, $Y_{n+1}$ are martingale differences that are a.s. bounded. This result shows that for a function $F$ whose second derivative at degenerate saddle points is of polynomial order, it is always possible to escape saddle points via the iteration $X_{n+1}-X_n =F'(X_n)/n^{\gamma}+Y_n/n^{\gamma}$ for a suitable choice of $\gamma$.
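The discrete iteration in the abstract can be explored numerically. The following is a minimal sketch, not the paper's method: it assumes the drift is exactly $f(x) = |x|^k$ and takes $Y_n$ to be independent random signs scaled by a constant, which is one simple example of a.s. bounded martingale differences. The names `simulate`, `escape_fraction`, `noise`, and `escape_level` are hypothetical and do not come from the paper. A finite run can only suggest, never establish, whether $\mathbb{P}(X_n \to 0)$ is zero or positive.

```python
import random


def simulate(k, gamma, n_steps=10_000, x0=0.0, noise=0.1,
             escape_level=1.0, seed=0):
    """Iterate X_{n+1} - X_n = f(X_n)/n^gamma + Y_n/n^gamma with f(x) = |x|^k.

    Assumption: Y_n = noise * (random sign), an independent, mean-zero,
    bounded sequence. Returns (escaped, x), where escaped is True if the
    iterate reached escape_level within n_steps.
    """
    rng = random.Random(seed)
    x = x0
    for n in range(1, n_steps + 1):
        step = 1.0 / n ** gamma               # decaying step size 1/n^gamma
        y = noise * rng.choice((-1.0, 1.0))   # bounded noise term Y_n
        x += step * (abs(x) ** k + y)
        if x >= escape_level:                 # stop once the path has left the origin
            return True, x
    return False, x


def escape_fraction(k, gamma, trials=100, **kwargs):
    """Fraction of independent runs (seeds 0..trials-1) that reach escape_level."""
    hits = sum(simulate(k, gamma, seed=s, **kwargs)[0] for s in range(trials))
    return hits / trials
```

Calling `escape_fraction(k, gamma)` for several values of `gamma` in $(1/2, 1)$ and a fixed `k` gives a rough empirical picture of how often paths leave a neighborhood of the origin within the chosen horizon. The escape level and horizon are arbitrary choices, so the output should be read qualitatively.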
