One-dimensional System Arising in Stochastic Gradient Descent

Konstantinos Karatapanis

TL;DR

The paper studies a one-dimensional stochastic approximation framework with time-inhomogeneous drift and diffusion, revealing a parameter-dependent threshold $\tilde{\gamma}$ (tied to the local nonlinearity near the origin) that determines whether the iterates escape the origin or converge to it. Through continuous-time SDEs and a discrete-time analogue, the author derives regime-specific results for monomial and linear drifts, using mean-flow versus remaining-variance analysis, time-changes, and stopping-time arguments to characterize when $\mathbb{P}(X_t \to 0)$ is positive versus zero. A key contribution is showing that noisy SGD-like updates, $X_{n+1}-X_n = F'(X_n)/n^{\gamma} + Y_n/n^{\gamma}$, can escape degenerate saddle points under suitable $\gamma$, with precise thresholds depending on the local geometry of $F$ (or $V$). The findings provide rigorous insight into noise-enabled saddle-point escape in one dimension and establish guidance on parameter choices to promote convergence to minima in stochastic optimization contexts. Practically, the work quantifies how time-decaying noise interacts with polynomial drift to determine long-run behavior, informing design choices in stochastic gradient-like algorithms with degenerate saddles.
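
As a rough illustration of the "remaining variance" idea, consider a noise term of the form $t^{-\gamma}\,dB_t$, the continuous-time counterpart of $Y_n/n^{\gamma}$. The computation below is a standard Itô-isometry calculation and is offered as an interpretation, not as the paper's own definition: it gives the variance of all noise still to be injected after time $t$.

$$
\operatorname{Var}\!\left(\int_t^{\infty} s^{-\gamma}\,dB_s\right) = \int_t^{\infty} s^{-2\gamma}\,ds = \frac{t^{\,1-2\gamma}}{2\gamma-1}, \qquad \gamma > \tfrac{1}{2}.
$$

This quantity is finite exactly when $\gamma > 1/2$ and decays polynomially in $t$, which suggests why the outcome depends on how fast the drift can move the process compared with how quickly the noise dies out.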

Abstract

We consider SDEs of the form $dX_t = |f(X_t)|/t^{\gamma}\,dt + 1/t^{\gamma}\,dB_t$, where $f(x)$ behaves comparably to $|x|^k$ in a neighborhood of the origin, for $k\in [1,\infty)$. We show that there exists a threshold value $\tilde{\gamma}$ for $\gamma$, depending on $k$, such that when $\gamma\in (1/2, \tilde{\gamma})$ then $\mathbb{P}(X_t\rightarrow 0) = 0$, and for the rest of the permissible values $\mathbb{P}(X_t\rightarrow 0)>0$. These results extend to discrete processes that satisfy $X_{n+1}-X_n = f(X_n)/n^{\gamma}+Y_n/n^{\gamma}$. Here, $Y_{n+1}$ are martingale differences that are a.s. bounded. This result shows that for a function $F$, whose second derivative at degenerate saddle points is of polynomial order, it is always possible to escape saddle points via the iteration $X_{n+1}-X_n =F'(X_n)/n^{\gamma}+Y_n/n^{\gamma}$ for a suitable choice of $\gamma$.
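
The discrete iteration in the abstract can be explored numerically. The following is a minimal sketch, not the paper's method. It makes several assumptions: the drift is taken to be exactly $|x|^k$, the noise $Y_n$ is Rademacher ($\pm 1$ with equal probability, one simple bounded martingale difference), and the names `simulate`, `exit_fraction`, `radius`, and `n_start` are hypothetical. A path is stopped once it leaves $(-\text{radius}, \text{radius})$, since the abstract only constrains $f$ near the origin.

```python
import random


def simulate(gamma, k, n_steps=100_000, n_start=100, x0=0.0,
             radius=1.0, noisy=True, seed=0):
    """Run X_{n+1} = X_n + (|X_n|**k + Y_n) / n**gamma from step n_start.

    Returns (x, exit_step): exit_step is the first n at which |x| >= radius,
    or None if the path stayed inside (-radius, radius) for all n_steps.
    """
    rng = random.Random(seed)
    x = x0
    for n in range(n_start, n_start + n_steps):
        # Assumption: Rademacher noise, a simple bounded martingale difference.
        y = rng.choice((-1.0, 1.0)) if noisy else 0.0
        x += (abs(x) ** k + y) / n ** gamma
        if abs(x) >= radius:
            return x, n
    return x, None


def exit_fraction(gamma, k, n_paths=200, **kwargs):
    """Fraction of independent paths that leave (-radius, radius) in the horizon."""
    exits = sum(simulate(gamma, k, seed=s, **kwargs)[1] is not None
                for s in range(n_paths))
    return exits / n_paths
```

Comparing, for example, `exit_fraction(0.6, 2.0)` with `exit_fraction(0.95, 2.0)` gives a feel for how $\gamma$ affects escape from the origin. A finite-horizon exit fraction is only suggestive: it does not establish the asymptotic probability $\mathbb{P}(X_n \to 0)$, and it does not locate the threshold $\tilde{\gamma}$.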

Paper Structure

This paper contains 16 sections, 37 theorems, 71 equations.

Key Result

Theorem 1.1

Suppose that $\mathcal{N}$ is a neighborhood of zero. Let $(L_t)_{t\geq 1}$ be a solution of eq:GenIntro, where $f(x)$ is Lipschitz. We distinguish two cases depending on $f$ and the parameters of the system. If either enum:nonconv1 or enum:nonconv2 holds, then $\mathbb{P}(L_t\rightarrow 0)=0$.

Theorems & Definitions (37)

  • Theorem 1.1
  • Theorem 1.2
  • Theorem 1.3
  • Theorem 1.4
  • Lemma 2.1
  • Lemma 2.2
  • Proposition 2.3
  • Lemma 2.4
  • Lemma 2.5
  • Proposition 3.1
  • ...and 27 more