Nonconvex optimization and convergence of stochastic gradient descent, and solution of asynchronous game

Kevin Buck, Jessica Babyak, Paolo Piersanti, Kevin Zumbrun, Christiane Gallos, Dorothea Gallos

TL;DR

The work investigates convergence of stochastic gradient methods for both convex and nonconvex objectives, identifying step-size regimes, time-averaging, and stochastic-coordinate variants that guarantee convergence to the critical set $\mathcal{C}$ under mild conditions. It develops a unified stochastic-approximation framework with key energy-type estimates, analyzes deterministic and stochastic cases, and extends to approximate convexity where $f$ behaves like a strongly convex function near the optimum. The authors connect SGD to continuous-time dynamics and Fokker–Planck equations, providing intuition and numerical schemes (e.g., Crank–Nicolson) to study diffusion-like behavior and equilibrium distributions. A major new contribution is applying these methods to two- and asynchronous multi-player games by smoothing the max via $\ell^p$ norms, yielding convergent SGD updates in nonconvex game settings and offering practical insights for smoothing, step-size design, and potential multigrid strategies for large-scale problems.
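The smoothing step mentioned above rests on a standard fact: for $x\in\mathbb{R}^n$, $\max_i |x_i| \le \|x\|_p \le n^{1/p}\max_i |x_i|$, so the $\ell^p$ norm is a differentiable surrogate for the max that tightens as $p$ grows. The sketch below only illustrates that inequality numerically. The helper name `lp_smooth_max` and the test vector are assumptions made for illustration; the paper's actual smoothed game objective is not shown in this summary.

```python
import numpy as np


def lp_smooth_max(x, p):
    """l^p norm of x, used as a smooth surrogate for max_i |x_i| (hypothetical helper)."""
    a = np.abs(np.asarray(x, dtype=float))
    m = a.max()
    if m == 0.0:
        return 0.0
    # Factor out the largest entry so every ratio is <= 1 and large p cannot overflow.
    return float(m * np.sum((a / m) ** p) ** (1.0 / p))


x = np.array([0.3, -1.2, 0.9, 1.1])  # illustrative vector, not from the paper
true_max = float(np.max(np.abs(x)))

for p in (2, 10, 50, 200):
    approx = lp_smooth_max(x, p)
    upper = len(x) ** (1.0 / p) * true_max
    print(f"p={p:4d}  ||x||_p={approx:.6f}  max={true_max:.6f}  upper bound={upper:.6f}")
```

Larger $p$ gives a closer approximation to the max but a less smooth objective, which is presumably why the choice of $p$ interacts with step-size design.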

Abstract

We review convergence and behavior of stochastic gradient descent for convex and nonconvex optimization, establishing various conditions for convergence to zero of the variance of the gradient of the objective function, and presenting a number of simple examples demonstrating the approximate evolution of the probability density under iteration, including applications to both classical two-player and asynchronous multiplayer games.

Paper Structure

This paper contains 29 sections, 17 theorems, 131 equations, 15 figures.

Key Result

Proposition 1.1

Assuming the Hessian bound $|\nabla^2 f|\leq L$, and taking $\alpha_j=\alpha \equiv \text{constant}$ with $\alpha <2/L$, we have for any solution $\{w_m\}$ of GD that (i) $f(w_m)$ is monotone decreasing, and (ii) $\nabla f(w_m)\to 0$ as $m\to\infty$. If, also, $|f(w)|\to \infty$ as $|w|\to \infty$ …
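Claims (i) and (ii) can be checked numerically on a small example. The following is a minimal sketch, not the paper's code: the test function $f(w)=w^2/2+2\cos w$ is an assumption chosen because it is nonconvex with $f''(w)=1-2\cos w\in[-1,3]$, so $L=3$ and any constant step $\alpha<2/3$ satisfies the hypothesis.

```python
import math

L = 3.0  # Hessian bound for the illustrative f below: |f''(w)| = |1 - 2 cos w| <= 3


def f(w):
    # Illustrative nonconvex objective (an assumption, not taken from the paper).
    return 0.5 * w * w + 2.0 * math.cos(w)


def grad_f(w):
    return w - 2.0 * math.sin(w)


def gradient_descent(w0, alpha, steps):
    """Constant-step GD, w_{m+1} = w_m - alpha * grad f(w_m), recording f(w_m)."""
    if not 0.0 < alpha < 2.0 / L:
        raise ValueError("the proposition requires 0 < alpha < 2/L")
    w = w0
    values = [f(w)]
    for _ in range(steps):
        w -= alpha * grad_f(w)
        values.append(f(w))
    return w, values


w_final, values = gradient_descent(w0=3.0, alpha=0.5, steps=200)
monotone = all(b <= a + 1e-12 for a, b in zip(values, values[1:]))
print("f(w_m) non-increasing:", monotone)
print("|grad f(w_final)|:", abs(grad_f(w_final)))
```

The monotone decrease follows from the descent estimate $f(w_{m+1})\le f(w_m)-\alpha(1-\alpha L/2)|\nabla f(w_m)|^2$, whose coefficient is positive exactly when $\alpha<2/L$.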

Figures (15)

  • Figure 1: A histogram of the results of a Monte Carlo simulation of SGD for the convex function \ref{xeq} using $50{,}000$ trials, plotted together with the density function predicted by \ref{recursions}.
  • Figure 2: Several histograms comparing Monte Carlo simulations of SGD for the nonconvex function \ref{ncf}, varying key parameters. Each uses $50{,}000$ trials of $500$ SGD iterations, plotted together with a normal distribution of equal mean and variance. The value of $\sigma$ determines the function $g(x)$, which varies across the figures as labeled. For all figures the stepsize is $\alpha(m)=\frac{c}{\log(1+m)}$, with varying values of $c$.
  • Figure 3: A plot of the nonconvex function $f(x)$ defined by \ref{ncf}.
  • Figure 4: Three time slices of a simulation of the Fokker–Planck equations \ref{newx} corresponding to the previous examples of convex and nonconvex functions. The method agrees qualitatively with the previous Monte Carlo and analytic results. For both, the initial condition is given by a small Gaussian centered at 0 and an approximation of a point mass far from the center.
  • Figure 5: A histogram of 1,000 trials of 1,000 iterations of the described SGD algorithm, along with the density function of a normal distribution of the same expectation and variance. Here we use $p=10$, $\alpha(m)=0.1$.
  • ...and 10 more figures
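
The Monte Carlo experiments described in the captions above (many independent SGD runs with stepsize $\alpha(m)=c/\log(1+m)$, summarized as a histogram) can be sketched roughly as follows. This is an illustration under stated assumptions, not a reproduction of the paper's figures: the objective $f(x)=x^2/2+2\cos x$, the additive Gaussian gradient noise of scale `sigma`, the start at $x=0$, and all parameter values are hypothetical stand-ins, since the paper's nonconvex function and noise model are not shown in this summary.

```python
import numpy as np


def grad_f(x):
    # Gradient of the stand-in objective f(x) = x^2/2 + 2 cos(x) (an assumption).
    return x - 2.0 * np.sin(x)


def step_size(m, c):
    # Stepsize schedule quoted in the Figure 2 caption: alpha(m) = c / log(1 + m).
    return c / np.log(1.0 + m)


def simulate_sgd(trials=2000, iterations=500, c=0.2, sigma=0.5, seed=0):
    """Run `trials` independent SGD chains in parallel and return the final iterates."""
    rng = np.random.default_rng(seed)
    x = np.zeros(trials)  # every chain starts at x = 0 (hypothetical choice)
    for m in range(1, iterations + 1):
        noise = sigma * rng.standard_normal(trials)  # assumed additive gradient noise
        x -= step_size(m, c) * (grad_f(x) + noise)
    return x


samples = simulate_sgd()
density, edges = np.histogram(samples, bins=40, density=True)
print("sample mean:", samples.mean(), "sample variance:", samples.var())
```

The arrays `density` and `edges` can be passed to any plotting library and overlaid with a normal density of the same mean and variance, which is the comparison the captions describe.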

Theorems & Definitions (45)

  • Proposition 1.1
  • proof
  • Proposition 1.2
  • Corollary 1.3
  • Proposition 1.4
  • Remark 1.5
  • Theorem 1.6
  • Proposition 1.7
  • Proposition 1.8: Co,SGD
  • Proposition 2.1
  • ...and 35 more