A stochastic gradient descent algorithm with random search directions

Eméric Gbaguidi

TL;DR

The paper tackles unconstrained finite-sum optimization and the cost of full gradient evaluations in high dimensions by introducing SCORS, a stochastic gradient method with random search directions whose update is $X_{n+1}=X_n-\gamma_n D(V_{n+1})\nabla f_{U_{n+1}}(X_n)$, where $D(v)=vv^T$ and $\mathbb{E}[D(V_{n+1})\mid\mathcal{F}_n]=I_d$. Under mild smoothness and growth conditions, it proves almost-sure convergence with decreasing steps, establishes a central limit theorem with asymptotic covariance $\Sigma=\int_0^\infty (e^{-(H-I_d/2)u})^T\,\Gamma\,e^{-(H-I_d/2)u}\,du$, where $\Gamma=\mathbb{E}[V V^T Q V V^T]$, and derives non-asymptotic $L^p$ rates $\mathbb{E}\|X_n-x^*\|^{2p}\le K_p/n^{p\alpha}$ for $\gamma_n=c/n^\alpha$ with $\tfrac12<\alpha\le1$. The results depend on the search-direction distribution (uniform, non-uniform, Gaussian, spherical) through explicit forms of $\Gamma$, and numerical experiments on logistic regression corroborate the CLT and reveal practical trade-offs, with uniform directions often achieving the smallest asymptotic variance and superior per-iteration efficiency.
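Because $D(V)g=V\langle V,g\rangle$, the update only needs the directional derivative of the sampled gradient along $V_{n+1}$. The following is a minimal NumPy sketch of that update, assuming the four direction families are normalized so that $\mathbb{E}[VV^T]=I_d$ (scaled coordinate vectors, standard Gaussian vectors, and vectors uniform on the sphere of radius $\sqrt d$; the paper's exact parametrizations may differ) and a user-supplied per-sample gradient `grad_i(x, i)`. All function names are hypothetical. For simplicity the sketch forms the full sample gradient even when only one of its coordinates is used.

```python
import numpy as np

# Hypothetical direction samplers. Each is scaled so that E[V V^T] = I_d,
# the unbiasedness condition E[D(V_{n+1}) | F_n] = I_d with D(v) = v v^T.

def uniform_coordinate(d, rng):
    """sqrt(d) * e_I with I uniform on {0, ..., d-1}."""
    v = np.zeros(d)
    v[rng.integers(d)] = np.sqrt(d)
    return v

def nonuniform_coordinate(p, rng):
    """e_I / sqrt(p_I) with P(I = i) = p[i]."""
    i = rng.choice(len(p), p=p)
    v = np.zeros(len(p))
    v[i] = 1.0 / np.sqrt(p[i])
    return v

def gaussian(d, rng):
    """Standard Gaussian direction, V ~ N(0, I_d)."""
    return rng.standard_normal(d)

def spherical(d, rng):
    """Uniform on the sphere of radius sqrt(d)."""
    g = rng.standard_normal(d)
    return np.sqrt(d) * g / np.linalg.norm(g)

def scors(grad_i, n_samples, x0, sample_v, c=1.0, alpha=1.0, n_iter=10_000, seed=0):
    """Iterate X_{n+1} = X_n - gamma_n V V^T grad f_U(X_n) with gamma_n = c / n**alpha."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for n in range(1, n_iter + 1):
        u = rng.integers(n_samples)   # U_{n+1}: uniformly sampled component index
        v = sample_v(rng)             # V_{n+1}: random search direction
        gamma = c / n**alpha
        # D(V) g = V <V, g>: only the directional derivative along V enters the step
        x -= gamma * v * (v @ grad_i(x, u))
    return x
```

For example, on a toy noise-free least-squares finite sum $f(x)=\frac{1}{2N}\sum_i(a_i^Tx-b_i)^2$ with unit-norm rows $a_i$ (a test case chosen here, not the paper's logistic-regression experiments), `scors(lambda x, i: A[i] * (A[i] @ x - b[i]), N, np.zeros(d), lambda r: uniform_coordinate(d, r), c=0.5, alpha=0.75)` drives the iterate toward the least-squares solution. With coordinate directions, a practical implementation would evaluate only the single partial derivative it needs.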

Abstract

Stochastic coordinate descent algorithms are efficient methods in which each iterate is obtained by fixing most coordinates at their values from the current iteration and approximately minimizing the objective with respect to the remaining coordinates. However, this approach is usually restricted to canonical basis vectors of $\mathbb{R}^d$. In this paper, we develop a new class of stochastic gradient descent algorithms with random search directions, which use the directional derivative of the gradient estimate along more general random vectors. We establish the almost sure convergence of these algorithms with decreasing step sizes. We further investigate their central limit theorem and pay particular attention to analyzing the impact of the search distributions on the asymptotic covariance matrix. We also provide non-asymptotic $\mathbb{L}^p$ rates of convergence.
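The search distribution enters the asymptotic covariance through $\Gamma=\mathbb{E}[VV^TQVV^T]=\mathbb{E}[(V^TQV)\,VV^T]$. Under normalizations assumed here, all chosen so that $\mathbb{E}[VV^T]=I_d$ (the paper's exact conventions may differ), fourth-moment identities give closed forms. These are $d\,\mathrm{diag}(Q)$ for $V=\sqrt d\,e_I$ with $I$ uniform; $\mathrm{diag}(Q_{ii}/p_i)$ for $V=e_I/\sqrt{p_I}$ with $\mathbb{P}(I=i)=p_i$; $Q+Q^T+\mathrm{tr}(Q)I_d$ for $V\sim\mathcal N(0,I_d)$ (Isserlis' theorem); and $\frac{d}{d+2}\big(Q+Q^T+\mathrm{tr}(Q)I_d\big)$ for $V$ uniform on the sphere of radius $\sqrt d$. The sketch below checks these forms by Monte Carlo. The function names are hypothetical, and $Q$ is treated as a given $d\times d$ matrix.

```python
import numpy as np

def sample_directions(kind, n, d, rng, p=None):
    """Draw n directions (as rows), scaled so that E[V V^T] = I_d (assumed normalization)."""
    if kind == "gaussian":
        return rng.standard_normal((n, d))
    if kind == "spherical":
        g = rng.standard_normal((n, d))
        return np.sqrt(d) * g / np.linalg.norm(g, axis=1, keepdims=True)
    V = np.zeros((n, d))
    if kind == "uniform":
        idx = rng.integers(d, size=n)
        V[np.arange(n), idx] = np.sqrt(d)
    elif kind == "nonuniform":
        idx = rng.choice(d, size=n, p=p)
        V[np.arange(n), idx] = 1.0 / np.sqrt(p[idx])
    else:
        raise ValueError(kind)
    return V

def gamma_closed_form(Q, kind, p=None):
    """Closed-form Gamma = E[V V^T Q V V^T] under the normalizations above."""
    d = Q.shape[0]
    iso = Q + Q.T + np.trace(Q) * np.eye(d)  # fourth-moment (Isserlis-type) term
    if kind == "gaussian":
        return iso
    if kind == "spherical":
        return d / (d + 2) * iso
    if kind == "uniform":
        return d * np.diag(np.diag(Q))
    if kind == "nonuniform":
        return np.diag(np.diag(Q) / p)
    raise ValueError(kind)

def gamma_monte_carlo(Q, V):
    """Average of V V^T Q V V^T = (v^T Q v) v v^T over the rows v of V."""
    s = np.einsum("ni,ij,nj->n", V, Q, V)
    return (V * s[:, None]).T @ V / V.shape[0]
```

Taking traces gives $\mathrm{tr}\,\Gamma=d\,\mathrm{tr}\,Q$ for both the uniform-coordinate and the spherical case, but $(d+2)\,\mathrm{tr}\,Q$ for Gaussian directions. This is one elementary way to see how the choice of distribution changes the noise that enters the CLT.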

Paper Structure

This paper contains 11 sections, 7 theorems, 95 equations, 3 figures, 1 table.

Key Result

Theorem 1

Consider that $(X_n)$ is the sequence generated by the SCORS algorithm with decreasing step sequence $(\gamma_n)$ satisfying (gamma_cond1). In addition, suppose that Assumptions scgd_cond1b_s_policy, scgd_cond1, scgd_cond2 and scgd_cond3 are satisfied. Then, we have and
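Condition (gamma_cond1) is cited here only by its label. Assuming it is the usual Robbins–Monro pair $\sum_n\gamma_n=\infty$ and $\sum_n\gamma_n^2<\infty$, a comparison with the $p$-series shows that the polynomial schedule used for the $L^p$ rates satisfies it exactly when $\tfrac12<\alpha\le1$:

```latex
\gamma_n = \frac{c}{n^{\alpha}},\; c>0:\qquad
\sum_{n\ge 1}\gamma_n = c\sum_{n\ge 1} n^{-\alpha} = \infty \iff \alpha \le 1,
\qquad
\sum_{n\ge 1}\gamma_n^2 = c^2\sum_{n\ge 1} n^{-2\alpha} < \infty \iff \alpha > \tfrac12 .
```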

Figures (3)

  • Figure 1: Almost sure convergence of the algorithms with $\gamma_n=1/n$.
  • Figure 2: We used 1000 samples, where each one was obtained by running the associated algorithm for $n=500000$ iterations.
  • Figure 3: Mean squared error with respect to epochs. We confirm the decreasing order of the mean squared error of $X_n-x^*$ with respect to $n$.
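The asymptotic covariance $\Sigma$ from the CLT can be computed without quadrature. Write $A=H-I_d/2$ and assume every eigenvalue of $A$ has positive real part. Differentiating $M(u)=e^{-A^Tu}\,\Gamma\,e^{-Au}$ gives $M'(u)=-A^TM(u)-M(u)A$, and integrating over $[0,\infty)$ (where $M(\infty)=0$) shows that $\Sigma$ solves the Lyapunov equation $A^T\Sigma+\Sigma A=\Gamma$. The NumPy sketch below solves this equation by Kronecker vectorization. The function name is hypothetical, and $H$ and $\Gamma$ must be supplied by the user.

```python
import numpy as np

def asymptotic_covariance(H, Gamma):
    """Solve A^T S + S A = Gamma with A = H - I/2 (requires Re eig(A) > 0)."""
    d = H.shape[0]
    I = np.eye(d)
    A = H - 0.5 * I
    if np.min(np.linalg.eigvals(A).real) <= 0:
        raise ValueError("every eigenvalue of H - I/2 must have positive real part")
    # Column-major vec: vec(A^T S) = (I kron A^T) vec(S), vec(S A) = (A^T kron I) vec(S)
    K = np.kron(I, A.T) + np.kron(A.T, I)
    s = np.linalg.solve(K, Gamma.flatten(order="F"))
    return s.reshape((d, d), order="F")
```

In one dimension this reduces to $\Sigma=\Gamma/(2H-1)$. For large $d$, the $d^2\times d^2$ system becomes expensive, and `scipy.linalg.solve_continuous_lyapunov(A.T, Gamma)` solves the same equation more efficiently.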

Theorems & Definitions (12)

  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Proposition 3
  • Theorem 4
  • proof
  • Theorem 5
  • proof
  • proof
  • ...and 2 more