A stochastic gradient descent algorithm with random search directions
Eméric Gbaguidi
TL;DR
The paper addresses unconstrained finite-sum optimization, where full gradient evaluations are costly in high dimensions, by introducing SCORS, a stochastic gradient method with random search directions. The update is $X_{n+1}=X_n-\gamma_n D(V_{n+1})\nabla f_{U_{n+1}}(X_n)$, where $D(v)=vv^T$ and $\mathbb{E}[D(V_{n+1})\mid\mathcal{F}_n]=I_d$. Under mild smoothness and growth conditions, the paper proves almost sure convergence with decreasing step sizes and establishes a central limit theorem with asymptotic covariance $\Sigma=\int_0^\infty (e^{-(H-I_d/2)u})^T\Gamma e^{-(H-I_d/2)u}\,du$, where $\Gamma=\mathbb{E}[V V^T Q V V^T]$. It also derives non-asymptotic $\mathbb{L}^p$ rates $\mathbb{E}\|X_n-x^*\|^{2p}\le K_p/n^{p\alpha}$ for $\gamma_n=c/n^\alpha$ with $\tfrac12<\alpha\le1$. The asymptotic covariance depends on the search-direction distribution (uniform, non-uniform, Gaussian, spherical) through explicit forms of $\Gamma$. Numerical experiments on logistic regression corroborate the CLT and reveal practical performance trade-offs, with uniform directions often achieving the smallest asymptotic variance and superior per-iteration efficiency.
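The update rule can be illustrated with a minimal sketch. It is not the paper's implementation: it uses a toy noiseless least-squares finite sum rather than the logistic regression of the experiments, and the names `sample_direction` and `scors`, the step constants `c` and `alpha`, and the direction normalizations (each chosen so that $\mathbb{E}[VV^T]=I_d$) are all assumptions made for illustration.

```python
import numpy as np


def sample_direction(rng, d, kind="coordinate"):
    """Draw a search direction V with E[V V^T] = I_d.

    The three normalizations below are assumptions of this sketch.
    """
    if kind == "coordinate":
        # sqrt(d) * e_J with J uniform on {0, ..., d-1}
        v = np.zeros(d)
        v[rng.integers(d)] = np.sqrt(d)
        return v
    if kind == "gaussian":
        # standard Gaussian vector
        return rng.standard_normal(d)
    if kind == "spherical":
        # uniform on the sphere of radius sqrt(d)
        g = rng.standard_normal(d)
        return np.sqrt(d) * g / np.linalg.norm(g)
    raise ValueError(f"unknown direction kind: {kind}")


def scors(A, b, n_iter=20000, c=0.2, alpha=0.75, kind="coordinate", seed=0):
    """Iterate X_{n+1} = X_n - gamma_n V V^T grad f_U(X_n), gamma_n = c / n^alpha.

    Toy objective (an assumption): f_i(x) = 0.5 * (a_i^T x - b_i)^2.
    """
    rng = np.random.default_rng(seed)
    n_samples, d = A.shape
    x = np.zeros(d)
    for n in range(1, n_iter + 1):
        i = rng.integers(n_samples)          # random index U_{n+1}
        grad = A[i] * (A[i] @ x - b[i])      # gradient of f_i at x
        v = sample_direction(rng, d, kind)   # random direction V_{n+1}
        # D(v) grad = v (v^T grad): the d x d matrix v v^T is never formed
        x = x - (c / n**alpha) * v * (v @ grad)
    return x


# Toy problem with a known minimizer x_star
data_rng = np.random.default_rng(1)
A = data_rng.standard_normal((200, 5))
x_star = data_rng.standard_normal(5)
b = A @ x_star

x_hat = scors(A, b)
initial_err = np.linalg.norm(x_star)         # error of the starting point x = 0
final_err = np.linalg.norm(x_hat - x_star)
print(f"initial error {initial_err:.3f}, final error {final_err:.3e}")
```

Computing `v * (v @ grad)` costs $O(d)$ per iteration. With coordinate directions only one entry of the gradient estimate is actually needed, which is where the per-iteration savings over a full stochastic gradient step would come from.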
Abstract
Stochastic coordinate descent algorithms are efficient methods in which each iterate is obtained by fixing most coordinates at their values from the current iteration and approximately minimizing the objective with respect to the remaining coordinates. However, this approach is usually restricted to canonical basis vectors of $\mathbb{R}^d$. In this paper, we develop a new class of stochastic gradient descent algorithms with random search directions, which use the directional derivative of the gradient estimate along more general random vectors. We establish the almost sure convergence of these algorithms with decreasing step sizes. We further establish a central limit theorem for them and pay particular attention to the impact of the search distributions on the asymptotic covariance matrix. We also provide non-asymptotic $\mathbb{L}^p$ rates of convergence.
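To see how the canonical-basis case fits into the random-direction framework, consider the following worked example. It assumes the direction is $V=\sqrt{d}\,e_J$ with $J$ uniform on $\{1,\dots,d\}$ and writes $g$ for a generic gradient estimate; the $\sqrt{d}$ scaling is an illustrative normalization chosen so that $\mathbb{E}[VV^T]=I_d$, not necessarily the paper's exact convention.

$$
\begin{aligned}
VV^T &= d\, e_J e_J^T, \\
\mathbb{E}[VV^T] &= \sum_{j=1}^{d} \frac{1}{d}\, d\, e_j e_j^T = I_d, \\
VV^T g &= d\, g_J\, e_J .
\end{aligned}
$$

Under this assumption the step moves only coordinate $J$, as in stochastic coordinate descent, while remaining an unbiased version of the full step $g$ on average. Replacing $e_J$ by a more general random vector with the same second-moment identity gives a search direction that is no longer tied to the canonical basis.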
