On the SAGA algorithm with decreasing step
Luis Fredes, Bernard Bercu, Eméric Gbaguidi
TL;DR
This paper analyzes a generalized $\lambda$-SAGA algorithm for minimizing $f(x)=\dfrac{1}{N}\sum_{k=1}^N f_k(x)$ with a decreasing step-size sequence $\{\gamma_n\}$, unifying SGD and SAGA via $\lambda\in[0,1]$. It shows almost sure convergence to the optimum and a central limit theorem under weakened assumptions that avoid the strong convexity and Lipschitz gradient hypotheses, and it proves non-asymptotic $\mathbb{L}^p$ convergence rates that depend on the rate parameter $\alpha$ of $\gamma_n$. Numerical experiments on MNIST-based logistic regression illustrate the theory, including variance reduction as $\lambda$ increases. Overall, the work provides a unified convergence framework for stochastic variance-reduction methods with decreasing steps.
Abstract
Stochastic optimization naturally appears in many application areas, including machine learning. Our goal is to go further in the analysis of the Stochastic Average Gradient Accelerated (SAGA) algorithm. To achieve this, we introduce a new $\lambda$-SAGA algorithm which interpolates between Stochastic Gradient Descent ($\lambda=0$) and the SAGA algorithm ($\lambda=1$). First, we investigate the almost sure convergence of this new algorithm with decreasing step, which allows us to avoid the restrictive strong convexity and Lipschitz gradient hypotheses on the objective function. Second, we establish a central limit theorem for the $\lambda$-SAGA algorithm. Finally, we provide non-asymptotic $\mathbb{L}^p$ rates of convergence.
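
To make the interpolation concrete, here is a minimal sketch of one plausible form of the $\lambda$-SAGA iteration on a scalar parameter. The update rule is an assumption, since this section does not state it: the stochastic gradient is corrected by $\lambda$ times the usual SAGA control variate, so that $\lambda=0$ reduces to SGD and $\lambda=1$ to SAGA. The step schedule $\gamma_n = c/n^{\alpha}$, the function name `lambda_saga`, and the toy quadratic objective are likewise illustrative choices, not taken from the paper.

```python
import random


def lambda_saga(grads, x0, lam, n_iter, c=1.0, alpha=0.75, seed=0):
    """Illustrative lambda-SAGA loop on a scalar parameter (assumed form).

    grads  : list of callables; grads[k](x) is the gradient of f_k at x
    lam    : interpolation weight in [0, 1]; 0 gives SGD, 1 gives SAGA
    c, alpha : assumed step schedule gamma_n = c / n**alpha
    """
    rng = random.Random(seed)
    n_funcs = len(grads)
    x = x0
    # Memory of the last gradient evaluated for each f_k, initialised at x0.
    memory = [g(x0) for g in grads]
    memory_mean = sum(memory) / n_funcs
    for n in range(1, n_iter + 1):
        gamma = c / n ** alpha              # decreasing step size
        k = rng.randrange(n_funcs)          # uniformly sampled index
        g_new = grads[k](x)
        # SGD direction minus lam times the SAGA control variate.
        direction = g_new - lam * (memory[k] - memory_mean)
        x -= gamma * direction
        # Refresh the stored gradient and keep its average in sync.
        memory_mean += (g_new - memory[k]) / n_funcs
        memory[k] = g_new
    return x


# Toy objective (hypothetical): f_k(x) = 0.5 * (x - b_k)**2, minimised at mean(b) = 3.
targets = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_grads = [lambda x, b=b: x - b for b in targets]

x_sgd = lambda_saga(toy_grads, x0=0.0, lam=0.0, n_iter=20000)
x_saga = lambda_saga(toy_grads, x0=0.0, lam=1.0, n_iter=20000)
```

On this toy problem both runs should end close to the minimiser 3. Sweeping `lam` between 0 and 1 is a simple way to observe how the spread of the final iterate across seeds changes with the amount of variance reduction.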
