Shuffling Momentum Gradient Algorithm for Convex Optimization

Trang H. Tran, Quoc Tran-Dinh, Lam M. Nguyen

TL;DR

This work analyzes the Shuffling Momentum Gradient (SMG) algorithm for finite-sum convex optimization, which combines shuffling updates with an anchor momentum to improve convergence. It provides new theoretical guarantees in both the merely convex and the strongly convex settings: under randomized reshuffling it achieves the state-of-the-art rate $\mathcal{O}\left(\frac{1}{nT^2}\right)$ in the strongly convex case, and it matches the leading rate for convex problems, $\mathcal{O}(n^{-1/3}T^{-2/3})$, with common learning-rate schedules. The analysis builds on the anchor-momentum scheme of SMG, in which the average of the gradients collected during an epoch becomes the momentum term for the next epoch, and derives key recursive bounds using Bregman divergences and variance terms. Empirical results on logistic regression tasks corroborate the theoretical findings, showing competitive or superior training performance relative to SGD, SGD with momentum, and Adam under randomized reshuffling. Overall, the paper advances the understanding of momentum-augmented shuffling methods and their optimality in convex regimes, and leaves broader problem classes and other momentum schemes to future work.
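The anchor-momentum idea can be illustrated with a short Python sketch. This is not the authors' reference implementation. The update form is an assumption based on the summary above and on the SMG method of Tran et al. (2021): the momentum is anchored at the previous epoch's average gradient, and each inner step uses the learning rate $\eta_t / n$. The names `smg`, `grad_i`, and `loss`, the constant value of `eta`, and the toy least-squares problem are all hypothetical choices made for illustration.

```python
import numpy as np

def smg(grad_i, w0, n, epochs, eta, beta=0.5, seed=0):
    """Sketch of a shuffling momentum gradient loop (assumed update form).

    grad_i(w, i) returns the gradient of the i-th component function at w.
    eta is an epoch learning rate, kept constant here for simplicity.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    m_anchor = np.zeros_like(w)          # momentum fixed during an epoch
    for _ in range(epochs):
        perm = rng.permutation(n)        # randomized reshuffling
        v = np.zeros_like(w)             # running average of this epoch's gradients
        for i in perm:
            g = grad_i(w, i)
            m = beta * m_anchor + (1.0 - beta) * g
            v += g / n
            w -= (eta / n) * m           # per-step learning rate eta / n
        m_anchor = v                     # epoch-end average becomes the next anchor
    return w

# Toy least-squares problem (illustrative data, not from the paper).
data_rng = np.random.default_rng(1)
A = data_rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
b = A @ w_true

def grad_i(w, i):
    # Gradient of 0.5 * (a_i^T w - b_i)^2 with respect to w.
    return (A[i] @ w - b[i]) * A[i]

def loss(w):
    return 0.5 * np.mean((A @ w - b) ** 2)

w_hat = smg(grad_i, np.zeros(3), n=50, epochs=200, eta=0.1)
```

Keeping the anchor fixed within an epoch means each inner step mixes a fresh component gradient with a single stored vector, so the sketch needs only one extra accumulator (`v`) beyond plain shuffling SGD.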

Abstract

The Stochastic Gradient Descent method (SGD) and its stochastic variants have become methods of choice for solving finite-sum optimization problems arising from machine learning and data science, thanks to their ability to handle large-scale applications and big datasets. In recent decades, researchers have made substantial efforts to study the theoretical performance of SGD and its shuffling variants. However, only limited work has investigated its shuffling momentum variants, including shuffling heavy-ball momentum schemes for non-convex problems and Nesterov's momentum for convex settings. In this work, we extend the analysis of the shuffling momentum gradient method developed in [Tran et al. (2021)] to both finite-sum convex and strongly convex optimization problems. We provide the first analysis of shuffling momentum-based methods for the strongly convex setting, attaining a convergence rate of $\mathcal{O}(1/(nT^2))$, where $n$ is the number of samples and $T$ is the number of training epochs. Our analysis is state-of-the-art, matching the best rates of existing shuffling stochastic gradient algorithms in the literature.

Paper Structure (20 sections, 11 theorems, 82 equations, 1 figure, 1 algorithm)

Key Result

Lemma 1

Suppose that Assumption as:A1 and Assumption ass_convex hold for ERM_problem_01. Let $\{w_i^{(t)}\}_{t=1}^{T}$ be generated by Algorithm sgd_momentum_shuffling_01 with a fixed momentum parameter $0 \leq \beta < 1$ and an epoch learning rate $\eta_i^{(t)} := \frac{\eta_t}{n}$ for $t \geq 1$. Assume t [...] If $t=1$, then we have:

Figures (1)

  • Figure 1: The train loss (left) and test accuracy (right) produced by SMG, SGD, SGD-M, and Adam for the w8a and ijcnn1 datasets, respectively.

Theorems & Definitions (24)

  • Lemma 1
  • Theorem 1
  • proof
  • Corollary 1: Constant learning rate
  • proof
  • Corollary 2: Exponential scheduled learning rate
  • proof
  • Remark 1: Convergence Rates
  • Remark 2: Learning Rate Schedules
  • Theorem 2
  • ...and 14 more