
Learning to Shuffle: Block Reshuffling and Reversal Schemes for Stochastic Optimization

Lam M. Nguyen, Dzung T. Phan, Jayant Kalagnanam

Abstract

Shuffling strategies for stochastic gradient descent (SGD), including incremental gradient, shuffle-once, and random reshuffling, are supported by rigorous convergence analyses for arbitrary within-epoch permutations. In particular, random reshuffling is known to improve optimization constants relative to cyclic and shuffle-once schemes. However, existing theory offers limited guidance on how to design new data-ordering schemes that further improve optimization constants or stability beyond random reshuffling. In this paper, we design a pipeline using a large language model (LLM)-guided program evolution framework to discover an effective shuffling rule for without-replacement SGD. Abstracting from this instance, we identify two fundamental structural components: block reshuffling and paired reversal. We analyze these components separately and show that block reshuffling strictly reduces prefix-gradient variance constants within the unified shuffling framework, yielding provable improvements over random reshuffling under mild conditions. Separately, we show that paired reversal symmetrizes the epoch map and cancels the leading order-dependent second-order term, reducing order sensitivity from quadratic to cubic in the step size. Numerical experiments with the discovered algorithm validate the theory and demonstrate consistent gains over standard shuffling schemes across convex and nonconvex benchmarks.
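The two structural components named above can be sketched in code. The following is an illustrative assumption of how they might be instantiated, not the paper's exact rule: block reshuffling permutes equal-size contiguous blocks of indices rather than individual samples, and paired reversal replays each freshly drawn epoch order in reverse on the following epoch, symmetrizing the epoch map. The function name `epoch_orders` and its parameters are hypothetical.

```python
import random

def epoch_orders(n, num_blocks, num_epochs, seed=0):
    """Yield one data ordering per epoch (a sketch, under the
    assumptions stated above):
    - block reshuffling: indices 0..n-1 are grouped into K equal
      contiguous blocks and the blocks themselves are randomly
      permuted when a fresh order is drawn;
    - paired reversal: each fresh order is used for one epoch and
      then replayed in reverse for the next epoch."""
    rng = random.Random(seed)
    assert n % num_blocks == 0, "equal block sizes b = n/K assumed"
    b = n // num_blocks
    order = None
    for epoch in range(num_epochs):
        if epoch % 2 == 0:
            # draw a fresh block-reshuffled order
            blocks = [list(range(k * b, (k + 1) * b))
                      for k in range(num_blocks)]
            rng.shuffle(blocks)
            order = [i for blk in blocks for i in blk]
            yield list(order)
        else:
            # paired reversal: reuse the previous order, reversed
            yield list(reversed(order))
```

In this sketch each pair of consecutive epochs traverses the data forward and then backward under the same block permutation, which is one natural way to realize the cancellation of the leading order-dependent term described in the abstract.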

Paper Structure

This paper contains 55 sections, 10 theorems, 88 equations, 7 figures, 2 tables, 1 algorithm.

Key Result

Lemma 3.1

With equal block sizes $b=n/K$, $\sigma_{\mathrm{blk}}^2(w)\le \sigma_{\mathrm{ind}}^2(w)$, with strict inequality whenever $\sigma_{\mathrm{within}}^2(w)>0$. $\blacktriangleleft$
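The inequality above follows the pattern of the law of total variance: with equal block sizes, the variance of block means never exceeds the variance of individual values, strictly so when within-block variance is positive. The following numeric check is a hypothetical illustration with Gaussian stand-ins for per-sample gradients; the symbols `sigma_ind` and `sigma_blk` mirror $\sigma_{\mathrm{ind}}^2(w)$ and $\sigma_{\mathrm{blk}}^2(w)$ only by analogy.

```python
import random
from statistics import mean, pvariance

rng = random.Random(0)
n, K = 12, 4              # n samples split into K equal blocks
b = n // K                # equal block size b = n/K
g = [rng.gauss(0.0, 1.0) for _ in range(n)]   # stand-ins for per-sample gradients

mu = mean(g)
sigma_ind = pvariance(g, mu=mu)               # sample-level variance
block_means = [mean(g[k * b:(k + 1) * b]) for k in range(K)]
sigma_blk = pvariance(block_means, mu=mu)     # block-level variance

# Law of total variance: sigma_ind = sigma_blk + within-block variance,
# so sigma_blk <= sigma_ind, strictly when within-block variance > 0.
assert sigma_blk <= sigma_ind + 1e-12
```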

Figures (7)

  • Figure 1: Performance comparison under different learning-rate schedules for a9a, boston, and FashionMNIST datasets.
  • Figure 2: Classification Datasets
  • Figure 3: Regression Datasets
  • Figure 4: Classification Datasets (NN)
  • Figure 5: Classification Datasets
  • ...and 2 more figures

Theorems & Definitions (19)

  • Lemma 3.1
  • Lemma 3.2 (Mishchenko et al., 2020)
  • Proposition 3.3: RR prefix variance (sample-level)
  • Proposition 3.4: Block reshuffling prefix variance (block-level)
  • Lemma 3.5
  • Theorem 3.6
  • Theorem 3.7
  • Proposition 3.8
  • Theorem 3.9
  • Proposition 3.10
  • ...and 9 more