Provable Benefit of Random Permutations over Uniform Sampling in Stochastic Coordinate Descent

Donghwa Kim; Jaewook Lee; Chulhee Yun

TL;DR

The paper investigates the convergence of two stochastic coordinate descent variants, random coordinate descent (RCD) and random-permutation coordinate descent (RPCD), on positive-definite quadratic objectives. It proves that, for a class of quadratics with permutation-invariant Hessians, the upper bound on the RPCD contraction rate is strictly smaller than the lower bound on the RCD contraction rate, which yields a provable performance gap. By strengthening the lower bounds for RCD and establishing upper bounds for RPCD within this class, the authors show that RPCD outperforms RCD on every instance in the class, and they conjecture that the result extends to all positive-definite quadratics. The work combines spectral-operator analysis, a dimension-reduction approach for permutation-invariant Hessians, and algorithmic searches, together with experiments that illustrate the practical gains. It thereby offers a rigorous justification, within this class, for the empirical success of random permutations in coordinate descent.
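The setting summarized above can be written out as follows. This is a sketch under stated assumptions rather than the paper's exact definitions: it assumes that "permutation-invariant Hessian" means $PAP^\top = A$ for every permutation matrix $P$ (with $n \ge 2$, diagonal entry $a$ and off-diagonal entry $b$ as illustrative symbols), and the last line is only a generic form of a contraction-rate statement; the paper's precise definition (per iteration or per epoch, and which error measure) may differ.

```latex
\begin{aligned}
&f(x) = \tfrac{1}{2}\, x^\top A x, \qquad A \in \mathbb{R}^{n \times n},\ A = A^\top \succ 0, \qquad f^\star = f(0) = 0,\\
&P A P^\top = A \ \text{for every permutation matrix } P
  \;\Longleftrightarrow\; A = (a-b)\, I_n + b\, \mathbf{1}\mathbf{1}^\top,\\
&\text{eigenvalues of } A:\ a-b \ (\text{multiplicity } n-1),\ \ a+(n-1)\,b,
  \qquad A \succ 0 \;\Longleftrightarrow\; a-b > 0 \ \text{and}\ a+(n-1)\,b > 0,\\
&\mathbb{E}\bigl[f(x_k) - f^\star\bigr] \le \rho^{k}\,\bigl(f(x_0) - f^\star\bigr), \qquad \rho \in (0,1).
\end{aligned}
```

A smaller $\rho$ means faster convergence, so an RPCD upper bound on $\rho$ that lies strictly below an RCD lower bound on $\rho$ certifies that RPCD is faster on that instance.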

Abstract

We analyze the convergence rates of two popular variants of coordinate descent (CD): random CD (RCD), in which the coordinates are sampled uniformly at random, and random-permutation CD (RPCD), in which random permutations are used to select the update indices. Despite abundant empirical evidence that RPCD outperforms RCD in various tasks, the theoretical gap between the two algorithms' performance has remained elusive. Even for the benign case of positive-definite quadratic functions with permutation-invariant Hessians, previous efforts have failed to demonstrate a provable performance gap between RCD and RPCD. In this work, we present novel results showing that, for a class of quadratics with permutation-invariant structures, the contraction rate upper bound for RPCD is always strictly smaller than the contraction rate lower bound for RCD for every individual problem instance. Furthermore, we conjecture that this function class contains the worst-case examples of RPCD among all positive-definite quadratics. Combined with our RCD lower bound, this conjecture extends our results to the general class of positive-definite quadratic functions.
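The two sampling rules compared above can be sketched as a small simulation. This is a minimal, hypothetical illustration, not the authors' code: it assumes $f(x) = \tfrac{1}{2} x^\top A x$ with a Hessian that has a common diagonal entry `a` and a common off-diagonal entry `b` (one simple permutation-invariant structure), and it assumes exact minimization along the chosen coordinate (step size `1 / A[i][i]`), which may differ from the paper's step-size choice. All helper names are illustrative.

```python
import random


def make_perm_invariant_hessian(n, a, b):
    """Hypothetical helper: n x n matrix with a on the diagonal and b elsewhere."""
    return [[a if i == j else b for j in range(n)] for i in range(n)]


def quad_value(A, x):
    """f(x) = 0.5 * x^T A x; for positive-definite A the minimizer is x* = 0, f* = 0."""
    n = len(x)
    return 0.5 * sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def exact_coordinate_step(A, x, i):
    """Assumption: exact minimization along coordinate i, i.e. x_i <- x_i - (A x)_i / A_ii.

    The paper's step-size choice may differ. Modifies x in place.
    """
    grad_i = sum(A[i][j] * x[j] for j in range(len(x)))
    x[i] -= grad_i / A[i][i]


def run_cd(A, x0, epochs, rule, rng):
    """Run `epochs` epochs of n coordinate updates each and return f after every epoch.

    rule == "rcd":  each index is drawn uniformly at random, with replacement.
    rule == "rpcd": each epoch visits a fresh uniformly random permutation of the indices.
    """
    n = len(x0)
    x = list(x0)
    history = []
    for _ in range(epochs):
        if rule == "rcd":
            order = [rng.randrange(n) for _ in range(n)]
        elif rule == "rpcd":
            order = list(range(n))
            rng.shuffle(order)
        else:
            raise ValueError(f"unknown rule: {rule}")
        for i in order:
            exact_coordinate_step(A, x, i)
        history.append(quad_value(A, x))
    return history


def mean_final_value(A, x0, epochs, rule, trials, seed=0):
    """Monte Carlo estimate of E[f(x_K)] after `epochs` epochs under the given rule."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += run_cd(A, x0, epochs, rule, rng)[-1]
    return total / trials
```

Both rules perform n coordinate updates per epoch, so their per-epoch costs match and `run_cd` histories can be compared epoch by epoch. For instance, on `A = make_perm_invariant_hessian(10, 1.0, 0.9)` (positive definite, since a - b = 0.1 > 0 and a + 9b = 9.1 > 0) with a random starting point, comparing `mean_final_value(..., rule="rcd", ...)` against `rule="rpcd"` gives an empirical view of the gap the paper bounds; this sketch makes no claim about the exact numbers.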