Shuffling the Stochastic Mirror Descent via Dual Lipschitz Continuity and Kernel Conditioning

Junwen Qiu, Leilei Mei, Junyu Zhang

Abstract

The global Lipschitz smoothness condition underlies most convergence and complexity analyses via two key consequences: the descent lemma and the gradient Lipschitz continuity. How to study the performance of optimization algorithms in the absence of Lipschitz smoothness remains an active area of research. The relative smoothness framework of Bauschke-Bolte-Teboulle (2017) and Lu-Freund-Nesterov (2018) provides an extended descent lemma, ensuring convergence of Bregman-based proximal gradient methods and their vanilla stochastic counterparts. However, many widely used techniques (e.g., momentum schemes, random reshuffling, and variance reduction) additionally require a Lipschitz-type bound on gradient deviations, leaving their analysis under relative smoothness an open problem. To resolve this issue, we introduce the dual kernel conditioning (DKC) regularity condition to regulate the local relative curvature of the kernel functions. Combined with relative smoothness, DKC provides a dual Lipschitz continuity for gradients: even though the gradient mapping is not Lipschitz in the primal space, it preserves Lipschitz continuity in the dual space induced by a mirror map. We verify that DKC is widely satisfied by popular kernels and is closed under affine composition and conic combination. With these novel tools, we establish the first complexity bounds as well as the iterate convergence of random reshuffling mirror descent for constrained nonconvex relatively smooth problems.
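
To make the setting concrete, the following is a minimal sketch of mirror descent with random reshuffling in the unconstrained case, where each step is taken in the dual space: $\nabla h(x^{+}) = \nabla h(x) - \alpha \nabla f_i(x)$. It is an illustration under stated assumptions, not the authors' algorithm or experimental setup. The kernel $h(x) = \tfrac{1}{4}\|x\|^4 + \tfrac{1}{2}\|x\|^2$, the quartic component functions, the step size, and all function names (`grad_h`, `grad_h_inv`, `rr_mirror_descent`) are choices made here for illustration.

```python
import numpy as np

def grad_h(x):
    """Mirror map for the assumed kernel h(x) = ||x||^4 / 4 + ||x||^2 / 2."""
    return (np.dot(x, x) + 1.0) * x

def grad_h_inv(p):
    """Inverse mirror map: x = p / (r^2 + 1), where r = ||x|| solves r^3 + r = ||p||."""
    s = np.linalg.norm(p)
    d = np.sqrt(s * s / 4.0 + 1.0 / 27.0)
    # Cardano's formula for the unique real root of r^3 + r - s = 0.
    r = np.cbrt(s / 2.0 + d) + np.cbrt(s / 2.0 - d)
    return p / (r * r + 1.0)

def rr_mirror_descent(grads, x0, step, epochs, seed=0):
    """Each epoch draws a fresh permutation and visits every component once."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(epochs):
        for i in rng.permutation(len(grads)):
            # Gradient step in the dual space, then map back to the primal space.
            x = grad_h_inv(grad_h(x) - step * grads[i](x))
    return x

# Hypothetical test problem: f_i(x) = ((a_i . x)^2 - b_i)^2 / 4 on synthetic data.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
b = (A @ rng.standard_normal(5)) ** 2

def make_grad(a, bi):
    # Gradient of f_i: ((a . x)^2 - b_i) * (a . x) * a.
    return lambda x: (np.dot(a, x) ** 2 - bi) * np.dot(a, x) * a

grads = [make_grad(a, bi) for a, bi in zip(A, b)]

def objective(x):
    return np.mean(0.25 * ((A @ x) ** 2 - b) ** 2)

x_start = rng.standard_normal(5)
x_final = rr_mirror_descent(grads, x_start, step=1e-3, epochs=50)
print(objective(x_start), objective(x_final))
```

The quartic kernel is used here because its mirror map can be inverted in closed form; a constrained problem would additionally need a Bregman projection or proximal step, which this sketch omits.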

Paper Structure (34 sections, 18 theorems, 137 equations, 3 figures, 1 algorithm)

Key Result

Proposition 2.4

The following kernels all satisfy the DKC regularity condition:

Figures (3)

  • Figure 1: The first panel illustrates the dual space interpretation of mirror descent. The second panel illustrates the construction of the integration path $z(t)$ used in the lemmas on Lipschitz continuity and important bounds, obtained by mapping the line segment between $\nabla h(x)$ and $\nabla h(y)$ back to the primal space. The third panel illustrates $\nabla h^*(\mathcal{B})$ under the quartic power kernel, where $\mathcal{B}$ is a ball in the dual space. This shows that even when $\nabla h(\mathcal{X})$ is convex in the dual space, $\mathcal{X}$ is not necessarily convex in the primal space.
  • Figure 2: Numerical results on the phase retrieval problem. Performance over five independent runs.
  • Figure 3: Numerical results on the Poisson inverse problem. Performance over five independent runs.

Theorems & Definitions (35)

  • Definition 2.1: Relative smoothness
  • Definition 2.2
  • Definition 2.3: DKC regularity
  • Proposition 2.4
  • Proposition 2.5: Closedness of DKC regularity
  • Lemma 2.6
  • proof
  • Lemma 2.7
  • Lemma 2.8
  • proof
  • ...and 25 more