A New Kernel Regularity Condition for Distributed Mirror Descent: Broader Coverage and Simpler Analysis

Junwen Qiu, Ziyang Zeng, Leilei Mei, Junyu Zhang

Abstract

Existing convergence analyses of distributed optimization methods in non-Euclidean geometries typically rely on two kernel assumptions: (i) global Lipschitz smoothness and (ii) bi-convexity of the associated Bregman divergence function. Unfortunately, these conditions are violated by nearly all kernels used in practice, leaving a large gap between theory and practice. This work closes the gap by developing a unified analytical tool that guarantees convergence under mild conditions. Specifically, we introduce Hessian relative uniform continuity (HRUC), a regularity condition satisfied by nearly all standard kernels. Importantly, HRUC is closed under concatenation, positive scaling, composition, and various kernel combinations. Leveraging the geometric structure induced by HRUC, we derive convergence guarantees for mirror descent-based gradient tracking without imposing any restrictive assumptions. More broadly, our analysis techniques extend seamlessly to other decentralized optimization methods in genuinely non-Euclidean and non-Lipschitz settings.

Paper Structure (35 sections, 16 theorems, 124 equations, 4 figures, 1 table, 1 algorithm)

Key Result

Lemma 2.5

Lemma (lem:contraction). Given Assumption as:matrix and $\{u_i\}_{i\in[m]}$, $\{v_i\}_{i\in[m]}$, $\{v_i^+\}_{i\in[m]} \subset \mathbb{R}^d$, let $\mathbf{u},\mathbf{v},\mathbf{v}^+\in\mathbb{R}^{m\times d}$ and $\bar{u}, \bar{v}, \bar{v}^+\in\mathbb{R}^d$ be their concatenated matrices and averaged vectors, respectively. Let […]. Then, […].

Figures (4)

  • Figure 1: Convergence behavior of decentralized algorithms on the phase retrieval problem.
  • Figure 2: Performance of the decentralized methods on the Poisson inverse problem. Solid curves correspond to algorithms with random positive Gaussian initialization. The dashed curve reports convergence behavior of DDA under a favorable initialization.
  • Figure 3: The blurred image is the observation (from one of the agents) generated using motion blur with length $50$ for Cameraman and $100$ for Peppers, and corrupted by Poisson noise.
  • Figure 4: Recovered images (with PSNR in dB) after $T=2000$ iterations for Cameraman (top row) and Peppers (bottom row). The first four columns report reconstructions produced by DMD, DGT, DMGT and DDA under the observation-based initialization. For DDA, we additionally display the reconstruction obtained from a favorable initialization (dotted frame, last column).

Theorems & Definitions (32)

  • Lemma 2.5
  • Definition 3.1: HRUC Regularity
  • Proposition 3.2: Closedness under Concatenation
  • Proposition 3.3: Closedness under Composition
  • proof
  • Proposition 3.4: Closedness under Combination
  • proof: Proof of the proposition labeled proposition:Affine 1.
  • Example 3.5: Fermi-Dirac entropy
  • Lemma 3.6: Lipschitz in the distortion geometry
  • proof
  • ...and 22 more