A New Kernel Regularity Condition for Distributed Mirror Descent: Broader Coverage and Simpler Analysis

Junwen Qiu; Ziyang Zeng; Leilei Mei; Junyu Zhang

Abstract

Existing convergence analyses of distributed optimization methods in non-Euclidean geometries typically rely on two kernel assumptions: (i) global Lipschitz smoothness and (ii) bi-convexity of the associated Bregman divergence function. Unfortunately, these conditions are violated by nearly all kernels used in practice, leaving a large gap between theory and practice. This work closes that gap by developing a unified analytical tool that guarantees convergence under mild conditions. Specifically, we introduce Hessian relative uniform continuity (HRUC), a regularity condition satisfied by nearly all standard kernels. Importantly, HRUC is closed under concatenation, positive scaling, composition, and various kernel combinations. Leveraging the geometric structure induced by HRUC, we derive convergence guarantees for mirror descent-based gradient tracking without imposing restrictive assumptions. More broadly, our analysis techniques extend to other decentralized optimization methods in genuinely non-Euclidean and non-Lipschitz settings.
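To make the kernel terminology in the abstract concrete, the following is a minimal single-agent sketch, not the paper's distributed algorithm. It assumes the Boltzmann-Shannon entropy kernel on the positive orthant and a hypothetical quadratic objective with target `c`; the function names, step size, and iteration count are all illustrative choices. It shows the Bregman divergence, the closed-form mirror step, and why the kernel's Hessian, diag(1/x), has no global Lipschitz bound near the boundary.

```python
import numpy as np

def h(x):
    """Boltzmann-Shannon entropy kernel on the positive orthant (assumed kernel)."""
    return np.sum(x * np.log(x) - x)

def grad_h(x):
    """Gradient of the entropy kernel: the elementwise logarithm."""
    return np.log(x)

def hess_diag_h(x):
    """Diagonal of the kernel Hessian; it grows without bound as x -> 0."""
    return 1.0 / x

def bregman(x, y):
    """D_h(x, y) = h(x) - h(y) - <grad h(y), x - y>."""
    return h(x) - h(y) - grad_h(y) @ (x - y)

def mirror_step(x, g, step):
    """Solve grad_h(x_new) = grad_h(x) - step * g, giving x_new = x * exp(-step * g)."""
    return x * np.exp(-step * g)

# Hypothetical objective f(x) = 0.5 * ||x - c||^2 with a positive minimizer c.
c = np.array([0.5, 2.0, 1.0])
x = np.ones(3)
for _ in range(500):
    x = mirror_step(x, x - c, step=0.1)

# The Hessian entry at a point near the boundary is enormous, so grad_h
# cannot be globally Lipschitz on the positive orthant.
near_boundary_curvature = hess_diag_h(np.array([1e-8]))[0]
```

The multiplicative update keeps every iterate strictly positive without a projection, which is the usual reason for choosing an entropy-type kernel on such domains.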

Paper Structure (35 sections, 16 theorems, 124 equations, 4 figures, 1 table, 1 algorithm)

Introduction
Extended literature review
Notations
Preliminaries and basic assumptions
Mirror descent and relative smoothness
Consensus mixing
The Hessian relative uniform continuity regularity condition
Popular kernels are HRUC regular
Separable kernels
Self-concordant strongly convex functions
Power kernels
Algorithm and convergence analysis under HRUC
Numerical experiments
Baseline algorithms
Performance evaluations and parameters tuning
...and 20 more sections

Key Result

Lemma 2.5

Suppose the assumption labeled as:matrix holds, and let $\{u_i\}_{i\in[m]}$, $\{v_i\}_{i\in[m]}$, $\{v_i^+\}_{i\in[m]} \subset \mathbb{R}^d$. Let $\mathbf{u},\mathbf{v},\mathbf{v}^+\in\mathbb{R}^{m\times d}$ and $\bar{u}, \bar{v}, \bar{v}^+\in\mathbb{R}^d$ be the corresponding concatenated matrices and averaged vectors. Let ... Then, ...
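The lemma statement above is incomplete, so the sketch below does not reproduce it. It only illustrates the standard Euclidean fact that underlies consensus-mixing arguments of this kind: for a symmetric, doubly stochastic mixing matrix $W$, one mixing step preserves the average and shrinks the consensus error by the factor $\rho = \|W - \tfrac{1}{m}\mathbf{1}\mathbf{1}^\top\|_2$. The ring-graph matrix, the dimensions, and the reading of as:matrix as a doubly stochastic assumption are all assumptions made here for illustration.

```python
import numpy as np

m, d = 5, 3  # hypothetical number of agents and variable dimension

# Hypothetical ring-graph mixing matrix: symmetric and doubly stochastic.
W = np.zeros((m, m))
for i in range(m):
    W[i, i] = 0.5
    W[i, (i + 1) % m] = 0.25
    W[i, (i - 1) % m] = 0.25

# Contraction factor: spectral norm of W minus the averaging matrix.
J = np.ones((m, m)) / m
rho = np.linalg.norm(W - J, 2)

rng = np.random.default_rng(0)
u = rng.normal(size=(m, d))   # row i holds agent i's vector u_i
u_bar = u.mean(axis=0)        # averaged vector

mixed = W @ u
# Frobenius-norm consensus errors before and after one mixing step
# (broadcasting subtracts u_bar from every row).
err_before = np.linalg.norm(u - u_bar)
err_after = np.linalg.norm(mixed - u_bar)
```

Because $W\mathbf{1} = \mathbf{1}$, the mixed error equals $(W - \tfrac{1}{m}\mathbf{1}\mathbf{1}^\top)(\mathbf{u} - \mathbf{1}\bar{u}^\top)$, which gives `err_after <= rho * err_before`.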

Figures (4)

Figure 1: Convergence behavior of decentralized algorithms on the phase retrieval problem.
Figure 2: Performance of the decentralized methods on the Poisson inverse problem. Solid curves correspond to algorithms with random positive Gaussian initialization. The dashed curve reports convergence behavior of DDA under a favorable initialization.
Figure 3: The blurred image is the observation (from one of the agents) generated using motion blur with length $50$ for Cameraman and $100$ for Peppers, and corrupted by Poisson noise.
Figure 4: Recovered images (with PSNR in dB) after $T=2000$ iterations for Cameraman (top row) and Peppers (bottom row). The first four columns report reconstructions produced by DMD, DGT, DMGT and DDA under the observation-based initialization. For DDA, we additionally display the reconstruction obtained from a favorable initialization (dotted frame, last column).

Theorems & Definitions (32)

Lemma 2.5
Definition 3.1: HRUC Regularity
Proposition 3.2: Closedness under Concatenation
Proposition 3.3: Closedness under Composition
proof
Proposition 3.4: Closedness under Combination
proof: Proof of \ref{proposition:Affine 1}.
Example 3.5: Fermi-Dirac entropy
Lemma 3.6: Lipschitz in the distortion geometry
proof
...and 22 more
