Diffusion Stochastic Optimization for Min-Max Problems

Haoyuan Cai; Sulaiman A. Alghunaim; Ali H. Sayed

TL;DR

This work proposes Diffusion Stochastic Same-Sample Optimistic Gradient (DSS-OG) to tackle minimax optimization with nonconvex-PL objectives in distributed settings, addressing the large-batch bottleneck of traditional stochastic OG methods. It develops a centralized SS-OG variant and a distributed diffusion implementation over left-stochastic networks, establishing convergence with a primal rate of $\mathcal{O}(1/\sqrt{T})$ and a dual rate of $\mathcal{O}(1/T)$; the overall complexity to obtain an $\varepsilon$-stationary point scales as $T_0+T_1=\mathcal{O}(\varepsilon^{-4})+\mathcal{O}(\varepsilon^{-2})$. The analysis relies on relaxed Lipschitz and PL assumptions, without requiring gradient-smoothness of the stochastic losses, and demonstrates that gradient queries can be executed in parallel with modest memory overhead. Empirical validation on Wasserstein GAN and DCGAN tasks shows that DSS-OG and its centralized variant outperform several baselines in gradient norms, MSE, and FID scores, confirming practical effectiveness for distributed minimax learning. Overall, the paper advances distributed minimax optimization by delivering a batch-flexible, diffusion-based algorithm with provable convergence and real-world GAN demonstrations.
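To make the phrase "left-stochastic network" more concrete, the sketch below applies a single combination step in which every agent replaces its iterate with a weighted average of its neighbors' iterates. The matrix `A`, the three-agent topology, and the helper names `is_left_stochastic` and `combine` are assumptions introduced for illustration, following the common convention that each column of a left-stochastic matrix sums to one. They are not taken from the paper.

```python
def is_left_stochastic(A, tol=1e-12):
    """Check that A has nonnegative entries and that every column sums to one."""
    n = len(A)
    nonneg = all(A[l][k] >= 0 for l in range(n) for k in range(n))
    cols = all(abs(sum(A[l][k] for l in range(n)) - 1.0) <= tol for k in range(n))
    return nonneg and cols


def combine(A, psi):
    """One combination step: w_k = sum_l A[l][k] * psi[l] for every agent k."""
    n = len(A)
    return [sum(A[l][k] * psi[l] for l in range(n)) for k in range(n)]


# Hypothetical 3-agent network; A[l][k] is the weight agent k assigns to agent l.
A = [
    [0.5, 0.25, 0.0],
    [0.5, 0.5, 0.3],
    [0.0, 0.25, 0.7],
]

psi = [1.0, 2.0, 4.0]   # intermediate scalar iterates held by the three agents
w = combine(A, psi)     # each entry of w is a convex combination of psi
```

Only the columns of `A` are normalized here; the rows do not sum to one, so this example matrix is left-stochastic without being doubly stochastic.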

Abstract

The optimistic gradient method is useful in addressing minimax optimization problems. Motivated by the observation that the conventional stochastic version suffers from the need for a large batch size on the order of $\mathcal{O}(\varepsilon^{-2})$ to achieve an $\varepsilon$-stationary solution, we introduce and analyze a new formulation termed Diffusion Stochastic Same-Sample Optimistic Gradient (DSS-OG). We prove its convergence and resolve the large batch issue by establishing a tighter upper bound, under the more general setting of nonconvex Polyak-Lojasiewicz (PL) risk functions. We also extend the applicability of the proposed method to the distributed scenario, where agents communicate with their neighbors via a left-stochastic protocol. To implement DSS-OG, we can query the stochastic gradient oracles in parallel with some extra memory overhead, resulting in a complexity comparable to its conventional counterpart. To demonstrate the efficacy of the proposed algorithm, we conduct tests by training generative adversarial networks.
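The same-sample idea can be sketched on a toy problem. The code below is a minimal illustration, assuming that "same-sample" means the optimistic correction term is evaluated at the previous iterate with the current sample (rather than reusing the gradient computed with the previous sample). The toy loss, the step size, the noise level, and the function names are hypothetical; the loss is a simple strongly-convex-strongly-concave quadratic, not the nonconvex-PL setting analyzed in the paper, and this is not the authors' implementation.

```python
import random


def stochastic_grads(x, y, xi):
    """Gradients of the toy loss Q(x, y; xi) = 0.5*x**2 + (2 + xi)*x*y - 0.5*y**2."""
    gx = x + (2.0 + xi) * y
    gy = (2.0 + xi) * x - y
    return gx, gy


def same_sample_og(steps=500, mu=0.05, noise_std=0.5, seed=0):
    """Optimistic gradient descent-ascent in which both gradients share one sample."""
    rng = random.Random(seed)
    x_prev, y_prev = 1.0, 1.0   # previous iterate, kept in memory
    x, y = 1.0, 1.0             # current iterate
    for _ in range(steps):
        xi = rng.gauss(0.0, noise_std)                          # one fresh sample per step
        gx, gy = stochastic_grads(x, y, xi)                     # gradient at current iterate
        gx_old, gy_old = stochastic_grads(x_prev, y_prev, xi)   # same sample, previous iterate
        x_new = x - mu * (2.0 * gx - gx_old)                    # descent step on x
        y_new = y + mu * (2.0 * gy - gy_old)                    # ascent step on y
        x_prev, y_prev, x, y = x, y, x_new, y_new
    return x, y


x_final, y_final = same_sample_og()
```

The two calls to `stochastic_grads` inside the loop are independent of each other, which is why they could be issued in parallel; the extra memory is the stored pair `(x_prev, y_prev)`. In this toy problem the saddle point is `(0, 0)`, and the iterates are expected to approach it.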

Paper Structure (25 sections, 11 theorems, 103 equations, 2 figures, 1 table, 2 algorithms)

This paper contains 25 sections, 11 theorems, 103 equations, 2 figures, 1 table, 2 algorithms.

Introduction
Related works
Main Contributions
Algorithm Description
Centralized Scenario
Distributed scenario
Convergence Analysis
Assumptions
Main Results
Comparison between S-OG and SS-OG
Computer simulation
Wasserstein GAN
Deep Convolutional GANs
Conclusion
Basic Lemmas
...and 10 more sections

Key Result

Theorem 1

Under Assumptions muPL--LeftStochastic and with suitably chosen step sizes, the primal objective converges at a non-asymptotic rate whose bound depends on $L = L_f + \frac{L_f^2}{\nu}$, the Lipschitz constant associated with $P(x)$. Furthermore, DSS-OG outputs a primal $\varepsilon$-stationary point after $T_0 = \mathcal{O}(\varepsilon^{-4})$ iterations, which is also its gradient evaluation complexity.
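The link between an $\mathcal{O}(1/\sqrt{T})$ rate and the $\mathcal{O}(\varepsilon^{-4})$ iteration count can be seen from a short calculation. The derivation below is a sketch under two assumptions that are not stated in this summary: that the rate bounds the averaged squared gradient norm of $P$, and that $\varepsilon$-stationarity means this squared norm is at most $\varepsilon^2$. The constants $C$ and $C'$ are hypothetical placeholders for the problem-dependent constants.

```latex
% Primal: assumed bound on the averaged squared gradient norm
\frac{1}{T}\sum_{i=0}^{T-1} \mathbb{E}\,\|\nabla P(x_i)\|^2 \le \frac{C}{\sqrt{T}}
\quad\text{and}\quad
\frac{C}{\sqrt{T}} \le \varepsilon^2
\;\Longleftrightarrow\;
T \ge \frac{C^2}{\varepsilon^4}
\;\Longrightarrow\;
T_0 = \mathcal{O}(\varepsilon^{-4}).

% Dual: the same argument applied to an O(1/T) rate
\frac{C'}{T} \le \varepsilon^2
\;\Longleftrightarrow\;
T \ge \frac{C'}{\varepsilon^2}
\;\Longrightarrow\;
T_1 = \mathcal{O}(\varepsilon^{-2}).
```

Under these assumptions the primal phase dominates the total count $T_0 + T_1$ for small $\varepsilon$.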

Figures (2)

Figure 1: Evolution of the gradient norm and of the mean-square error between the true and estimated models. In (a) and (b), the true model is given by $\pi_k = 0, \sigma^2_k = 0.001$ for all $k$; in (c) and (d), the true model is given by $\pi_k = 0, \sigma^2_k = 0.1$ for all $k$.
Figure 2: Simulation results for DCGANs. In (a), the FID score versus different hyperparameters is demonstrated. In (b), we show a sample image generated by a single node of Adam-DSS-OG ($\mu = 0.001, \beta_1 = 0.2, \beta_2 = 0.998$).

Theorems & Definitions (12)

Remark 1
Theorem 1
Theorem 2
Theorem 3
Lemma 1: Quadratic Growth karimi2016linear
Lemma 2: Danskin-type Lemma nouiehed2019solving
Lemma 3
Lemma 4
Lemma 5
Lemma 6
...and 2 more
