Diffusion Stochastic Optimization for Min-Max Problems
Haoyuan Cai, Sulaiman A. Alghunaim, Ali H. Sayed
TL;DR
This work proposes Diffusion Stochastic Same-Sample Optimistic Gradient (DSS-OG) for distributed minimax optimization with nonconvex-PL objectives, addressing the large-batch bottleneck of conventional stochastic OG methods. It develops a centralized SS-OG variant and a distributed diffusion implementation over left-stochastic networks, and establishes convergence at a primal rate of $\mathcal{O}(1/\sqrt{T})$ and a dual rate of $\mathcal{O}(1/T)$; the overall complexity to reach an $\varepsilon$-stationary point scales as $T_0+T_1=\mathcal{O}(\varepsilon^{-4})+\mathcal{O}(\varepsilon^{-2})$. The analysis relies on relaxed Lipschitz and PL assumptions, does not require gradient smoothness of the stochastic losses, and shows that the gradient queries can be executed in parallel with modest memory overhead. Experiments on Wasserstein GAN and DCGAN tasks show that DSS-OG and its centralized variant outperform several baselines in gradient norm, MSE, and FID score.
Abstract
The optimistic gradient method is useful for solving minimax optimization problems. Motivated by the observation that the conventional stochastic version requires a large batch size, on the order of $\mathcal{O}(\varepsilon^{-2})$, to achieve an $\varepsilon$-stationary solution, we introduce and analyze a new formulation termed Diffusion Stochastic Same-Sample Optimistic Gradient (DSS-OG). We prove its convergence and resolve the large-batch issue by establishing a tighter upper bound, under the more general setting of nonconvex Polyak-Łojasiewicz (PL) risk functions. We also extend the proposed method to the distributed scenario, where agents communicate with their neighbors via a left-stochastic protocol. To implement DSS-OG, we can query the stochastic gradient oracles in parallel with some extra memory overhead, resulting in a complexity comparable to that of its conventional counterpart. To demonstrate the efficacy of the proposed algorithm, we conduct tests by training generative adversarial networks.
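The same-sample idea and the left-stochastic combination step can be illustrated with a minimal NumPy sketch. It is not the paper's exact recursion. It assumes that "same-sample" means evaluating the stochastic gradient at both the current and the previous iterate with one shared sample (which is why the two queries can run in parallel at the cost of storing the previous iterate), and that the diffusion step takes an adapt-then-combine form. The toy loss, the step sizes, the combination matrix `A`, and the names `grads` and `dss_og_sketch` are all hypothetical. For simplicity the toy loss is strongly convex in $x$ and strongly concave in $y$, rather than nonconvex-PL.

```python
import numpy as np

def grads(x, y, xi):
    """Stochastic gradients of the toy loss
    Q(x, y; xi) = 0.5*x**2 + (2 + xi)*x*y - 0.5*y**2 (saddle point at the origin)."""
    gx = x + (2.0 + xi) * y
    gy = (2.0 + xi) * x - y
    return gx, gy

def dss_og_sketch(A, steps=2000, mu_x=0.02, mu_y=0.02, seed=0):
    """Illustrative same-sample optimistic gradient with a diffusion combine step.

    A is assumed left-stochastic (each column sums to one); agent k holds
    the scalar pair (x[k], y[k]).
    """
    rng = np.random.default_rng(seed)
    K = A.shape[0]
    x, y = rng.normal(size=K), rng.normal(size=K)
    x_prev, y_prev = x.copy(), y.copy()
    for _ in range(steps):
        xi = 0.1 * rng.normal(size=K)  # one fresh sample per agent
        # Two gradient queries with the SAME sample: current and previous iterate.
        gx_now, gy_now = grads(x, y, xi)
        gx_old, gy_old = grads(x_prev, y_prev, xi)
        # Adapt: optimistic descent in x, optimistic ascent in y.
        psi_x = x - mu_x * (2.0 * gx_now - gx_old)
        psi_y = y + mu_y * (2.0 * gy_now - gy_old)
        x_prev, y_prev = x, y
        # Combine: x_k = sum_l a_{lk} psi_l, i.e. multiply by A transposed.
        x, y = A.T @ psi_x, A.T @ psi_y
    return x, y

# Hypothetical 3-agent network; columns sum to one, rows do not.
A = np.array([[0.5, 0.25, 0.0],
              [0.5, 0.50, 0.3],
              [0.0, 0.25, 0.7]])
x_final, y_final = dss_og_sketch(A)
```

In this sketch each agent stores one extra copy of its iterate and issues two gradient queries per step with a batch of size one, which mirrors the memory and parallel-query trade-off described in the abstract.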
