SDEs for Minimax Optimization
Enea Monzio Compagnoni, Antonio Orvieto, Hans Kersting, Frank Norbert Proske, Aurelien Lucchi
TL;DR
The paper develops a formal stochastic differential equation (SDE) framework for analyzing minimax optimizers, deriving SDE models of SGDA, SEG, and SHGD that are weak approximations of the discrete updates. The models show how hyperparameters, such as the SEG extra stepsize ρ, interact with gradient noise and landscape curvature to produce implicit regularization and curvature-induced diffusion, and they allow convergence and dynamic behavior to be analyzed in a unified way with Itô calculus. The study identifies regimes where SEG behaves like SGDA, shows that the curvature-aware SHGD introduces explicit curvature-driven noise, and derives exact dynamics for quadratic games that illustrate a trade-off between convergence speed and asymptotic accuracy. Empirical validation confirms that the SDEs capture key trajectory and variance properties across landscapes, and the work provides concrete convergence conditions and scheduler designs that ensure convergence. Overall, the framework offers a principled, analyzable lens for comparing minimax optimizers and informs design choices for robust stochastic optimization in complex games.
Abstract
Minimax optimization problems have attracted a lot of attention over the past few years, with applications ranging from economics to machine learning. While advanced optimization methods exist for such problems, characterizing their dynamics in stochastic scenarios remains notably challenging. In this paper, we pioneer the use of stochastic differential equations (SDEs) to analyze and compare Minimax optimizers. Our SDE models for Stochastic Gradient Descent-Ascent, Stochastic Extragradient, and Stochastic Hamiltonian Gradient Descent are provable approximations of their algorithmic counterparts, clearly showcasing the interplay between hyperparameters, implicit regularization, and implicit curvature-induced noise. This perspective also allows for a unified and simplified analysis strategy based on the principles of Itô calculus. Finally, our approach facilitates the derivation of convergence conditions and closed-form solutions for the dynamics in simplified settings, unveiling further insights into the behavior of different optimizers.
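To make the discrete optimizers named in the abstract concrete, the following is a minimal sketch of Stochastic Gradient Descent-Ascent (SGDA) and Stochastic Extragradient (SEG) on a toy game. Everything specific here is an assumption chosen for illustration and is not taken from the paper: the bilinear objective f(x, y) = x·y, the additive Gaussian gradient noise with scale `sigma`, the independent noise draws for the two SEG gradient evaluations, and the stepsize values. The names `eta` (stepsize) and `rho` (extra stepsize) are likewise illustrative.

```python
import numpy as np


def noisy_grads(x, y, rng, sigma):
    """Noisy gradients of the toy game f(x, y) = x * y (an assumed example)."""
    gx = y + sigma * rng.standard_normal()  # df/dx = y, plus Gaussian noise
    gy = x + sigma * rng.standard_normal()  # df/dy = x, plus Gaussian noise
    return gx, gy


def sgda(x, y, eta, sigma, steps, rng):
    """SGDA: descend in x and ascend in y using one noisy gradient per step."""
    for _ in range(steps):
        gx, gy = noisy_grads(x, y, rng, sigma)
        x, y = x - eta * gx, y + eta * gy
    return x, y


def seg(x, y, eta, rho, sigma, steps, rng):
    """SEG: extrapolate with extra stepsize rho, then update from the
    original point using the gradient evaluated at the extrapolated point."""
    for _ in range(steps):
        gx, gy = noisy_grads(x, y, rng, sigma)
        x_half, y_half = x - rho * gx, y + rho * gy  # extrapolation step
        # Assumption: a fresh, independent noise draw for the second gradient.
        gx, gy = noisy_grads(x_half, y_half, rng, sigma)
        x, y = x - eta * gx, y + eta * gy            # update step
    return x, y


rng = np.random.default_rng(0)
x_sgda, y_sgda = sgda(1.0, 1.0, eta=0.1, sigma=0.1, steps=1000, rng=rng)
x_seg, y_seg = seg(1.0, 1.0, eta=0.1, rho=0.1, sigma=0.1, steps=1000, rng=rng)

# Distance to the saddle point (0, 0) after the final step.
dist_sgda = np.hypot(x_sgda, y_sgda)
dist_seg = np.hypot(x_seg, y_seg)
print(f"SGDA distance to saddle: {dist_sgda:.3f}")
print(f"SEG  distance to saddle: {dist_seg:.3f}")
```

On this bilinear toy problem, plain gradient descent-ascent is known to spiral away from the saddle point while extragradient contracts toward it. With noise added, SEG settles in a small neighborhood of the saddle rather than at it, which is the kind of speed-versus-asymptotic-accuracy behavior that an SDE model is meant to describe.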
