Online Min-Max Optimization: From Individual Regrets to Cumulative Saddle Points
Abhijeet Vyas, Brian Bullins
TL;DR
The paper addresses online min-max optimization in settings beyond convex-concave and introduces two new performance measures, the static duality gap $\text{SDual-Gap}_T$ and the dynamic saddle point regret $\text{DSP-Reg}_T$, to capture progress toward a cumulative saddle point. It develops the algorithms OGDA and OMMNS through reductions to online convex optimization and derives sublinear bounds on $\text{SDual-Gap}_T$ and $\text{DSP-Reg}_T$ under strong convexity-strong concavity and under min-max exponential concavity; it also extends these results to time-varying variational inequalities (VIs) under a lower-regularity operator. A two-player variant of portfolio selection and an analysis of dynamic zero-sum games under two-sided PL conditions illustrate practical implications and linear convergence guarantees, while a dynamic-regret framework in the sleeping-experts setting yields robust performance in non-stationary environments. Overall, the work provides a cohesive framework for tracking cumulative saddle points in online min-max problems, with convergence rates for averaged strategies and dynamic regret bounds that generalize and unify several online learning and VI results.
Abstract
We propose and study an online version of min-max optimization based on cumulative saddle points, under a variety of performance measures beyond convex-concave settings. After first observing that the static Nash equilibrium regret ($\text{SNE-Reg}_T$) is incompatible with individual regrets even for strongly convex-strongly concave functions, we propose an alternative static duality gap ($\text{SDual-Gap}_T$) inspired by the online convex optimization (OCO) framework. We provide algorithms that, using a reduction to classic OCO problems, achieve bounds for $\text{SDual-Gap}_T$ and for a novel dynamic saddle point regret ($\text{DSP-Reg}_T$), which we suggest is a natural min-max version of dynamic regret in OCO. We derive our bounds for $\text{SDual-Gap}_T$ and $\text{DSP-Reg}_T$ under strong convexity-strong concavity and under a min-max notion of exponential concavity (min-max EC), and we additionally establish a class of functions satisfying min-max EC that captures a two-player variant of the classic portfolio selection problem. Finally, for a dynamic notion of regret compatible with individual regrets, we derive bounds under a two-sided Polyak-Łojasiewicz (PL) condition.
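The abstract does not state the exact definitions or update rules, so the following is only a minimal sketch under explicit assumptions. We assume $\text{SDual-Gap}_T = \max_{y} \sum_t f_t(x_t, y) - \min_{x} \sum_t f_t(x, y_t)$, we read the "reduction to classic OCO" as each player running projected online gradient descent (ascent for $y$) with step $1/(\mu t)$, and we use a hypothetical family of scalar, $\mu$-strongly convex-strongly concave quadratic losses on $[-r, r]^2$. The helper names (`loss`, `grad`, `run_ogda`, `static_duality_gap`) are illustrative, not the paper's code.

```python
import random


def clip(v, r):
    """Project a scalar onto the interval [-r, r]."""
    return max(-r, min(r, v))


def loss(x, y, p, q, mu):
    """Hypothetical round loss f_t(x, y) = mu/2 (x-p)^2 + x*y - mu/2 (y-q)^2."""
    return 0.5 * mu * (x - p) ** 2 + x * y - 0.5 * mu * (y - q) ** 2


def grad(x, y, p, q, mu):
    """Partial derivatives (df/dx, df/dy) of the loss above."""
    return mu * (x - p) + y, x - mu * (y - q)


def run_ogda(params, mu=1.0, r=5.0):
    """Simultaneous projected online gradient descent (x) / ascent (y).

    Each player runs OGD with step 1/(mu * t) on its own strongly convex
    loss sequence, i.e. the min-max problem is split into two OCO problems
    (an assumed reading of the reduction, not the paper's exact algorithm).
    """
    x, y = 0.0, 0.0
    xs, ys = [], []
    for t, (p, q) in enumerate(params, start=1):
        xs.append(x)  # play (x_t, y_t) before f_t is revealed
        ys.append(y)
        gx, gy = grad(x, y, p, q, mu)
        eta = 1.0 / (mu * t)
        x, y = clip(x - eta * gx, r), clip(y + eta * gy, r)
    return xs, ys


def static_duality_gap(xs, ys, params, mu=1.0, r=5.0):
    """Assumed definition: max_y sum_t f_t(x_t, y) - min_x sum_t f_t(x, y_t).

    Both inner problems are 1-D quadratics (concave in y, convex in x),
    so each optimum is the stationary point clipped to [-r, r].
    """
    T = len(params)
    sum_p = sum(p for p, _ in params)
    sum_q = sum(q for _, q in params)
    y_best = clip((sum(xs) + mu * sum_q) / (mu * T), r)
    x_best = clip((mu * sum_p - sum(ys)) / (mu * T), r)
    upper = sum(loss(x, y_best, p, q, mu) for x, (p, q) in zip(xs, params))
    lower = sum(loss(x_best, y, p, q, mu) for y, (p, q) in zip(ys, params))
    return upper - lower


# Usage: a random stream of T strongly convex-strongly concave losses.
random.seed(0)
T = 2000
params = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(T)]
xs, ys = run_ogda(params)
gap = static_duality_gap(xs, ys, params)
print(f"SDual-Gap_T / T = {gap / T:.4f}")
```

Under this assumed definition the gap splits exactly into the $y$-player's static regret plus the $x$-player's static regret, and OGD with step $1/(\mu t)$ has $O(\log T)$ regret on $\mu$-strongly convex losses, so $\text{SDual-Gap}_T / T$ should shrink roughly like $\log T / T$ in this toy setting.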
