A variance reduced framework for (non)smooth nonconvex-nonconcave stochastic minimax problems with extended Kurdyka-Lojasiewicz property

Muhammad Khan; Yangyang Xu

TL;DR

This is the first unified framework that jointly accommodates weak convexity, the extended Kurdyka-Lojasiewicz (KL) property, and variance-reduced stochastic updates, making it highly suitable for large-scale applications.

Abstract

In this paper, we study stochastic constrained minimax optimization problems with nonconvex-nonconcave structure, a central problem in modern machine learning, for which reliable and efficient algorithms remain largely unexplored due to its inherent challenges. Prior approaches for nonconvex minimax optimization often require (strong) concavity on the maximization part, or certain restrictive geometric assumptions on the joint objective, to guarantee convergence. In contrast, our method only assumes weak convexity in the primal variable and the extended Kurdyka-Lojasiewicz (KL) property, with exponent $\theta \in [0,1]$, in the dual variable, significantly broadening the class of tractable problems. To this end, we propose a variance reduced algorithm that provably handles this general setting and achieves an $\varepsilon$-stationary solution with state-of-the-art sample complexity: in the smooth finite-sum setting, the sample complexity is $\mathcal{O}\left(\sqrt{N}\,\varepsilon^{-\max\{4\theta,2\}}\right)$, where $N$ is the total number of samples, and in the online smooth setting, it is $\mathcal{O}\left(\varepsilon^{-\max\{6\theta,3\}}\right)$. For the structured nonsmooth problem, the sample complexity is $\mathcal{O}\left(\sqrt{N}\,\max\left\{\varepsilon^{-3}, \varepsilon^{-5\theta}, \varepsilon^{-\frac{11\theta-3}{2\theta}}\right\}\right)$ in the finite-sum setting and $\mathcal{O}\left(\max\left\{\varepsilon^{-4}, \varepsilon^{-\frac{15\theta-1}{2}}, \varepsilon^{-\frac{31\theta-9}{4\theta}}\right\}\right)$ in the online setting. To the best of our knowledge, this is the first unified framework that jointly accommodates weak convexity, the extended KL property, and variance-reduced stochastic updates, making it highly suitable for large-scale applications.
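To make the dependence on the KL exponent concrete, the following is a minimal sketch that evaluates the $\varepsilon$-exponents quoted in the abstract for a given $\theta$. The function name `complexity_exponents` and the dictionary keys are hypothetical, chosen only for illustration. The sketch also assumes that at $\theta = 0$ the two terms with $\theta$ in the denominator can be dropped from the maximum, since their exponents tend to $-\infty$ as $\theta \to 0^{+}$; the abstract does not address that case.

```python
from fractions import Fraction


def complexity_exponents(theta):
    """Return p such that the quoted sample complexity scales like eps**(-p).

    The finite-sum entries omit the extra sqrt(N) factor from the abstract.
    Names and keys are illustrative assumptions, not taken from the paper.
    """
    theta = Fraction(theta)
    if not 0 <= theta <= 1:
        raise ValueError("theta must lie in [0, 1]")

    exponents = {
        # Smooth case: max{4*theta, 2} (finite-sum), max{6*theta, 3} (online).
        "smooth_finite_sum": max(4 * theta, Fraction(2)),
        "smooth_online": max(6 * theta, Fraction(3)),
    }

    # Nonsmooth case: collect the candidate exponents, then take the largest.
    nonsmooth_finite_sum = [Fraction(3), 5 * theta]
    nonsmooth_online = [Fraction(4), (15 * theta - 1) / 2]
    if theta > 0:
        # Assumption: these terms are skipped at theta = 0 (undefined there).
        nonsmooth_finite_sum.append((11 * theta - 3) / (2 * theta))
        nonsmooth_online.append((31 * theta - 9) / (4 * theta))

    exponents["nonsmooth_finite_sum"] = max(nonsmooth_finite_sum)
    exponents["nonsmooth_online"] = max(nonsmooth_online)
    return exponents


if __name__ == "__main__":
    for theta in (Fraction(1, 2), Fraction(3, 4), Fraction(1)):
        print(theta, complexity_exponents(theta))
```

For $\theta = 1/2$ the exponents are 2 and 3 in the smooth case and 3 and 4 in the nonsmooth case. For $\theta = 1$ they grow to 4 and 6, and to 5 and 7, respectively. Exact rational arithmetic is used so that ties inside the maxima are not affected by floating-point rounding.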

Paper Structure (20 sections, 26 theorems, 200 equations, 1 figure, 3 tables, 1 algorithm)

Introduction
Related Works
Contributions
Notations and Definitions
Constrained Smooth Minimax Problems
Algorithm
Roadmap of Convergence Analysis
Complexity Results for Smooth Problems
Composite Nonsmooth Minimax Problems
Motivating Applications
Convergence Results
Smoothing by Moreau Envelope and Key Properties
Complexity Results -- Nonsmooth regime
Concluding Remarks
Proofs for the smooth case
...and 5 more sections

Key Result

Theorem 2.4

Under Assumptions assumption_for_smooth_case and assumption_KL_for_smooth_case, let $\varepsilon > 0$ be given. Suppose $\rho = \mathcal{O}(\min\{L_x, L_y\})$ and $\min\{L_x, L_y\} = \Omega(1)$, and denote $\Delta\Phi = \Phi_r(x^{0}_{0}, y^{0}_{0}, z_{0}^{0}) - \underline{F}$, where $\underline{F}$ [...] In addition, take $T = M = \left\lceil \sqrt{\frac{B}{2}}\,\right\rceil$ and choose $\alpha_x = \min\{$ [...]
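The parameter choice $T = M = \lceil \sqrt{B/2}\, \rceil$ in the theorem excerpt can be evaluated as in the small sketch below. The excerpt is truncated before it defines $T$, $M$, and $B$, so the helper only computes the formula and assumes $B$ is a positive integer; the name `ceil_sqrt_half` is hypothetical.

```python
import math


def ceil_sqrt_half(B):
    """Return ceil(sqrt(B / 2)) for a positive integer B, using integer arithmetic.

    The result is the smallest integer k with 2 * k * k >= B, which avoids
    floating-point rounding when B / 2 is a perfect square.
    """
    if B < 1:
        raise ValueError("B must be a positive integer")
    k = math.isqrt(B // 2)  # floor(sqrt(floor(B / 2))), never above the target
    while 2 * k * k < B:
        k += 1
    return k


if __name__ == "__main__":
    for B in (1, 100, 200):
        print(B, ceil_sqrt_half(B))
```

For example, $B = 200$ gives $T = M = 10$, while $B = 100$ gives $T = M = 8$ because $\sqrt{50} \approx 7.07$ is rounded up.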

Figures (1)

Figure 1: Illustration of the nonconcave function defined in \ref{eq:example-KL}.

Theorems & Definitions (55)

Definition 1.1
Definition 1.2
Remark 2.3
Theorem 2.4: Iteration and sample complexity for smooth case
Remark 2.5
Remark 3.3
Theorem 3.4: Iteration and sample complexity for nonsmooth case
Remark 3.5
Lemma 1.1: Error Bounds for Gradient Estimators
proof
...and 45 more
