
A variance reduced framework for (non)smooth nonconvex-nonconcave stochastic minimax problems with extended Kurdyka-Lojasiewicz property

Muhammad Khan, Yangyang Xu

TL;DR

This is the first unified framework that jointly accommodates weak convexity, the extended Kurdyka-Lojasiewicz (KL) property, and variance-reduced stochastic updates, making it highly suitable for large-scale applications.

Abstract

In this paper, we study stochastic constrained minimax optimization problems with nonconvex-nonconcave structure, a central problem in modern machine learning for which reliable and efficient algorithms remain largely unexplored because of its inherent difficulty. Prior approaches for nonconvex minimax optimization often require (strong) concavity in the maximization variable, or restrictive geometric assumptions on the joint objective, to guarantee convergence. In contrast, our method assumes only weak convexity in the primal variable and the extended Kurdyka-Lojasiewicz (KL) property, with exponent $\theta \in [0,1]$, in the dual variable, significantly broadening the class of tractable problems. To this end, we propose a variance-reduced algorithm that provably handles this general setting and finds an $\varepsilon$-stationary solution with state-of-the-art sample complexity: in the smooth finite-sum setting, the sample complexity is $\mathcal{O}\left(\sqrt{N}\,\varepsilon^{-\max\{4\theta,2\}}\right)$, where $N$ is the total number of samples, and in the online smooth setting, it is $\mathcal{O}\left(\varepsilon^{-\max\{6\theta,3\}}\right)$. For the structured nonsmooth problem, the sample complexity is $\mathcal{O}\left(\sqrt{N}\,\max\left\{\varepsilon^{-3}, \varepsilon^{-5\theta}, \varepsilon^{-\frac{11\theta-3}{2\theta}}\right\}\right)$ in the finite-sum setting and $\mathcal{O}\left(\max\left\{\varepsilon^{-4}, \varepsilon^{-\frac{15\theta-1}{2}}, \varepsilon^{-\frac{31\theta-9}{4\theta}}\right\}\right)$ in the online setting. To the best of our knowledge, this is the first unified framework that jointly accommodates weak convexity, the extended KL property, and variance-reduced stochastic updates, making it highly suitable for large-scale applications.
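The four bounds are easier to compare when written as exponents of $1/\varepsilon$. The sketch below is our own reading of the abstract's formulas and not code from the paper. It drops constants and the $\sqrt{N}$ factor of the finite-sum bounds. At $\theta = 0$, the ratio terms are undefined, so it substitutes their one-sided limit $-\infty$ (an assumption).

```python
import math


def complexity_exponents(theta):
    """Exponent e such that the sample complexity scales like eps**(-e).

    Read off the abstract's bounds; constants are dropped, and so is the
    sqrt(N) factor of the finite-sum bounds.
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")

    def ratio(a, b, c):
        # (a*theta - b) / (c*theta) -> -inf as theta -> 0+ (b > 0),
        # so we use -inf at theta = 0, where the formula is undefined
        return (a * theta - b) / (c * theta) if theta > 0 else -math.inf

    return {
        "smooth_finite_sum": max(4 * theta, 2),
        "smooth_online": max(6 * theta, 3),
        "nonsmooth_finite_sum": max(3, 5 * theta, ratio(11, 3, 2)),
        "nonsmooth_online": max(4, (15 * theta - 1) / 2, ratio(31, 9, 4)),
    }


for theta in (0.0, 0.5, 0.75, 1.0):
    print(theta, complexity_exponents(theta))
```

Evaluating these expressions shows the following. The smooth exponents stay at 2 and 3 for $\theta \le 1/2$. The nonsmooth exponents stay at 3 and 4 for $\theta \le 3/5$. Beyond those thresholds, the $\theta$-dependent terms take over.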

Paper Structure

This paper contains 20 sections, 26 theorems, 200 equations, 1 figure, 3 tables, and 1 algorithm.

Key Result

Theorem 2.4

Under Assumptions assumption_for_smooth_case and assumption_KL_for_smooth_case, let $\varepsilon > 0$ be given. Suppose $\rho = \mathcal{O}(\min\{L_x, L_y\})$ and $\min\{L_x, L_y\} = \Omega(1)$, and denote $\Delta\Phi = \Phi_r(x_0^0, y_0^0, z_0^0) - \underline{F}$, where $\underline{F}$ … In addition, take $T = M = \left\lceil \sqrt{B/2}\,\right\rceil$ and choose $\alpha_x = \min\{\ldots\}$ …
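Theorem 2.4 sets the epoch length and the minibatch size from a batch size $B$ via $T = M = \lceil \sqrt{B/2}\,\rceil$. This summary does not reproduce the algorithm itself. The sketch below therefore assumes a SARAH/SPIDER-type recursive gradient estimator. This is a standard variance-reduction template: it refreshes the estimate with a batch of size $B$ every $T$ iterations and corrects it with $M$ sampled gradient differences in between. The paper's estimator, its treatment of the primal and dual variables, and its step sizes may all differ. The following are hypothetical and chosen for illustration only:

- the oracle grad_i and the helper names;
- the toy one-dimensional quadratic data;
- the choice $B = n$ for the finite-sum case.

```python
import math
import random


def schedule(B):
    """Epoch length T and minibatch size M as in Theorem 2.4: T = M = ceil(sqrt(B / 2))."""
    t = math.ceil(math.sqrt(B / 2))
    return t, t


def recursive_estimates(grad_i, n, xs, B, rng):
    """SARAH/SPIDER-style estimates of the average gradient along the iterates xs.

    Assumed template (not necessarily the paper's estimator): every T steps the
    estimate is refreshed with B distinct samples; in between it is corrected
    with M sampled gradient differences. grad_i(i, x) is a hypothetical oracle
    returning the gradient of the i-th component at x.
    """
    T, M = schedule(B)
    v, out = 0.0, []
    for k, x in enumerate(xs):
        if k % T == 0:
            batch = rng.sample(range(n), B)  # refresh batch, drawn without replacement
            v = sum(grad_i(i, x) for i in batch) / B
        else:
            mini = [rng.randrange(n) for _ in range(M)]  # minibatch, drawn with replacement
            v += sum(grad_i(i, x) - grad_i(i, xs[k - 1]) for i in mini) / M
        out.append(v)
    return out


# Toy finite sum f(x) = (1/n) * sum_i 0.5 * a_i * (x - b_i)**2 (hypothetical data).
rng = random.Random(0)
n = 200
a = [rng.uniform(0.5, 1.5) for _ in range(n)]
b = [rng.gauss(0.0, 1.0) for _ in range(n)]


def grad_i(i, x):
    return a[i] * (x - b[i])


def full_grad(x):
    return sum(grad_i(i, x) for i in range(n)) / n


xs = [1.0 - 0.05 * k for k in range(30)]  # fixed iterate path, for illustration only
estimates = recursive_estimates(grad_i, n, xs, B=n, rng=rng)  # finite sum: B = n
errors = [abs(v - full_grad(x)) for x, v in zip(xs, estimates)]
T, M = schedule(n)
print("T, M =", (T, M))
print("max error at refresh steps:", max(errors[k] for k in range(0, len(xs), T)))
print("max error overall:", max(errors))
```

With $B = n$, the refresh steps recover the full gradient, while the intermediate steps accumulate sampling error. Because $T \cdot M = \lceil\sqrt{B/2}\,\rceil^2 \ge B/2$, the minibatch samples spent over one epoch are of the same order as the refresh batch. Balancing these two costs is one common rationale for such square-root schedules.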

Figures (1)

  • Figure 1: Illustration of the nonconcave function defined in equation (eq:example-KL).

Theorems & Definitions (55)

  • Definition 1.1
  • Definition 1.2
  • Remark 2.3
  • Theorem 2.4: Iteration and sample complexity for smooth case
  • Remark 2.5
  • Remark 3.3
  • Theorem 3.4: Iteration and sample complexity for nonsmooth case
  • Remark 3.5
  • Lemma 1.1: Error Bounds for Gradient Estimators
  • Proof
  • ...and 45 more