Stochastic Compositional Minimax Optimization with Provable Convergence Guarantees

Yuyang Deng, Fuli Qiao, Mehrdad Mahdavi

TL;DR

The paper addresses stochastic compositional minimax optimization where inner and outer functions are composed over primal, dual, or both variables. It introduces CODA, a descent-ascent framework with compositional correction, and variants CODA-Primal, CODA-Dual, and CODA-PD, plus CODA+ with variance reduction. The authors establish convergence guarantees across nonconvex-strongly-concave, nonconvex-concave, strongly-convex-nonconcave, convex-nonconcave, and weakly-convex-weakly-concave settings, achieving state-of-the-art rates in several regimes. They validate the theory with experiments on AUC optimization, task-robust MAML, and multi-source domain adaptation, showing consistent improvements over baselines. The work lays a foundation for theoretical study of stochastic compositional minimax in diverse ML applications and offers practical algorithms with competitive convergence properties.

Abstract

Stochastic compositional minimax problems are prevalent in machine learning, yet only limited results have been established on the convergence of this class of problems. In this paper, we propose a formal definition of the stochastic compositional minimax problem, which involves optimizing a minimax loss with a compositional structure in the primal variable, the dual variable, or both. We introduce a simple yet effective algorithm, stochastically Corrected stOchastic gradient Descent Ascent (CODA), which is a descent-ascent type algorithm with compositional correction steps, and establish its convergence rate in the aforementioned three settings. In the presence of the compositional structure in the primal, the objective function typically becomes nonconvex in the primal variable due to function composition. Thus, we consider the nonconvex-strongly-concave and nonconvex-concave settings and show that CODA can efficiently converge to a stationary point. In the case of composition on the dual, the objective function becomes nonconcave in the dual variable, and we demonstrate convergence in the strongly-convex-nonconcave and convex-nonconcave settings. In the case of composition on both variables, the objective may lose convexity in the primal variable and concavity in the dual variable. Therefore, we analyze the convergence in the weakly-convex-weakly-concave setting. We also give a variance-reduced version of the algorithm, CODA+, which achieves the best known rate on nonconvex-strongly-concave and nonconvex-concave compositional minimax problems. This work initiates the theoretical study of the stochastic compositional minimax problem in various settings and may inform modern machine learning scenarios such as domain adaptation or robust model-agnostic meta-learning.
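
To make the idea of "descent ascent with compositional correction steps" concrete, the following is a minimal sketch on a toy problem with composition in the primal variable. It is not the authors' CODA implementation. The toy objective $f(u, y) = y(u^2 - 1) - \frac{1}{2}y^2$ with inner map $g(x;\xi) = (1+\xi)x$, the step sizes, the bound on $y$, and all function names are assumptions made here for illustration. The correction step follows the general "stochastically corrected" tracking form, in which a running estimate `u` of $g(x)$ is shifted by the sampled change $g(x^{t+1};\xi) - g(x^{t};\xi)$ before being averaged with a fresh sample. For this toy problem $\Phi(x) = \max_y f(g(x), y) = \frac{1}{2}(x^2 - 1)^2$, which is nonconvex in $x$ and strongly concave in $y$.

```python
import random


def inner_sample(x, xi):
    """Hypothetical stochastic inner map g(x; xi) = (1 + xi) * x, so E[g(x; xi)] = x."""
    return (1.0 + xi) * x


def run_corrected_gda(steps=5000, eta_x=0.01, eta_y=0.05, beta=0.1,
                      sigma=0.2, y_bound=5.0, seed=0):
    """Descent-ascent with a tracked, corrected estimate u of the inner function.

    Toy outer function (an assumption for this sketch):
        f(u, y) = y * (u**2 - 1) - 0.5 * y**2
    """
    rng = random.Random(seed)
    x, y = 2.0, 0.0
    u = inner_sample(x, rng.gauss(0.0, sigma))  # initial estimate of g(x)
    tail = []
    for t in range(steps):
        # Partial derivatives of f, evaluated at the tracked estimate u.
        jac = 1.0 + rng.gauss(0.0, sigma)        # sampled derivative of g w.r.t. x
        grad_x = jac * (2.0 * y * u)             # chain rule: g'(x; xi') * df/du
        grad_y = u * u - 1.0 - y                 # df/dy

        x_new = x - eta_x * grad_x               # descent step on the primal variable
        y = max(-y_bound, min(y_bound, y + eta_y * grad_y))  # projected ascent step

        # Correction step: shift the old estimate by the sampled change in g
        # (same sample xi at both points), then mix in the fresh sample.
        xi = rng.gauss(0.0, sigma)
        g_new, g_old = inner_sample(x_new, xi), inner_sample(x, xi)
        u = (1.0 - beta) * (u + g_new - g_old) + beta * g_new

        x = x_new
        if t >= steps - 1000:
            tail.append(x)
    return sum(tail) / len(tail), y


x_avg, y_last = run_corrected_gda()
print(f"tail-averaged x = {x_avg:.3f}, final y = {y_last:.3f}")
```

Started from $x = 2$, the iterates should settle near the stationary point $x = 1$, $y = 0$ of the toy problem. Evaluating the gradients at the tracked estimate `u` rather than at a single noisy sample of $g$ is the point of the correction: because $f$ is nonlinear in $u$, plugging in one sample of $g(x;\xi)$ would give a biased gradient.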

Paper Structure (44 sections, 38 theorems, 341 equations, 6 figures, 12 tables)

This paper contains 44 sections, 38 theorems, 341 equations, 6 figures, 12 tables.

Key Result

Theorem 1

Under Assumptions ass:ncsc, ass:bounded Y, and ass:primal standard, defining $\kappa := L/\mu$, for the CODA-Primal algorithm, if we choose $\delta = \frac{1}{\kappa}$, $M = B = \Theta\left( \max\left\{ \frac{\kappa^2 L \sigma^2}{\epsilon^2}, 1 \right\} \right)$, $\beta = \frac{1}{2}$, … where $\Delta_{\Phi} := \Phi(\mathbf{x}^0) - \min_{\mathbf{x}\in\mathcal{X}}\Phi(\mathbf{x})$.

Figures (6)

  • Figure 1: Convergence curves of CODA-Primal on AUC maximization task on four benchmark datasets with different imbalance ratios
  • Figure 2: Testing AUC comparison of CODA and SCGDA on four benchmarks under imratio$=10\%$
  • Figure 3: Training loss with different imbalance ratios on four benchmark datasets
  • Figure 4: Testing AUC performance comparison of CODA and SCGDA on four benchmarks. The first row is under imratio$=1\%$ and the third row is under imratio$=30\%$
  • Figure 5: Meta-Training, Meta-Validation, and Meta-Testing accuracy over epochs for different task sizes ($K=5$ and $K=20$)
  • ...and 1 more figure

Theorems & Definitions (72)

  • Definition 1
  • Theorem 1
  • Definition 2: Moreau Envelope
  • Theorem 2
  • Definition 3: Convergence Measure (xu2023unified)
  • Theorem 3
  • Theorem 4
  • Lemma 1
  • Definition 4: Convergence Measure (liu2021first)
  • Theorem 5
  • ...and 62 more
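
Definition 2 in the list above names the Moreau envelope, which this page does not reproduce. For orientation, the standard textbook form is sketched below. The symbols $\lambda$, $\rho$, and $\mathbf{z}$ are notation chosen here, and the paper's own parameterization may differ.

```latex
% Moreau envelope of \Phi with parameter \lambda > 0 (standard definition)
\Phi_{\lambda}(\mathbf{x}) := \min_{\mathbf{z}\in\mathcal{X}}
  \left\{ \Phi(\mathbf{z}) + \frac{1}{2\lambda}\,\|\mathbf{z}-\mathbf{x}\|^{2} \right\}

% If \mathcal{X} is closed and convex, \Phi is \rho-weakly convex, and
% 0 < \lambda < 1/\rho, the minimizer \hat{\mathbf{x}} is unique and
\nabla \Phi_{\lambda}(\mathbf{x}) = \frac{1}{\lambda}\,\bigl(\mathbf{x}-\hat{\mathbf{x}}\bigr),
\qquad
\hat{\mathbf{x}} := \operatorname*{arg\,min}_{\mathbf{z}\in\mathcal{X}}
  \left\{ \Phi(\mathbf{z}) + \frac{1}{2\lambda}\,\|\mathbf{z}-\mathbf{x}\|^{2} \right\}
```

A small $\|\nabla \Phi_{\lambda}(\mathbf{x})\|$ means $\mathbf{x}$ lies close to a point $\hat{\mathbf{x}}$ that is nearly stationary for $\Phi$. This is why the envelope is commonly used as a stationarity measure when $\Phi$ itself is nonsmooth, as can happen in settings that are only concave, not strongly concave, in the dual variable.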