
Accelerated Stochastic Min-Max Optimization Based on Bias-corrected Momentum

Haoyuan Cai, Sulaiman A. Alghunaim, Ali H. Sayed

TL;DR

This work tackles stochastic minimax optimization where the minimization variable is nonconvex and the maximization variable is strongly concave or satisfies the Polyak–Lojasiewicz condition. It introduces two bias-corrected momentum methods (HCMM-1 and HCMM-2) that leverage Hessian-vector products to form more accurate momentum estimates, using a two-time-scale gradient descent-ascent scheme and cross-variable Hessian terms. Under Lipschitz-Hessian assumptions, the authors establish $O(\varepsilon^{-3})$ iteration complexity with $O(1)$ batch size and prove convergence in both the nonconvex–strongly-concave and nonconvex–PL settings, backed by a rigorous potential-function analysis. Empirical results on distributionally robust logistic regression with real datasets show that the HCMM variants outperform prior bias-corrected momentum methods in both speed and robustness, with HCMM-2 converging faster and HCMM-1 being more resilient to outliers.
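The idea of a Hessian-corrected momentum inside two-time-scale gradient descent-ascent can be illustrated on a toy problem. The sketch below is an assumption-laden illustration, not the exact HCMM-1 or HCMM-2 update: it uses a STORM-like recursion in which each momentum is carried forward by Hessian-vector products (including the cross-variable blocks $\nabla^2_{xy} f$ and $\nabla^2_{yx} f$) before mixing in a fresh stochastic gradient. The quadratic $f(x,y) = \tfrac12 x^\top A x + x^\top B y - \tfrac{\nu}{2}\|y\|^2$, the noise level, and the constants `c1`, `c2` are all hypothetical choices. Because $f$ is quadratic, the Hessian blocks are constant, and for brevity they are used exactly here; only the gradients are noisy.

```python
import numpy as np

def hcmm_style_gda(T=2000, sigma=0.1, c1=1.0, c2=4.0, seed=0):
    """Hypothetical sketch: two-time-scale GDA with Hessian-corrected momentum."""
    rng = np.random.default_rng(seed)
    # Toy f(x, y) = 0.5 x^T A x + x^T B y - 0.5 * nu * ||y||^2:
    # A is indefinite (nonconvex in x); nu > 0 (strongly concave in y).
    A = np.diag([-0.5, 1.0])
    B = np.diag([1.0, 0.5])
    nu = 1.0

    def grads(x, y):
        # Stochastic gradients: exact gradients plus Gaussian noise.
        gx = A @ x + B @ y + sigma * rng.standard_normal(x.shape)
        gy = B.T @ x - nu * y + sigma * rng.standard_normal(y.shape)
        return gx, gy

    # Smoothing factors beta = T^(-2/3), step sizes mu = c * sqrt(beta).
    beta_x = beta_y = T ** (-2.0 / 3.0)
    mu_x, mu_y = c1 * np.sqrt(beta_x), c2 * np.sqrt(beta_y)

    x, y = np.ones(2), np.zeros(2)
    m_x, m_y = grads(x, y)  # initialize momenta with one stochastic gradient
    for _ in range(T):
        x_new = x - mu_x * m_x   # descent on the min variable (slow scale)
        y_new = y + mu_y * m_y   # ascent on the max variable (fast scale)
        dx, dy = x_new - x, y_new - y
        gx, gy = grads(x_new, y_new)
        # Hessian-vector corrections with cross-variable terms:
        hx = A @ dx + B @ dy       # H_xx dx + H_xy dy
        hy = B.T @ dx - nu * dy    # H_yx dx + H_yy dy
        m_x = (1 - beta_x) * (m_x + hx) + beta_x * gx
        m_y = (1 - beta_y) * (m_y + hy) + beta_y * gy
        x, y = x_new, y_new

    # Phi(x) = max_y f(x, y) = 0.5 x^T (A + B B^T / nu) x for this toy problem.
    grad_phi = (A + B @ B.T / nu) @ x
    return x, y, np.linalg.norm(grad_phi)

x, y, gnorm = hcmm_style_gda()
print("||grad Phi(x_T)|| =", gnorm)
```

Because the correction term transports the previous momentum to the new iterate, the momentum error here only decays and averages noise, so a small smoothing factor $\beta$ does not introduce the lag that plain exponential averaging would.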

Abstract

Lower-bound analyses for nonconvex strongly-concave minimax optimization problems have shown that stochastic first-order algorithms require at least $\mathcal{O}(\varepsilon^{-4})$ oracle complexity to find an $\varepsilon$-stationary point. Some works indicate that this complexity can be improved to $\mathcal{O}(\varepsilon^{-3})$ when the loss gradient is Lipschitz continuous. The question of achieving enhanced convergence rates under distinct conditions remains unresolved. In this work, we address this question for optimization problems that are nonconvex in the minimization variable and strongly concave or Polyak-Lojasiewicz (PL) in the maximization variable. We introduce novel bias-corrected momentum algorithms that utilize efficient Hessian-vector products. We establish convergence conditions and demonstrate a lower iteration complexity of $\mathcal{O}(\varepsilon^{-3})$ for the proposed algorithms. The effectiveness of the methods is validated through applications to robust logistic regression using real-world datasets.
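Hessian-vector products are "efficient" in the sense that they never require forming the full $d \times d$ Hessian. As an illustration, assuming the average logistic loss with labels $b_i \in \{-1, +1\}$, the product $\nabla^2 \ell(w)\, v = \tfrac1n X^\top \mathrm{diag}(s \odot (1-s)) X v$ costs $O(nd)$, about the same as one gradient. The sketch below (function names are hypothetical, not from the paper's code) checks this matrix-free product against a central finite difference of the gradient, which is another common way to obtain an HVP from two gradient calls.

```python
import numpy as np

def logistic_grad(w, X, b):
    # Gradient of (1/n) * sum_i log(1 + exp(-b_i x_i^T w)).
    s = 1.0 / (1.0 + np.exp(b * (X @ w)))   # s_i = sigmoid(-b_i x_i^T w)
    return -(X.T @ (b * s)) / X.shape[0]

def logistic_hvp(w, v, X, b):
    # Matrix-free Hessian-vector product X^T diag(s(1-s)) X v / n, O(nd) cost.
    s = 1.0 / (1.0 + np.exp(b * (X @ w)))
    return X.T @ (s * (1.0 - s) * (X @ v)) / X.shape[0]

def fd_hvp(grad_fn, w, v, r=1e-5):
    # Central finite difference of the gradient along direction v.
    return (grad_fn(w + r * v) - grad_fn(w - r * v)) / (2.0 * r)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
b = rng.choice([-1.0, 1.0], size=50)
w, v = rng.standard_normal(5), rng.standard_normal(5)

exact = logistic_hvp(w, v, X, b)
approx = fd_hvp(lambda u: logistic_grad(u, X, b), w, v)
print("max abs difference:", np.max(np.abs(exact - approx)))
```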

Paper Structure (21 sections, 16 theorems, 150 equations, 2 figures, 1 table, 2 algorithms)

Key Result

Theorem 1

Let Assumptions [unbiased], [NC_SC], and [boundPhi]–[BoundedGradient] hold. The stability condition for the hyperparameters in HCMM-1 is given by […], where $\kappa = \frac{L_f}{\nu}$ and $L_1 = L_f + \kappa L_f$, while $C$ and $\pi_1$ are constants given by […]. We choose the smoothing factors as $\beta_x = \beta_y = \mathcal{O}(\frac{1}{T^{2/3}})$, and $\mu_x = c_1\sqrt{\beta_x}$, $\mu_y = c_2\sqrt{\beta_y}$ for some small …
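The parameter choice in Theorem 1 implies $\mu_x = c_1\sqrt{\beta_x} = \mathcal{O}(T^{-1/3})$, and likewise for $\mu_y$. Plugging in a horizon of $T = \varepsilon^{-3}$ iterations gives $\beta = \varepsilon^2$ and $\mu = c\,\varepsilon$. The hypothetical helper below makes this arithmetic concrete; it assumes the constant hidden in $\mathcal{O}(\cdot)$ equals 1, and the values of `c1` and `c2` are illustrative only, since the theorem requires them to be small without fixing them here.

```python
import math

def hcmm_schedule(T, c1=0.5, c2=2.0):
    # Hypothetical helper: beta = T^(-2/3) (O(.) constant set to 1),
    # mu = c * sqrt(beta); c1, c2 are illustrative, not from the paper.
    beta = T ** (-2.0 / 3.0)
    return {
        "beta_x": beta,
        "beta_y": beta,
        "mu_x": c1 * math.sqrt(beta),   # = c1 * T^(-1/3)
        "mu_y": c2 * math.sqrt(beta),   # = c2 * T^(-1/3)
    }

eps = 0.1
T = round(eps ** -3)        # horizon of order eps^-3 iterations
sched = hcmm_schedule(T)
print(T, sched)
```

With $\varepsilon = 0.1$ this gives $T = 1000$, smoothing factors of about $0.01 = \varepsilon^2$, and $\mu_x = 0.05 = c_1\varepsilon$.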

Figures (2)

  • Figure 1: The figures, arranged from top to bottom and left to right, represent the results on the datasets "mushrooms", "phishing", "ijcnn1", "a9a", and "w8a", respectively. These figures illustrate the worst-case risk value $P(x)$ versus the number of iterations.
  • Figure 2: Comparison of algorithms HCMM-1 and HCMM-2 on synthesized data in the presence of outliers. In the left figure, the algorithms are trained on linearly separable data. In the right figure, synthesized outliers make up $10\%$ of the training data.
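The worst-case risk $P(x) = \max_y f(x, y)$ plotted in Figure 1 can be illustrated with a simplified distributionally robust logistic regression. The formulation below is an assumption and may differ from the paper's: it weights per-sample losses by $y$ and subtracts the strongly concave penalty $\tfrac{\lambda}{2}\|y - \mathbf{1}/n\|^2$, with $y$ left unconstrained (no simplex projection). Under that assumption the inner maximizer is $y^\star = \mathbf{1}/n + \ell(x)/\lambda$, so $P(x) = \tfrac1n\sum_i \ell_i(x) + \|\ell(x)\|^2/(2\lambda)$ in closed form. All names and data are hypothetical.

```python
import numpy as np

def losses(x, Z, b):
    # Per-sample logistic losses l_i(x) = log(1 + exp(-b_i z_i^T x)).
    return np.logaddexp(0.0, -b * (Z @ x))

def f(x, y, Z, b, lam):
    # Assumed DRO objective: weighted loss minus a penalty keeping y near 1/n.
    n = Z.shape[0]
    return y @ losses(x, Z, b) - 0.5 * lam * np.sum((y - 1.0 / n) ** 2)

def worst_case_risk(x, Z, b, lam):
    # P(x) = max_y f(x, y), attained at y* = 1/n + l(x) / lam.
    l = losses(x, Z, b)
    return l.mean() + (l @ l) / (2.0 * lam)

rng = np.random.default_rng(0)
Z = rng.standard_normal((100, 5))          # synthetic features
b = np.sign(Z @ rng.standard_normal(5))    # synthetic +/-1 labels
x = np.zeros(5)
print("P(x) at x = 0:", worst_case_risk(x, Z, b, lam=10.0))
```

Samples with larger loss receive larger weight in $y^\star$, which is what makes $P(x)$ a worst-case rather than an average risk; a larger $\lambda$ pulls the weights back toward uniform.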

Theorems & Definitions (17)

  • Theorem 1: HCMM-1 convergence
  • Corollary 1
  • Theorem 2: HCMM-2 convergence
  • Remark 1
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Lemma 6
  • ...and 7 more