
Accelerated Fully First-Order Methods for Bilevel and Minimax Optimization

Chris Junchi Li

TL;DR

This work addresses bilevel optimization (BLO) by reformulating the problem in terms of a hyper-objective $\varphi(x)$ and develops accelerated fully first-order methods under lower-level strong convexity (LLSC), notably the (Perturbed) Restarted Accelerated Fully First-order methods for Bilevel Approximation (PRAF${}^2$BA), which improve the oracle complexities for finding $\epsilon$-first-order and second-order stationary points. It further shows that minimax optimization, a key special case of BLO, can be accelerated via PRAF${}^2$BA, which recovers PRAGDA with state-of-the-art rates for finding second-order stationary points (SOSPs). For BLO without LLSC, the paper identifies regularity conditions on the lower level (gradient dominance and weak sharp minimum) that guarantee tractability, proves hardness results for general convex lower-level problems, and introduces the Inexact Gradient-Free Method (IGFM), with the Switching Gradient Method (SGM) as a sub-routine, as a polynomial-time approach for finding upper-level stationary points. The theoretical results are complemented by empirical studies on hyperparameter optimization, data hypercleaning, and adversarial training that demonstrate the practical advantages of the proposed methods. Overall, the work broadens the computational toolkit for BLO by delivering non-asymptotic convergence guarantees in both constrained and unconstrained lower-level settings, and it clarifies when BLO is tractable without lower-level strong convexity.
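The fully first-order idea can be illustrated with the penalty-based hypergradient estimate of earlier F${}^2$BA-style methods, which is assumed here as the building block: with $z \approx \arg\min_y g(x,y)$ and $y_\lambda \approx \arg\min_y f(x,y) + \lambda g(x,y)$, the quantity $\nabla_x f(x,y_\lambda) + \lambda(\nabla_x g(x,y_\lambda) - \nabla_x g(x,z))$ approximates $\nabla\varphi(x)$ from gradients alone, with no Hessian-vector products. The sketch below is a minimal illustration on a hypothetical scalar quadratic problem (the constants `A`, `B`, `C`, `MU` and all function names are made up) where $\nabla\varphi$ has a closed form; it omits the acceleration, restarts, and perturbations that distinguish PRAF${}^2$BA.

```python
# Hypothetical toy bilevel problem with a closed-form hyper-objective:
#   upper level: f(x, y) = 0.5 * (y - B)**2 + 0.5 * C * x**2
#   lower level: g(x, y) = 0.5 * MU * (y - A * x)**2   (MU-strongly convex in y)
# so y*(x) = A * x and phi(x) = f(x, y*(x)).
A, B, C, MU = 2.0, 1.0, 0.5, 3.0


def grad_x_f(x, y):
    return C * x


def grad_y_f(x, y):
    return y - B


def grad_x_g(x, y):
    return -MU * A * (y - A * x)


def grad_y_g(x, y):
    return MU * (y - A * x)


def gradient_descent(grad, y0, step, iters=100):
    """Plain gradient descent on a scalar function, given its gradient."""
    y = y0
    for _ in range(iters):
        y -= step * grad(y)
    return y


def penalty_hypergrad(x, lam):
    """Fully first-order estimate of phi'(x) via the penalty reformulation."""
    # z ~ argmin_y g(x, y), computed with lower-level gradients only.
    z = gradient_descent(lambda y: grad_y_g(x, y), 0.0, step=1.0 / MU)
    # y_lam ~ argmin_y f(x, y) + lam * g(x, y); its smoothness constant is 1 + lam * MU.
    smooth = 1.0 + lam * MU
    y_lam = gradient_descent(
        lambda y: grad_y_f(x, y) + lam * grad_y_g(x, y), 0.0, step=1.0 / smooth
    )
    return grad_x_f(x, y_lam) + lam * (grad_x_g(x, y_lam) - grad_x_g(x, z))


def exact_hypergrad(x):
    """phi(x) = 0.5 * (A * x - B)**2 + 0.5 * C * x**2, differentiated by hand."""
    return A * (A * x - B) + C * x


if __name__ == "__main__":
    for lam in (1.0, 10.0, 100.0, 1000.0):
        err = abs(penalty_hypergrad(1.0, lam) - exact_hypergrad(1.0))
        print(f"lambda={lam:7.1f}  |error|={err:.2e}")
```

On this toy problem the error at $x = 1$ works out to $2/(1 + 3\lambda)$, so it shrinks like $\mathcal{O}(1/\lambda)$ as the penalty grows.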

Abstract

We present in this paper novel accelerated fully first-order methods for \emph{Bilevel Optimization} (BLO). First, for BLO under the typical assumption that the lower-level functions are strongly convex, we propose the \emph{(Perturbed) Restarted Accelerated Fully First-order methods for Bilevel Approximation} (\texttt{PRAF${}^2$BA}) algorithm, which leverages \emph{fully} first-order oracles and finds approximate first-order and second-order stationary points with state-of-the-art oracle query complexities. Second, when applied to \emph{nonconvex-strongly-convex} (NCSC) minimax optimization, a special case of BLO, \texttt{PRAF${}^2$BA} rediscovers \emph{perturbed restarted accelerated gradient descent ascent} (\texttt{PRAGDA}), which achieves the state-of-the-art complexity for finding approximate second-order stationary points. Additionally, we investigate the challenge of finding stationary points of the hyper-objective function in BLO when the lower-level functions lack the typical strong convexity assumption: we identify several regularity conditions on the lower-level problems that ensure tractability and present hardness results indicating that BLO is intractable for general convex lower-level functions. Under these regularity conditions, we propose the \emph{Inexact Gradient-Free Method} (\texttt{IGFM}), which uses the \emph{Switching Gradient Method} (\texttt{SGM}) as an efficient sub-routine to find an approximate stationary point of the hyper-objective in polynomial time. Empirical studies on real-world problems further validate the superior performance of the proposed algorithms.
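As a rough illustration of the switching idea behind \texttt{SGM} (in the classical form usually attributed to Polyak), the sketch below minimizes an upper-level objective $f(y)$ over the approximate lower-level solution set $\{y : g(y) - g^* \le \delta\}$: when the lower-level gap is within tolerance it steps along $\nabla f$, otherwise along $\nabla g$, and it returns the average of the iterates at which objective steps were taken. The toy functions, the assumption that $g^* = \min_y g(y)$ is known, the step size, the tolerance, and all names are hypothetical; the paper's exact sub-problem and parameters may differ. Note that $g(y) = \tfrac12 (y_1 + y_2 - 1)^2$ is convex but not strongly convex, yet satisfies gradient dominance because $\|\nabla g(y)\|^2 = 4 g(y)$.

```python
# Hypothetical toy instance (the upper-level variable x is fixed and suppressed):
#   lower level g(y) = 0.5 * (y1 + y2 - 1)**2, solution set {y1 + y2 = 1}, g* = 0
#   upper level f(y) = 0.5 * (y1**2 + y2**2), minimized over that set at (0.5, 0.5)
G_STAR = 0.0  # assumed known here; in practice it would have to be estimated


def f(y):
    return 0.5 * (y[0] ** 2 + y[1] ** 2)


def grad_f(y):
    return [y[0], y[1]]


def g(y):
    return 0.5 * (y[0] + y[1] - 1.0) ** 2


def grad_g(y):
    s = y[0] + y[1] - 1.0
    return [s, s]


def switching_gradient(y0, step=0.1, delta=1e-4, iters=2000):
    """Switch between objective and constraint steps; average the objective-step points."""
    y = list(y0)
    acc, count = [0.0, 0.0], 0
    for _ in range(iters):
        if g(y) - G_STAR <= delta:
            # Nearly lower-level optimal: record the point, then step to decrease f.
            acc = [a + yi for a, yi in zip(acc, y)]
            count += 1
            d = grad_f(y)
        else:
            # Gap too large: step to decrease the lower-level objective g.
            d = grad_g(y)
        y = [yi - step * di for yi, di in zip(y, d)]
    if count == 0:
        raise RuntimeError("no objective steps taken; increase iters or delta")
    return [a / count for a in acc]


if __name__ == "__main__":
    y_bar = switching_gradient([0.0, 0.0])
    print(y_bar, f(y_bar), g(y_bar))
```

Because every averaged point satisfies $g(y) - g^* \le \delta$ and $g$ is convex, the returned average does too, and on this instance its coordinates land near the constrained minimizer $(0.5, 0.5)$.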

Paper Structure

This paper contains 54 sections, 28 theorems, 108 equations, 4 figures, 4 tables, 5 algorithms.

Key Result

Lemma 1

Suppose Assumption assu1 holds, then

Figures (4)

  • Figure 1: Comparison of various bilevel algorithms on logistic regression on the 20 Newsgroups dataset. Panels (a) and (b) show testing accuracy and testing loss vs. running time, respectively; panels (c) and (d) show testing accuracy and testing loss vs. number of oracle calls, respectively.
  • Figure 2: Comparison of various bilevel algorithms for data hypercleaning at different corruption rates
  • Figure 3: W-shaped function (Tripuraneni et al., 2018)
  • Figure 4: A selection of empirical results on the synthetic minimax problem \eqref{eq13}, with convergence measured by the function value gap, the gradient norm, and the absolute value of the minimum Hessian eigenvalue. All panels use a semi-log scale except the one for the absolute minimum Hessian eigenvalue.
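To see why minimax optimization, the setting of the synthetic experiment in Figure 4, is a special case of BLO, one can take the lower-level function to be the negated upper-level function. Assuming the penalty-based hypergradient estimate $\nabla_x f(x,y_\lambda) + \lambda(\nabla_x g(x,y_\lambda) - \nabla_x g(x,z))$ used by fully first-order bilevel methods, with $z = \arg\min_y g(x,y)$ and $y_\lambda = \arg\min_y f(x,y) + \lambda g(x,y)$, a short calculation (a sketch, not the paper's derivation) shows that the estimate becomes exact for any $\lambda > 1$:

$$
\begin{aligned}
&\min_x \max_y f(x,y) \iff \min_x \varphi(x) := f\bigl(x, y^*(x)\bigr), \quad y^*(x) = \arg\min_y g(x,y), \quad g := -f, \\
&f + \lambda g = (1-\lambda) f \;\Rightarrow\; y_\lambda = \arg\max_y f(x,y) = y^*(x) = z \quad (\lambda > 1), \\
&\nabla_x f(x,y_\lambda) + \lambda\bigl(\nabla_x g(x,y_\lambda) - \nabla_x g(x,z)\bigr) = \nabla_x f\bigl(x, y^*(x)\bigr) = \nabla \varphi(x).
\end{aligned}
$$

The last equality is Danskin's theorem, which applies because $f(x,\cdot)$ is strongly concave in the NCSC setting. The hypergradient step therefore reduces to a descent step in $x$ taken at an (approximately) maximizing $y$, i.e. a gradient descent ascent scheme, which is consistent with \texttt{PRAF${}^2$BA} recovering \texttt{PRAGDA}.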

Theorems & Definitions (65)

  • Definition 1: Approximate first-order stationary point
  • Definition 2: Approximate second-order stationary point
  • Lemma 1
  • Lemma 2
  • Lemma 3: Inexact gradients
  • Lemma 4
  • Lemma 5
  • Theorem 2: RAF${}^2$BA finding $\epsilon$-FOSP
  • Theorem 3: PRAF${}^2$BA finding $(\epsilon, \mathcal{O}(\kappa^{2.5} \sqrt{\epsilon}))$-SOSP
  • Lemma 6
  • ...and 55 more
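Definitions 1 and 2 can be read as simple numerical tests, assuming their standard forms: $x$ is an $\epsilon$-FOSP if $\|\nabla\varphi(x)\| \le \epsilon$, and an $(\epsilon, \delta)$-SOSP if additionally $\lambda_{\min}(\nabla^2\varphi(x)) \ge -\delta$, so Theorem 3 corresponds to $\delta = \mathcal{O}(\kappa^{2.5}\sqrt{\epsilon})$. In the sketch below the helper names are hypothetical, the constant hidden in the $\mathcal{O}(\cdot)$ is set to 1 purely for illustration, and the example point is a strict saddle.

```python
import numpy as np


def is_fosp(grad, eps):
    """epsilon-FOSP test: ||grad phi(x)|| <= eps."""
    return float(np.linalg.norm(grad)) <= eps


def is_sosp(grad, hess, eps, delta):
    """(eps, delta)-SOSP test: small gradient and lambda_min(Hessian) >= -delta."""
    # eigvalsh handles symmetric matrices and returns eigenvalues in ascending order.
    lam_min = float(np.linalg.eigvalsh(hess)[0])
    return is_fosp(grad, eps) and lam_min >= -delta


def theorem3_delta(eps, kappa, const=1.0):
    """Curvature tolerance const * kappa^2.5 * sqrt(eps); const stands in for the O(.) constant."""
    return const * kappa ** 2.5 * np.sqrt(eps)


if __name__ == "__main__":
    # phi(x) = 0.5 * x1**2 - 0.5 * x2**2 has a strict saddle at the origin.
    grad = np.zeros(2)
    hess = np.diag([1.0, -1.0])
    eps, kappa = 1e-4, 2.0
    delta = theorem3_delta(eps, kappa)
    print(is_fosp(grad, eps), is_sosp(grad, hess, eps, delta))
```

The origin passes the first-order test but fails the second-order one, since its most negative curvature ($-1$) is well below $-\delta \approx -0.057$; this is exactly the kind of point the perturbations in PRAF${}^2$BA are meant to escape.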