Accelerated Fully First-Order Methods for Bilevel and Minimax Optimization

Chris Junchi Li

TL;DR

This work studies bilevel optimization (BLO) through its hyper-objective $\varphi(x)$ and develops accelerated fully first-order methods under lower-level (LL) strong convexity (LLSC), notably the (Perturbed) Restarted Accelerated Fully First-order methods for Bilevel Approximation (PRAF${}^2$BA), which improve the oracle complexities for finding approximate first-order and second-order stationary points (FOSPs and SOSPs). It further shows that nonconvex-strongly-convex minimax optimization, a key special case of BLO, can be accelerated via PRAF${}^2$BA, which recovers PRAGDA with state-of-the-art SOSP rates. For BLO without LLSC, the paper identifies regularity conditions on the LL problem (gradient dominance and weak sharp minimum) that guarantee tractability, proves hardness results for general convex LL functions, and introduces IGFM, with the Switching Gradient Method (SGM) as a subroutine, as a polynomial-time approach to finding upper-level (UL) stationary points. Empirical studies on hyperparameter optimization, data hypercleaning, and adversarial training complement the theory and demonstrate the practical advantages of the proposed methods. Overall, the work broadens the computational toolkit for BLO by delivering non-asymptotic convergence guarantees in both constrained and unconstrained LL settings, and it clarifies when BLO remains tractable without LLSC.
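The hyper-objective reformulation can be written out as follows. This is a sketch in standard notation, assuming an upper-level function $f$ and a lower-level function $g$ (symbol names this summary does not define). The penalty surrogate $\varphi_\lambda$ is the gradient-only construction commonly used by fully first-order bilevel methods; it is shown for illustration and is not confirmed here to be the paper's exact estimator. Under standard smoothness assumptions, $\nabla\varphi_\lambda(x)$ approaches $\nabla\varphi(x)$ as $\lambda \to \infty$.

```latex
% Bilevel problem and hyper-objective (f: upper level, g: lower level)
\min_{x} \ \varphi(x) := f\bigl(x, y^*(x)\bigr),
\qquad y^*(x) := \arg\min_{y} g(x, y).

% Under LLSC, y^*(x) is unique and the implicit function theorem gives
\nabla \varphi(x) = \nabla_x f\bigl(x, y^*(x)\bigr)
  - \nabla^2_{xy} g\bigl(x, y^*(x)\bigr)
    \bigl[\nabla^2_{yy} g\bigl(x, y^*(x)\bigr)\bigr]^{-1}
    \nabla_y f\bigl(x, y^*(x)\bigr).

% Gradient-only penalty surrogate, with y_\lambda^*(x) := \arg\min_y \{ f(x, y) + \lambda g(x, y) \}
\varphi_\lambda(x) := \min_{y} \Bigl\{ f(x, y) + \lambda \bigl( g(x, y) - \min_{z} g(x, z) \bigr) \Bigr\},
\nabla \varphi_\lambda(x) = \nabla_x f\bigl(x, y_\lambda^*(x)\bigr)
  + \lambda \Bigl( \nabla_x g\bigl(x, y_\lambda^*(x)\bigr) - \nabla_x g\bigl(x, y^*(x)\bigr) \Bigr).

% NCSC minimax special case: g = -f, so varphi(x) = max_y f(x, y) and grad_y f(x, y^*(x)) = 0, hence
\nabla \varphi(x) = \nabla_x f\bigl(x, y^*(x)\bigr).
```

The last line shows why the minimax case needs only $\nabla_x f$ at the inner maximizer. The Hessian-inverse term vanishes because $\nabla_y f(x, y^*(x)) = 0$, which is consistent with a bilevel method collapsing to a gradient descent ascent scheme.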

Abstract

We present novel accelerated fully first-order methods for \emph{Bilevel Optimization} (BLO). First, for BLO whose lower-level functions satisfy the typical strong convexity assumption, we propose the \emph{(Perturbed) Restarted Accelerated Fully First-order methods for Bilevel Approximation} (\texttt{PRAF${}^2$BA}) algorithm. It leverages \emph{fully} first-order oracles and finds approximate first-order and second-order stationary points with state-of-the-art oracle query complexities. Second, when applied to \emph{nonconvex-strongly-convex} (NCSC) minimax optimization, a special case of BLO, \texttt{PRAF${}^2$BA} rediscovers \emph{perturbed restarted accelerated gradient descent ascent} (\texttt{PRAGDA}), which achieves the state-of-the-art complexity for finding approximate second-order stationary points. Additionally, we investigate the challenge of finding stationary points of the hyper-objective function in BLO when the lower-level functions lack the typical strong convexity assumption. We identify several regularity conditions on the lower-level problems that ensure tractability, and we present hardness results showing that BLO is intractable for general convex lower-level functions. Under these regularity conditions, we propose the \emph{Inexact Gradient-Free Method} (\texttt{IGFM}), which uses the \emph{Switching Gradient Method} (\texttt{SGM}) as an efficient subroutine to find an approximate stationary point of the hyper-objective in polynomial time. Empirical studies on real-world problems further validate the advantages of the proposed algorithms.
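"Fully first-order" means the method queries only gradients of the upper- and lower-level functions, never Hessians or Hessian-vector products. Below is a minimal sketch, assuming the penalty-based hypergradient estimator common in fully first-order bilevel work (not confirmed by this summary to be exactly what PRAF${}^2$BA uses). It uses a hypothetical quadratic toy problem. Both inner problems are solved by plain gradient descent, with no acceleration, restarts, or perturbations. The estimate is then compared against the implicit-function hypergradient. All names (`penalty_hypergrad`, `exact_hypergrad`, the matrices `A`, `B`) are illustrative.

```python
import numpy as np

# Hypothetical toy bilevel problem with closed-form answers (not from the paper):
#   lower level  g(x, y) = 0.5 * y^T A y - y^T B x   (A is SPD, so g is strongly convex in y)
#   upper level  f(x, y) = 0.5 * ||y - c||^2 + 0.5 * r * ||x||^2
A = np.diag([1.0, 2.0, 4.0])
B = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5],
              [0.0, 0.0, 1.0]])
c = np.array([1.0, -1.0, 0.5])
r = 0.1


def grad_x_f(x, y):
    return r * x


def grad_y_f(x, y):
    return y - c


def grad_x_g(x, y):
    return -B.T @ y


def grad_y_g(x, y):
    return A @ y - B @ x


def gd(grad, y0, step, iters=500):
    """Plain gradient descent: only gradient evaluations are used."""
    y = y0.copy()
    for _ in range(iters):
        y = y - step * grad(y)
    return y


def penalty_hypergrad(x, lam, L_g):
    """Gradient-only estimate of grad phi(x) via phi_lam(x) = min_y f + lam * (g - min_z g)."""
    y0 = np.zeros_like(c)
    # z approximates y*(x) = argmin_y g(x, y); g(x, .) is L_g-smooth
    z = gd(lambda y: grad_y_g(x, y), y0, 1.0 / L_g)
    # y approximates y_lam*(x) = argmin_y f(x, y) + lam * g(x, y); smoothness 1 + lam * L_g
    y = gd(lambda y: grad_y_f(x, y) + lam * grad_y_g(x, y), y0, 1.0 / (1.0 + lam * L_g))
    return grad_x_f(x, y) + lam * (grad_x_g(x, y) - grad_x_g(x, z))


def exact_hypergrad(x):
    """Reference value from the implicit-function formula (uses second-order information)."""
    y_star = np.linalg.solve(A, B @ x)
    # For this toy problem grad^2_xy g = -B^T and grad^2_yy g = A
    return grad_x_f(x, y_star) + B.T @ np.linalg.solve(A, grad_y_f(x, y_star))


x = np.array([0.3, -0.2, 1.0])
L_g = np.linalg.eigvalsh(A).max()  # smoothness constant of g(x, .), assumed known
errs = [np.linalg.norm(penalty_hypergrad(x, lam, L_g) - exact_hypergrad(x))
        for lam in (10.0, 100.0, 1000.0)]
print(errs)  # the error shrinks roughly like 1/lam
```

For this quadratic, the estimator error works out to $\|B^\top[(A + I/\lambda)^{-1} - A^{-1}](y^* - c)\|$, which is $\mathcal{O}(1/\lambda)$. This illustrates the trade-off such methods balance: a larger $\lambda$ reduces the bias but makes the inner problem for $y_\lambda^*$ harder to solve, since its smoothness constant grows with $\lambda$.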

Paper Structure (54 sections, 28 theorems, 108 equations, 4 figures, 4 tables, 5 algorithms)

Introduction
BLO with LLSC.
Minimax Optimization.
BLO without LLSC.
Contributions
Related Works
Notation.
Accelerated Fully First-Order Bilevel Optimization with LLSC
Technical Preliminaries
Theoretical Guarantees for Accelerated Fully First-Order Methods
Proof of Theorem 2
Proof of Theorem 3
PRAF${}^2$BA for Accelerating NCSC Minimax Optimization
Connection between PRAF${}^2$BA and the perturbed restarted accelerated gradient descent ascent.
Optimality and Stationarity in Bilevel Optimization without LLSC
...and 39 more sections

Key Result

Lemma 1

Suppose Assumption 1 holds. Then

Figures (4)

Figure 1: Comparison of various bilevel algorithms on logistic regression on the 20 Newsgroups dataset. Panels (a) and (b) show testing accuracy and testing loss vs. running time, respectively. Panels (c) and (d) show testing accuracy and testing loss vs. the number of oracle calls, respectively.
Figure 2: Comparison of various bilevel algorithms for data hypercleaning at different corruption rates
Figure 3: W-shaped function (Tripuraneni et al., 2018)
Figure 4: A selection of empirical results on the synthetic minimax problem \eqref{eq13}, with convergence measured by the function value gap, the gradient norm, and the absolute value of the minimum Hessian eigenvalue. All plots use a semi-log scale except the one for the absolute minimum Hessian eigenvalue.

Theorems & Definitions (65)

Definition 1: Approximate first-order stationary point
Definition 2: Approximate second-order stationary point
Lemma 1
Lemma 2
Lemma 3: Inexact gradients
Lemma 4
Lemma 5
Theorem 2: RAF${}^2$BA finding $\epsilon$-FOSP
Theorem 3: PRAF${}^2$BA finding $(\epsilon, \mathcal{O}(\kappa^{2.5} \sqrt{\epsilon}))$-SOSP
Lemma 6
...and 55 more
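Definitions 1 and 2 are listed above only by title. The following sketch assumes they take the standard forms. An $\epsilon$-FOSP satisfies $\|\nabla\varphi(x)\| \le \epsilon$. An $(\epsilon, \delta)$-SOSP additionally satisfies $\lambda_{\min}(\nabla^2\varphi(x)) \ge -\delta$; Theorem 3 instantiates $\delta = \mathcal{O}(\kappa^{2.5}\sqrt{\epsilon})$. The helper names are hypothetical. A check like this is only a diagnostic for small problems, such as the minimum-Hessian-eigenvalue curves in Figure 4; the algorithms themselves do not form Hessians.

```python
import numpy as np


def is_fosp(grad, eps):
    """eps-FOSP check: ||grad phi(x)|| <= eps."""
    return bool(np.linalg.norm(grad) <= eps)


def is_sosp(grad, hess, eps, delta):
    """(eps, delta)-SOSP check: eps-FOSP and lambda_min(hess phi(x)) >= -delta."""
    hess_sym = 0.5 * (hess + hess.T)           # symmetrize to absorb round-off asymmetry
    lam_min = np.linalg.eigvalsh(hess_sym)[0]  # eigvalsh returns eigenvalues in ascending order
    return is_fosp(grad, eps) and bool(lam_min >= -delta)


# phi(x) = x1^2 - x2^2 has a strict saddle at the origin; phi(x) = x1^2 + x2^2 has a minimum there.
origin_grad = np.zeros(2)
saddle_hess = np.diag([2.0, -2.0])
min_hess = np.diag([2.0, 2.0])
print(is_fosp(origin_grad, 1e-3))                    # True
print(is_sosp(origin_grad, saddle_hess, 1e-3, 0.1))  # False
print(is_sosp(origin_grad, min_hess, 1e-3, 0.1))     # True
```

The saddle example passes the first-order test but fails the second-order one. This is the gap that perturbed methods such as PRAF${}^2$BA and PRAGDA are designed to close.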
