Accelerated Fully First-Order Methods for Bilevel and Minimax Optimization
Chris Junchi Li
TL;DR
This work addresses bilevel optimization (BLO) by reformulating the problem in terms of a hyper-objective $\varphi(x)$. It develops accelerated fully first-order methods under lower-level (LL) strong convexity (LLSC), notably the (Perturbed) Restarted Accelerated Fully First-order methods for Bilevel Approximation (PRAF${}^2$BA), which improve the oracle complexities for finding $\epsilon$-approximate first-order and second-order stationary points (FOSPs and SOSPs). It further shows that nonconvex-strongly-convex minimax optimization, a key special case of BLO, can be accelerated via PRAF${}^2$BA, which in this setting recovers PRAGDA with state-of-the-art SOSP rates. For BLO without LLSC, the paper identifies regularity conditions on the LL problem (gradient dominance and weak sharp minimum) that guarantee tractability, proves hardness results for general convex LL problems, and introduces the Inexact Gradient-Free Method (IGFM), with the Switching Gradient Method (SGM) as a sub-routine, as a polynomial-time approach to finding approximate upper-level (UL) stationary points. The theoretical results are complemented by empirical studies on hyperparameter optimization, data hypercleaning, and adversarial training, which demonstrate the practical advantages of the proposed methods. Overall, the work broadens the computational toolkit for BLO, delivers non-asymptotic convergence guarantees across both constrained and unconstrained LL scenarios, and clarifies when BLO is tractable without LL strong convexity.
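For reference, the hyper-objective reformulation mentioned above can be written in standard notation. The symbols $f$ (upper-level objective), $g$ (lower-level objective), $y^*(x)$, $d_x$, and $d_y$ are assumed here rather than taken from the text. Under LLSC the minimizer $y^*(x)$ is unique and the implicit function theorem gives the classical hypergradient, whose Hessian terms are exactly what fully first-order methods avoid querying:

$$
\min_{x \in \mathbb{R}^{d_x}} \varphi(x) := f\bigl(x, y^*(x)\bigr), \qquad y^*(x) = \arg\min_{y \in \mathbb{R}^{d_y}} g(x, y),
$$

$$
\nabla \varphi(x) = \nabla_x f\bigl(x, y^*(x)\bigr) - \nabla^2_{xy} g\bigl(x, y^*(x)\bigr) \bigl[\nabla^2_{yy} g\bigl(x, y^*(x)\bigr)\bigr]^{-1} \nabla_y f\bigl(x, y^*(x)\bigr).
$$

Taking $g = -f$ with $f(x, \cdot)$ strongly concave yields nonconvex-strongly-convex minimax optimization $\min_x \max_y f(x, y)$. In that case $\varphi(x) = \max_y f(x, y)$ and, by Danskin's theorem, $\nabla \varphi(x) = \nabla_x f\bigl(x, y^*(x)\bigr)$, so no second-order terms appear at all.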
Abstract
We present novel accelerated fully first-order methods for \emph{Bilevel Optimization} (BLO). First, for BLO in which the lower-level functions satisfy the typical strong convexity assumption, we propose the \emph{(Perturbed) Restarted Accelerated Fully First-order methods for Bilevel Approximation} (\texttt{PRAF${}^2$BA}) algorithm. It leverages \emph{fully} first-order oracles and finds approximate first-order and second-order stationary points with state-of-the-art oracle query complexities. Second, when applied to \emph{nonconvex-strongly-convex} (NCSC) minimax optimization, a special case of BLO, \texttt{PRAF${}^2$BA} rediscovers \emph{perturbed restarted accelerated gradient descent ascent} (\texttt{PRAGDA}), which achieves the state-of-the-art complexity for finding approximate second-order stationary points. Additionally, we investigate the challenge of finding stationary points of the hyper-objective function in BLO when the lower-level functions lack the typical strong convexity assumption. We identify several regularity conditions on the lower-level problems that ensure tractability, and we present hardness results indicating that BLO is intractable for general convex lower-level functions. Under these regularity conditions, we propose the \emph{Inexact Gradient-Free Method} (\texttt{IGFM}), which uses the \emph{Switching Gradient Method} (\texttt{SGM}) as an efficient sub-routine to find an approximate stationary point of the hyper-objective in polynomial time. Empirical studies on real-world problems further validate the superior performance of the proposed algorithms.
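The abstract stresses that \texttt{PRAF${}^2$BA} relies only on \emph{fully} first-order oracles, meaning gradients of the upper- and lower-level functions, with no Hessian or Hessian-vector products. As a rough illustration of how a hypergradient can be approximated this way, the sketch below assumes the penalty (value-function) surrogate $\mathcal{L}_\lambda(x, y) = f(x, y) + \lambda\bigl(g(x, y) - \min_z g(x, z)\bigr)$ from earlier fully first-order bilevel work. The abstract does not spell out this construction, so treat it as an assumption. The estimate is $\nabla_x f(x, y_\lambda) + \lambda\bigl(\nabla_x g(x, y_\lambda) - \nabla_x g(x, y^*)\bigr)$ with $y_\lambda = \arg\min_y f(x, y) + \lambda g(x, y)$, and both inner minimizers are computed by plain gradient descent. The sketch deliberately omits everything that makes \texttt{PRAF${}^2$BA} accelerated (momentum, restarts, perturbations). The scalar toy instance, the constants `H`, `B`, `C`, `RHO`, and the helpers `first_order_hypergrad` and `estimate` are all hypothetical.

```python
# Minimal sketch (assumption): fully first-order hypergradient estimate via a
# penalty / value-function surrogate, on a hypothetical scalar toy problem.
# This is NOT PRAF^2BA: no acceleration, restarts, or perturbations.


def gradient_descent(grad, y0, step, iters):
    """Plain gradient descent on a scalar, using only a gradient oracle."""
    y = y0
    for _ in range(iters):
        y -= step * grad(y)
    return y


def first_order_hypergrad(x, fx, fy, gx, gy, lam, step_g, step_pen, iters=200):
    """Approximate phi'(x) from first-order oracles of f and g (no Hessians)."""
    # y*(x) ~ argmin_y g(x, y): the lower-level solution.
    y_star = gradient_descent(lambda y: gy(x, y), 0.0, step_g, iters)
    # y_lam(x) ~ argmin_y f(x, y) + lam * g(x, y): the penalized solution.
    y_lam = gradient_descent(
        lambda y: fy(x, y) + lam * gy(x, y), 0.0, step_pen, iters
    )
    # x-gradient of f(x, y) + lam * (g(x, y) - g(x, y*(x))) at y = y_lam;
    # the min_z g(x, z) term contributes gx(x, y*) by Danskin's theorem.
    return fx(x, y_lam) + lam * (gx(x, y_lam) - gx(x, y_star))


# Hypothetical toy instance (constants are illustrative assumptions):
#   lower level g(x, y) = 0.5 * H * y**2 - B * x * y  (strongly convex in y)
#   upper level f(x, y) = 0.5 * (y - C)**2 + 0.5 * RHO * x**2
H, B, C, RHO = 2.0, 1.0, 1.0, 0.1


def fx(x, y):
    return RHO * x


def fy(x, y):
    return y - C


def gx(x, y):
    return -B * y


def gy(x, y):
    return H * y - B * x


def exact_hypergrad(x):
    """Closed-form phi'(x) for the toy instance, used only as a reference."""
    y_star = B * x / H
    return (B / H) * (y_star - C) + RHO * x


def estimate(x, lam):
    """Run the estimator with step 1/(2L) for each L-smooth inner problem."""
    return first_order_hypergrad(
        x, fx, fy, gx, gy, lam,
        step_g=1.0 / (2.0 * H),                  # g(x, .) is H-smooth
        step_pen=1.0 / (2.0 * (1.0 + lam * H)),  # f + lam*g is (1 + lam*H)-smooth
    )


if __name__ == "__main__":
    for lam in (10.0, 100.0, 1000.0):
        print(f"lam={lam:7.1f}  estimate={estimate(1.0, lam):.6f}  "
              f"exact={exact_hypergrad(1.0):.6f}")
```

On this quadratic instance, the estimates approach the closed-form value $\varphi'(1) = -0.15$ as $\lambda$ grows, and the error shrinks roughly like $1/\lambda$. The closed-form reference exists only because the toy problem is quadratic. In general the point of the construction is that only gradients of $f$ and $g$ are ever queried.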
