An Accelerated Gradient Method for Convex Smooth Simple Bilevel Optimization
Jincheng Cao, Ruichen Jiang, Erfan Yazdandoost Hamedani, Aryan Mokhtari
TL;DR
This work addresses simple bilevel optimization with convex smooth upper- and lower-level objectives by introducing AGM-BiO, an accelerated gradient method that uses a cutting-plane surrogate to approximate the lower-level solution set and a projection-based accelerated update to reduce the upper-level objective. It establishes non-asymptotic guarantees for both suboptimality and infeasibility: when the feasible set is compact, O(max{1/√ε_f, 1/ε_g}) iterations suffice to find an ε_f-suboptimal, ε_g-infeasible solution, and under an r-th order Hölderian error bound on the lower-level objective the iteration complexity becomes O(max{ε_f^{−(2r−1)/(2r)}, ε_g^{−(2r−1)/(2r)}}). In the weak sharpness case (r=1), this reduces to O(max{1/√ε_f, 1/√ε_g}), matching the optimal complexity of single-level convex constrained optimization. The paper also reports numerical experiments in which the method performs strongly against several baselines, especially in high-dimensional settings, and discusses extensions to non-smooth/composite cases.
Abstract
In this paper, we focus on simple bilevel optimization problems, where we minimize a convex smooth objective function over the optimal solution set of another convex smooth constrained optimization problem. We present a novel bilevel optimization method that locally approximates the solution set of the lower-level problem using a cutting plane approach and employs an accelerated gradient-based update to reduce the upper-level objective function over the approximated solution set. We measure the performance of our method in terms of suboptimality and infeasibility errors and provide non-asymptotic convergence guarantees for both error criteria. Specifically, when the feasible set is compact, we show that our method requires at most $\mathcal{O}(\max\{1/\sqrt{\epsilon_f}, 1/\epsilon_g\})$ iterations to find a solution that is $\epsilon_f$-suboptimal and $\epsilon_g$-infeasible. Moreover, under the additional assumption that the lower-level objective satisfies the $r$-th Hölderian error bound, we show that our method achieves an iteration complexity of $\mathcal{O}(\max\{\epsilon_f^{-\frac{2r-1}{2r}}, \epsilon_g^{-\frac{2r-1}{2r}}\})$, which matches the optimal complexity of single-level convex constrained optimization when $r=1$.
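To make the two complexity statements concrete, the following minimal Python sketch evaluates the order-of-magnitude iteration counts they imply for given target accuracies. It is only an illustration of the arithmetic: the function names (`holder_exponent`, `compact_rate`, `holder_rate`) are hypothetical, all constants hidden by the big-O notation are set to 1, and the restriction $r \ge 1$ is an assumption made here rather than something stated in the abstract.

```python
import math


def holder_exponent(r: float) -> float:
    """Exponent (2r - 1) / (2r) appearing in the Hölderian-error-bound rate."""
    if r < 1:
        # Assumption of this sketch: only orders r >= 1 are considered.
        raise ValueError("this sketch assumes r >= 1")
    return (2 * r - 1) / (2 * r)


def compact_rate(eps_f: float, eps_g: float) -> float:
    """max{1/sqrt(eps_f), 1/eps_g}, with big-O constants taken to be 1."""
    return max(1 / math.sqrt(eps_f), 1 / eps_g)


def holder_rate(eps_f: float, eps_g: float, r: float) -> float:
    """max{eps_f^-p, eps_g^-p} with p = (2r - 1) / (2r), constants taken to be 1."""
    p = holder_exponent(r)
    return max(eps_f ** (-p), eps_g ** (-p))


if __name__ == "__main__":
    eps = 1e-4
    print("compact set only:", compact_rate(eps, eps))
    for r in (1, 2, 4):
        print(f"r = {r}: exponent = {holder_exponent(r)}, rate = {holder_rate(eps, eps, r)}")
```

For $r=1$ the exponent is $1/2$, so both accuracies enter as $1/\sqrt{\epsilon}$; as $r$ grows the exponent approaches 1 and the bound approaches the $1/\epsilon$ dependence.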
