An Accelerated Gradient Method for Convex Smooth Simple Bilevel Optimization

Jincheng Cao; Ruichen Jiang; Erfan Yazdandoost Hamedani; Aryan Mokhtari

TL;DR

This work tackles simple bilevel optimization in which both the upper- and lower-level objectives are convex and smooth. It introduces AGM-BiO, an accelerated gradient method that locally approximates the lower-level solution set with a cutting plane and applies an accelerated gradient-based update to the upper-level objective over that approximation. The guarantees are non-asymptotic: when the feasible set is compact, AGM-BiO needs at most $\mathcal{O}(\max\{1/\sqrt{\epsilon_f}, 1/\epsilon_g\})$ iterations to find a solution that is $\epsilon_f$-suboptimal and $\epsilon_g$-infeasible. If the lower-level objective also satisfies the $r$-th Hölderian error bound, the iteration complexity becomes $\mathcal{O}(\max\{\epsilon_f^{-\frac{2r-1}{2r}}, \epsilon_g^{-\frac{2r-1}{2r}}\})$; for $r=1$ (weak sharpness) this is $\mathcal{O}(\max\{1/\sqrt{\epsilon_f}, 1/\sqrt{\epsilon_g}\})$, which matches the optimal complexity of single-level convex constrained optimization. Numerical experiments on an over-parameterized regression problem and a linear inverse problem show strong performance against several baselines, especially in high-dimensional settings, and the paper also discusses an extension to the non-smooth/composite setting.
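To make the two error criteria and the rate exponent concrete, the problem class can be written out as follows. This is a hedged restatement rather than a quotation from the paper: the symbols $f^*$, $g^*$, and $\mathcal{X}_g^*$ are notation assumed here for the optimal upper-level value, the optimal lower-level value, and the lower-level solution set.

```latex
% Simple bilevel problem (notation f^*, g^*, \mathcal{X}_g^* assumed for illustration)
\begin{align*}
  &\min_{x \in \mathbb{R}^n} \; f(x)
    \quad \text{s.t.} \quad
    x \in \mathcal{X}_g^* := \operatorname*{arg\,min}_{z \in \mathcal{Z}} \; g(z), \\[4pt]
  &\hat{x} \text{ is } \epsilon_f\text{-suboptimal and } \epsilon_g\text{-infeasible if} \quad
    f(\hat{x}) - f^* \le \epsilon_f
    \quad \text{and} \quad
    g(\hat{x}) - g^* \le \epsilon_g. \\[4pt]
  % Worked values of the Hölderian exponent (2r-1)/(2r)
  &r = 1: \ \tfrac{2r-1}{2r} = \tfrac{1}{2}, \qquad
   r = 2: \ \tfrac{2r-1}{2r} = \tfrac{3}{4}, \qquad
   r \to \infty: \ \tfrac{2r-1}{2r} \to 1.
\end{align*}
```

The exponent increases with $r$, so the iteration bound is tightest at $r=1$, where it reduces to the $1/\sqrt{\epsilon}$ rate in both error measures.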

Abstract

In this paper, we focus on simple bilevel optimization problems, where we minimize a convex smooth objective function over the optimal solution set of another convex smooth constrained optimization problem. We present a novel bilevel optimization method that locally approximates the solution set of the lower-level problem using a cutting plane approach and employs an accelerated gradient-based update to reduce the upper-level objective function over the approximated solution set. We measure the performance of our method in terms of suboptimality and infeasibility errors and provide non-asymptotic convergence guarantees for both error criteria. Specifically, when the feasible set is compact, we show that our method requires at most $\mathcal{O}(\max\{1/\sqrt{\epsilon_{f}}, 1/\epsilon_g\})$ iterations to find a solution that is $\epsilon_f$-suboptimal and $\epsilon_g$-infeasible. Moreover, under the additional assumption that the lower-level objective satisfies the $r$-th Hölderian error bound, we show that our method achieves an iteration complexity of $\mathcal{O}(\max\{\epsilon_{f}^{-\frac{2r-1}{2r}},\epsilon_{g}^{-\frac{2r-1}{2r}}\})$, which matches the optimal complexity of single-level convex constrained optimization when $r=1$.
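The two ingredients named in the abstract, a cutting plane that approximates the lower-level solution set and an accelerated gradient-based update on the upper-level objective, can be illustrated with a minimal Python sketch. It is not the paper's Algorithm and rests on several assumptions made only for illustration: the feasible set is taken to be all of $\mathbb{R}^n$ (the paper's guarantee assumes a compact set), the cut level is the exact lower-level optimum `g_star` rather than an approximating sequence $g_k$, and the names `project_halfspace` and `bilevel_sketch` are hypothetical. The step size $a_k = (k+1)/(4L_f)$ is the one quoted in the key result.

```python
import numpy as np


def project_halfspace(p, normal, offset):
    """Euclidean projection of p onto {z : <normal, z> <= offset}."""
    violation = normal @ p - offset
    norm_sq = normal @ normal
    if violation <= 0.0 or norm_sq == 0.0:
        return p  # already inside, or the cut is trivial
    return p - (violation / norm_sq) * normal


def bilevel_sketch(grad_f, g, grad_g, g_star, x0, L_f, num_iters):
    """Accelerated gradient steps on f over a linearized cut of g.

    Assumptions (illustrative only): feasible set is R^n and the
    exact lower-level optimal value g_star is known.
    """
    x = np.asarray(x0, dtype=float)
    z = x.copy()
    A = 0.0
    for k in range(num_iters):
        a = (k + 1) / (4.0 * L_f)          # step size a_k
        y = (A * x + a * z) / (A + a)      # extrapolated point
        dgy = grad_g(y)
        # Cutting plane: g(y) + <dgy, z - y> <= g_star, rewritten as
        # <dgy, z> <= g_star - g(y) + <dgy, y>.
        offset = g_star - g(y) + dgy @ y
        z = project_halfspace(z - a * grad_f(y), dgy, offset)
        x = (A * x + a * z) / (A + a)      # weighted average of iterates
        A += a
    return x


# Toy instance: minimum-norm solution of an underdetermined system.
M = np.array([[1.0, 1.0]])
b = np.array([2.0])
g = lambda v: 0.5 * np.sum((M @ v - b) ** 2)   # lower level, optimum 0
grad_g = lambda v: M.T @ (M @ v - b)
grad_f = lambda v: v                            # upper level f = 0.5*||v||^2, L_f = 1

x_hat = bilevel_sketch(grad_f, g, grad_g, g_star=0.0,
                       x0=[3.0, 0.0], L_f=1.0, num_iters=2000)
print(x_hat)  # close to the minimum-norm solution (1, 1)
```

By convexity of $g$, every lower-level minimizer satisfies the linearized inequality, so the half-space never cuts off the true solution set; with the whole space as feasible set, the projection has a closed form. A compact feasible set would need a projection onto its intersection with the half-space instead.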

Paper Structure (18 sections, 7 theorems, 86 equations, 2 figures, 1 table, 2 algorithms)

Introduction
Preliminaries
Assumptions and Definitions
Algorithm
Convergence Analysis
Convergence under Hölderian Error Bound
Numerical Experiments
Conclusion
Proof of the Main Results
Proof of Theorem \ref{thm:upper_lower}
Proof of Lemma \ref{lm:weighted_sum}
Proof of Theorem \ref{thm:upper_lower_hd}
Proof of Theorem \ref{pp:upper_lower_ws}
Extension to the Non-smooth/Composite Setting
Connection with the Polyak Step Size
...and 3 more sections

Key Result

Theorem 4.1

Suppose Assumption \ref{ass:1} holds. Let $\{\mathbf{x}_k\}_{k\geq 0}$ be the sequence of iterates generated by Algorithm \ref{alg:AGM-BiO} with stepsize $a_k = (k+1)/(4L_f)$ for $k \geq 0$ and suppose the sequence $g_k$ used for generating the cutting plane satisfies \eqref{eq:convergence_g}. Then, for any $k\geq 0$ w

Figures (2)

Figure 1: Comparison of a-IRG, CG-BiO, Bi-SG, SEA, R-APM, PB-APG, and AGM-BiO for solving the over-parameterized regression problem.
Figure 2: Comparison of a-IRG, Bi-SG, SEA, R-APM, PB-APG, Bisec-BiO, and AGM-BiO for solving the linear inverse problem.

Theorems & Definitions (20)

Definition 2.1
Definition 2.2
Remark 3.1
Remark 3.2
Theorem 4.1
Remark 4.1: The necessity of compactness of $\mathcal{Z}$
Remark 4.2: Removable $\log$ terms
Proposition 4.2 \cite{jiangconditional}
Lemma 4.3
Theorem 4.4
...and 10 more
