Solving bilevel optimization via sequential minimax optimization

Zhaosong Lu, Sanyou Mei

TL;DR

The paper tackles constrained bilevel optimization in which the lower level is a convex, possibly nonsmooth problem and the upper level is possibly nonconvex. It introduces sequential minimax optimization (SMO), which applies a first-order method to a sequence of minimax subproblems obtained from a hybrid of modified augmented Lagrangian and penalty schemes, using only first-order information and proximal computations. The main theoretical contribution is a pair of operation-complexity guarantees: SMO finds an $\varepsilon$-KKT solution in ${\cal O}(\varepsilon^{-7}\log \varepsilon^{-1})$ operations when the lower-level objective is merely convex and in ${\cal O}(\varepsilon^{-6}\log \varepsilon^{-1})$ operations when it is strongly convex. The strongly convex bound improves the previous best-known result by a factor of $\varepsilon^{-1}$. Empirical results on constrained bilevel linear/quadratic problems and SVM hyperparameter tuning show that SMO consistently outperforms a state-of-the-art first-order penalty method in runtime while delivering competitive solution quality.
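To make the two rates concrete, the following minimal sketch evaluates only the leading-order terms $\varepsilon^{-7}\log\varepsilon^{-1}$ and $\varepsilon^{-6}\log\varepsilon^{-1}$ for a few tolerances. The helper name `op_bound` is hypothetical, and the hidden constants of the $\cal O$ bounds are dropped, so the numbers indicate growth rates rather than actual operation counts.

```python
import math


def op_bound(eps: float, power: int) -> float:
    """Leading-order term eps**(-power) * log(1/eps); O(.) constants are dropped."""
    return eps ** (-power) * math.log(1.0 / eps)


for eps in (1e-1, 1e-2, 1e-3):
    merely_convex = op_bound(eps, 7)    # lower-level objective merely convex
    strongly_convex = op_bound(eps, 6)  # lower-level objective strongly convex
    ratio = merely_convex / strongly_convex  # equals 1/eps up to rounding
    print(f"eps={eps:g}  convex~{merely_convex:.3e}  "
          f"strongly convex~{strongly_convex:.3e}  ratio~{ratio:.1f}")
```

The logarithmic factors cancel in the ratio, so the gap between the two bounds is exactly the factor $\varepsilon^{-1}$ quoted above.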

Abstract

In this paper we propose a sequential minimax optimization (SMO) method for solving a class of constrained bilevel optimization problems in which the lower-level part is a possibly nonsmooth convex optimization problem, while the upper-level part is a possibly nonconvex optimization problem. Specifically, SMO applies a first-order method to solve a sequence of minimax subproblems, which are obtained by employing a hybrid of modified augmented Lagrangian and penalty schemes on the bilevel optimization problems. Under suitable assumptions, we establish an operation complexity of $O(\varepsilon^{-7}\log\varepsilon^{-1})$ and $O(\varepsilon^{-6}\log\varepsilon^{-1})$, measured in terms of fundamental operations, for SMO in finding an $\varepsilon$-KKT solution of the bilevel optimization problems with merely convex and strongly convex lower-level objective functions, respectively. The latter result improves the previous best-known operation complexity by a factor of $\varepsilon^{-1}$. Preliminary numerical results demonstrate significantly superior computational performance compared to the recently developed first-order penalty method.
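The idea of solving a sequence of minimax subproblems can be illustrated on a toy problem. The sketch below is not the authors' SMO method: it uses a plain value-function penalty only (no augmented Lagrangian terms, no constraints, no nonsmooth parts), and the toy objective, the penalty schedule, and the names `solve_minimax` and `sequential_minimax` are assumptions made for illustration. The upper level minimizes $(x-2)^2+y^2$, the lower level requires $y\in\arg\min_z \tfrac12(z-x)^2$, so the exact bilevel solution is $x=y=1$.

```python
# Toy bilevel problem (hypothetical, chosen for illustration only):
#   upper level:  min_{x,y} (x - 2)^2 + y^2
#   lower level:  y in argmin_z 0.5 * (z - x)^2      (i.e. y = x)
# A value-function penalty with parameter rho gives the minimax subproblem
#   min_{x,y} max_z (x - 2)^2 + y^2 + rho * (0.5*(y - x)^2 - 0.5*(z - x)^2)


def solve_minimax(rho, x, y, z, iters=5000):
    """Alternate one gradient-ascent step in z with one descent step in (x, y)."""
    ascent_step = 1.0 / rho                  # the objective has curvature rho in z
    descent_step = 1.0 / (2.0 + 2.0 * rho)   # 1/L for the penalized outer objective
    for _ in range(iters):
        # Ascent in z: the partial derivative w.r.t. z is -rho * (z - x).
        z = z + ascent_step * (-rho * (z - x))
        # Descent in (x, y) using the partial gradients at the current z.
        gx = 2.0 * (x - 2.0) - rho * (y - x) + rho * (z - x)
        gy = 2.0 * y + rho * (y - x)
        x, y = x - descent_step * gx, y - descent_step * gy
    return x, y, z


def sequential_minimax(rhos=(1.0, 10.0, 100.0)):
    """Solve minimax subproblems for increasing penalties, warm-starting each one."""
    x, y, z = 0.0, 0.0, 0.0
    history = []
    for rho in rhos:
        x, y, z = solve_minimax(rho, x, y, z)
        history.append((rho, x, y))
    return history


if __name__ == "__main__":
    for rho, x, y in sequential_minimax():
        print(f"rho={rho:6.1f}  x={x:.4f}  y={y:.4f}  |y - x|={abs(y - x):.4f}")
```

For this toy problem each subproblem has the closed-form solution $x=(2+\rho)/(1+\rho)$, $y=\rho/(1+\rho)$, so the lower-level violation $|y-x|=2/(1+\rho)$ shrinks as the penalty grows and the iterates approach $(1,1)$.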

Paper Structure

This paper contains 11 sections, 15 theorems, 125 equations, 3 tables, 5 algorithms.

Key Result

Theorem 1

Suppose that Assumptions a1 and a2 hold with $\sigma=0$, i.e., ${\tilde{f}}_1(x,\cdot)$ being convex but not strongly convex for any given $x\in\mathrm{dom}\,f_2$. Let $\{(x^k,y^k,z^k,\lambda^k)\}_{k\in\mathbb{K}}$ be generated by Algorithm AL-alg, $f^*$, ${\tilde{f}}^*_{\rm hi}$, $D_{\rm \bf x}$, ... Suppose that $\varepsilon^{-2}-8\tau^{-3}G^{-2}\vartheta\geq0$. Then the following statements hold.

Theorems & Definitions (32)

  • Definition 1
  • Remark 1
  • Definition 2: KKT solution and $\epsilon$-KKT solution
  • Theorem 1: iteration and operation complexity of Algorithm \ref{AL-alg} for problem \ref{prob} with $\sigma=0$
  • Remark 2
  • Theorem 2: iteration and operation complexity of Algorithm \ref{AL-alg} for problem \ref{prob} with $\sigma>0$
  • Remark 3
  • Lemma 1: lu2024first-bilevel
  • Lemma 2
  • proof
  • ...and 22 more