First-order penalty methods for bilevel optimization

Zhaosong Lu; Sanyou Mei

TL;DR

This work tackles bilevel optimization (BLO) in which the lower level is convex and possibly nonsmooth, while the upper level may be nonconvex. It introduces an $\varepsilon$-KKT solution concept and shows that, under suitable smoothness and convexity assumptions, such a solution yields an $O(\sqrt{\varepsilon})$- or $O(\varepsilon)$-hypergradient-based stationary point. The authors develop first-order penalty methods whose subproblems are structured minimax problems, solvable by a first-order minimax method the authors recently developed, and they establish operation complexities of $O(\varepsilon^{-4}\log\varepsilon^{-1})$ for the unconstrained case and $O(\varepsilon^{-7}\log\varepsilon^{-1})$ for the constrained case. Preliminary numerical results on unconstrained instances with a linear upper level and a quadratic lower level, and on constrained bilevel linear instances, illustrate the practical performance of the approach. Overall, the paper provides the first implementable first-order penalty methods with complexity guarantees for approximately solving such BLO problems via minimax reformulations.
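For orientation, the value-function penalty idea behind a minimax reformulation can be written out as below. This is a sketch that assumes the unconstrained problem has upper-level objective $f$ and lower-level objective $g$ that is convex in its second argument, with a penalty parameter $\rho > 0$; the paper's actual subproblems may contain additional terms (for example for nonsmooth parts or for constraints) that are not reproduced here.

```latex
% Assumed bilevel form: g(x, .) convex, possibly nonsmooth
\min_{x,y}\ f(x,y) \quad \text{s.t.} \quad y \in \operatorname*{arg\,min}_{z}\ g(x,z)

% Since g(x,y) - \min_z g(x,z) \ge 0 always, the constraint is equivalent to
g(x,y) - \min_{z} g(x,z) \le 0

% Penalizing the gap with \rho > 0 and using -\min_z g(x,z) = \max_z \{-g(x,z)\}:
\min_{x,y}\ \Big\{ f(x,y) + \rho\big(g(x,y) - \min_{z} g(x,z)\big) \Big\}
  \;=\; \min_{x,y}\ \max_{z}\ \Big\{ f(x,y) + \rho\big(g(x,y) - g(x,z)\big) \Big\}
```

Because $g(x,\cdot)$ is convex, the inner function is concave in $z$, while nothing forces convexity in $(x,y)$; this is why such penalty subproblems are nonconvex-concave minimax problems.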

Abstract

In this paper we study a class of unconstrained and constrained bilevel optimization problems in which the lower level is a possibly nonsmooth convex optimization problem, while the upper level is a possibly nonconvex optimization problem. We introduce a notion of $\varepsilon$-KKT solution for these problems and show that an $\varepsilon$-KKT solution leads to an $O(\sqrt{\varepsilon})$- or $O(\varepsilon)$-hypergradient-based stationary point under suitable assumptions. We also propose first-order penalty methods for finding an $\varepsilon$-KKT solution of these problems, whose subproblems turn out to be structured minimax problems that can be suitably solved by a first-order method recently developed by the authors. Under suitable assumptions, operation complexities of $O(\varepsilon^{-4}\log\varepsilon^{-1})$ and $O(\varepsilon^{-7}\log\varepsilon^{-1})$, measured by the number of fundamental operations, are established for the proposed penalty methods for finding an $\varepsilon$-KKT solution of the unconstrained and constrained bilevel optimization problems, respectively. Preliminary numerical results are presented to illustrate the performance of our proposed methods. To the best of our knowledge, this paper is the first work to demonstrate that bilevel optimization can be approximately solved as minimax optimization, and moreover, it provides the first implementable method with complexity guarantees for such sophisticated bilevel optimization.
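To make the hypergradient notion concrete, here is a minimal Python sketch on a hypothetical toy instance (not taken from the paper) with a smooth, strongly convex lower level, where the classical implicit-function formula $\nabla F(x) = \nabla_x f - \nabla^2_{xy} g\,[\nabla^2_{yy} g]^{-1} \nabla_y f$ applies. It also evaluates the hypergradient at the closed-form minimizer of a value-function penalty function, showing that it shrinks as the penalty parameter $\rho$ grows. The instance, the function names, and the penalty form are illustrative assumptions; the paper's stationarity measure and its $O(\sqrt{\varepsilon})$/$O(\varepsilon)$ bounds hold under its own, more general assumptions (including a possibly nonsmooth lower level).

```python
# Hypothetical toy instance (illustration only):
#   upper level f(x, y) = (x - 1)**2 + (y - 2)**2
#   lower level g(x, y) = (y - x)**2, strongly convex in y, so y*(x) = x
# Bilevel solution: minimize F(x) = f(x, x), giving x = y = 1.5.


def y_star(x: float) -> float:
    """Lower-level solution of min_z (z - x)**2, which is z = x."""
    return x


def hypergradient(x: float) -> float:
    """dF/dx for F(x) = f(x, y*(x)) via the implicit-function formula."""
    y = y_star(x)
    grad_x_f = 2.0 * (x - 1.0)
    grad_y_f = 2.0 * (y - 2.0)
    hess_yy_g = 2.0   # d^2 g / dy^2
    hess_xy_g = -2.0  # d^2 g / dx dy
    return grad_x_f - hess_xy_g / hess_yy_g * grad_y_f


def penalty_solution(rho: float) -> tuple[float, float]:
    """Closed-form minimizer of f(x, y) + rho * (g(x, y) - min_z g(x, z)).

    Here min_z g(x, z) = 0, so the penalty function is
    (x - 1)**2 + (y - 2)**2 + rho * (y - x)**2, and its minimizer
    satisfies y - x = 1 / (1 + 2 * rho).
    """
    shift = rho / (1.0 + 2.0 * rho)
    return 1.0 + shift, 2.0 - shift


if __name__ == "__main__":
    for rho in (1.0, 10.0, 100.0):
        x, y = penalty_solution(rho)
        print(f"rho={rho:6.1f}  x={x:.4f}  y={y:.4f}  hypergradient={hypergradient(x):+.6f}")
```

On this instance the hypergradient at the penalty solution equals $-2/(1+2\rho)$, so approximate stationarity improves as $\rho$ increases, mirroring in a very simple setting the kind of guarantee the abstract describes.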

Paper Structure

This paper contains 12 sections, 14 theorems, 127 equations, 2 tables, and 6 algorithms.

Introduction
Notation and terminology
Unconstrained bilevel optimization
Constrained bilevel optimization
Numerical results
Unconstrained bilevel optimization with linear upper level and quadratic lower level
Constrained bilevel linear optimization
Proof of the main results
Proof of the main results in Section "Unconstrained bilevel optimization"
Proof of the main results in Section "Constrained bilevel optimization"
Concluding remarks
A first-order method for nonconvex-concave minimax problem

Key Result

Theorem 1

Suppose that Assumption a1 holds and that $\{(x^k,y^k,z^k)\}$ is generated by Algorithm alg1. Then any accumulation point of $\{(x^k,y^k)\}$ is an optimal solution of problem unc-prob.
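As a rough numerical illustration of the behavior Theorem 1 describes, the sketch below runs a penalty loop with an increasing penalty parameter on a hypothetical toy instance with $f(x,y) = (x-1)^2 + (y-2)^2$ and $g(x,y) = (y-x)^2$, whose bilevel solution is $(1.5, 1.5)$. It assumes the value-function penalty subproblem $\min_{x,y}\max_z\, f(x,y) + \rho\,(g(x,y) - g(x,z))$. Each subproblem is solved by a simple gradient scheme with inexact inner maximization; this is a stand-in, not Algorithm alg1 and not the authors' first-order minimax method. The penalty schedule, step sizes, and function names are illustrative assumptions.

```python
import math


def grad_f(x: float, y: float) -> tuple[float, float]:
    """Gradient of the toy upper-level objective f(x, y) = (x - 1)**2 + (y - 2)**2."""
    return 2.0 * (x - 1.0), 2.0 * (y - 2.0)


def grad_g(x: float, y: float) -> tuple[float, float]:
    """Gradient of the toy lower-level objective g(x, y) = (y - x)**2 (convex in y)."""
    return -2.0 * (y - x), 2.0 * (y - x)


def solve_penalty_subproblem(rho, x, y, z, inner_iters=10, max_iters=200_000, tol=1e-9):
    """Approximately solve min_{x,y} max_z f(x,y) + rho * (g(x,y) - g(x,z)).

    Stand-in solver (NOT the authors' method): a few gradient-ascent steps on z,
    then one gradient-descent step on (x, y) evaluated at the current z
    (a Danskin-type approximate gradient of the max function).
    """
    step_xy = 1.0 / (2.0 + 4.0 * rho)  # 1/L for this toy penalty function
    step_z = 0.25 / rho                # each ascent step halves |z - x| here
    for _ in range(max_iters):
        for _ in range(inner_iters):
            _, g_z = grad_g(x, z)      # dg/dz at (x, z)
            z += step_z * (-rho * g_z)  # ascent on -rho * g(x, z)
        fx, fy = grad_f(x, y)
        gx, gy = grad_g(x, y)
        gzx, _ = grad_g(x, z)          # dg/dx at (x, z)
        dx = fx + rho * (gx - gzx)
        dy = fy + rho * gy
        if math.hypot(dx, dy) < tol:
            break
        x -= step_xy * dx
        y -= step_xy * dy
    return x, y, z


def penalty_method(rhos=(1.0, 10.0, 100.0, 1000.0)):
    """Solve penalty subproblems for increasing rho, warm-starting each from the last."""
    x = y = z = 0.0
    path = []
    for rho in rhos:
        x, y, z = solve_penalty_subproblem(rho, x, y, z)
        path.append((rho, x, y))
    return path


if __name__ == "__main__":
    for rho, x, y in penalty_method():
        print(f"rho={rho:7.1f}  x={x:.6f}  y={y:.6f}")
```

On this instance each subproblem has the closed-form solution $x = 1 + \rho/(1+2\rho)$, $y = 2 - \rho/(1+2\rho)$, so the iterates approach $(1.5, 1.5)$ as $\rho$ grows. Warm-starting each subproblem from the previous solution keeps the cost of the worse-conditioned large-$\rho$ solves manageable.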

Theorems & Definitions (36)

Definition 1
Definition 2
Theorem 1: Convergence of Algorithm alg1
Remark 1
Definition 3
Theorem 2
Remark 2
Theorem 3: Complexity of Algorithm alg2
Remark 3
Theorem 4: Convergence of Algorithm alg3
...and 26 more