Effective Bilevel Optimization via Minimax Reformulation

Xiaoyu Wang; Rui Pan; Renjie Pi; Jipeng Zhang

TL;DR

This work reformulates bilevel optimization as a minimax problem, decoupling the outer-inner dependency, and introduces a multi-stage gradient descent and ascent (GDA) algorithm that solves the resulting minimax problem with convergence guarantees.

Abstract

Bilevel optimization has found successful applications in various machine learning problems, including hyper-parameter optimization, data cleaning, and meta-learning. However, its huge computational cost presents a significant challenge for its utilization in large-scale problems. This challenge arises due to the nested structure of the bilevel formulation, where each hyper-gradient computation necessitates a costly inner optimization procedure. To address this issue, we propose a reformulation of bilevel optimization as a minimax problem, effectively decoupling the outer-inner dependency. Under mild conditions, we show these two problems are equivalent. Furthermore, we introduce a multi-stage gradient descent and ascent (GDA) algorithm to solve the resulting minimax problem with convergence guarantees. Extensive experimental results demonstrate that our method outperforms state-of-the-art bilevel methods while significantly reducing the computational cost.
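To make the idea concrete, the following is a minimal sketch of a multi-stage GDA loop on a toy problem. It is not the authors' implementation. The penalized objective L1(u, lam) + alpha * (L2(u, lam) - L2(omega, lam)), the increasing schedule for alpha, the step-size rule, and the toy losses L1 and L2 are all assumptions made for illustration; the summary above only states that a minimax problem over (u, omega, lambda) with a parameter alpha > 0 is solved.

```python
# Toy bilevel problem (hypothetical, for illustration only):
#   outer: min_lam L1(u*(lam), lam),  L1(u, lam) = 0.5*(u - 1)^2 + 0.5*lam^2
#   inner: u*(lam) = argmin_u L2(u, lam),  L2(u, lam) = 0.5*(u - lam)^2
# Since u*(lam) = lam, the bilevel solution is lam = 0.5.
#
# Assumed minimax surrogate (the exact form is not given in this summary):
#   min_{u, lam} max_{omega} L1(u, lam) + alpha * (L2(u, lam) - L2(omega, lam))


def grads(u, lam, omega, alpha):
    """Partial derivatives of the assumed minimax objective."""
    g_u = (u - 1.0) + alpha * (u - lam)
    g_lam = lam - alpha * (u - lam) + alpha * (omega - lam)
    g_omega = -alpha * (omega - lam)
    return g_u, g_lam, g_omega


def multi_stage_gda(alphas=(1.0, 10.0, 100.0), steps_per_stage=3000):
    """Run simultaneous GDA, warm-starting each stage from the previous one."""
    u, lam, omega = 0.0, 0.0, 0.0
    for alpha in alphas:
        # Step size shrinks as alpha grows so the toy iteration stays stable.
        eta = 1.0 / (1.0 + 2.0 * alpha)
        for _ in range(steps_per_stage):
            g_u, g_lam, g_omega = grads(u, lam, omega, alpha)
            # Descent on (u, lam), ascent on omega.
            u, lam, omega = u - eta * g_u, lam - eta * g_lam, omega + eta * g_omega
    return u, lam, omega


u, lam, omega = multi_stage_gda()
print(u, lam, omega)
```

For this toy problem and a fixed alpha, the stationary point of the surrogate is lam = alpha / (1 + 2 * alpha), which approaches the bilevel solution 0.5 as alpha grows. No inner optimization loop is needed to obtain a hyper-gradient, which is the decoupling the abstract refers to.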

Paper Structure (22 sections, 17 theorems, 149 equations, 4 figures, 4 tables, 2 algorithms)

Introduction
Contributions
Proposed Problem and Method
Stochastic Extension of Minimax Formulation
Preliminaries and Theoretical Analysis
One-stage Gradient Descent Ascent Algorithm
Multi-stage Gradient Descent Ascent Algorithm
Numerical Experiments
Hyper-parameter Optimization for Logistic Regression with $\ell_2$ Regularization
Deep Neural Networks with CIFAR10
Data Hyper-Cleaning on MNIST
Conclusion
Proof of Theorem \ref{thm:equivalent:b:m}
Exact Bilevel-Minimax Equivalence when $\alpha\to\infty$
An Example of Bilevel-Minimax Equivalence
...and 7 more sections

Key Result

Theorem 1

Let $\lambda^{\ast}$ denote the solution of the bilevel problem and let $u^{\ast} = u(\lambda^{\ast})$ be the corresponding minimizer of the inner problem. Let $(\hat{u}, \hat{\omega}, \hat{\lambda})$ denote the optimal solution of the minimax problem (P:min:max). Suppose that ... Denote $L_1^{\ast} \triangleq L_1(u^{\ast}, \lambda^{\ast})$. Then, for any fixed $\alpha > 0$, the following statements hold: ...

Figures (4)

Figure 1: Hyper-parameter optimization results on a synthetic dataset
Figure 2: Hyper-parameter optimization results on the 20newsgroups dataset
Figure 3: Test accuracy on CIFAR10
Figure 4: Data cleaning results on MNIST

Theorems & Definitions (33)

Theorem 1
Theorem 2: Stronger Equivalence to Bilevel Optimization
Remark 1
Definition 1
Definition 2
Definition 3
Definition 4
Lemma 1
Proposition 1
Theorem 3
...and 23 more
