Effective Bilevel Optimization via Minimax Reformulation

Xiaoyu Wang, Rui Pan, Renjie Pi, Jipeng Zhang

TL;DR

This work proposes a reformulation of bilevel optimization as a minimax problem, effectively decoupling the outer-inner dependency, and introduces a multi-stage gradient descent and ascent (GDA) algorithm to solve the resulting minimax problem with convergence guarantees.

Abstract

Bilevel optimization has found successful applications in various machine learning problems, including hyper-parameter optimization, data cleaning, and meta-learning. However, its huge computational cost presents a significant challenge for its utilization in large-scale problems. This challenge arises due to the nested structure of the bilevel formulation, where each hyper-gradient computation necessitates a costly inner optimization procedure. To address this issue, we propose a reformulation of bilevel optimization as a minimax problem, effectively decoupling the outer-inner dependency. Under mild conditions, we show these two problems are equivalent. Furthermore, we introduce a multi-stage gradient descent and ascent (GDA) algorithm to solve the resulting minimax problem with convergence guarantees. Extensive experimental results demonstrate that our method outperforms state-of-the-art bilevel methods while significantly reducing the computational cost.
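
The abstract does not spell out the minimax objective, so the following is only a rough illustration of how gradient descent-ascent can be run in stages on a reformulated bilevel problem. It assumes a penalty-style objective $F(u, \lambda, \omega) = L_1(u, \lambda) + \alpha\,(L_2(u, \lambda) - L_2(\omega, \lambda))$, minimized over $(u, \lambda)$ and maximized over $\omega$. The name $L_2$ for the inner objective, the toy quadratic losses, the step-size rule, and the schedule of $\alpha$ values are all assumptions made for this sketch and are not taken from the paper.

```python
# Toy losses (hypothetical, chosen so the answer is known in closed form):
#   outer  L1(u, lam) = 0.5 * (u - 1)^2 + 0.5 * lam^2
#   inner  L2(v, lam) = 0.5 * (v - lam)^2, whose minimizer is v = lam
# Bilevel solution: minimize 0.5*(lam - 1)^2 + 0.5*lam^2, so lam* = u* = 0.5.


def grads(u, lam, omega, alpha):
    """Partial derivatives of F = L1(u, lam) + alpha * (L2(u, lam) - L2(omega, lam))."""
    g_u = (u - 1.0) + alpha * (u - lam)
    g_lam = lam - alpha * (u - lam) + alpha * (omega - lam)
    g_omega = -alpha * (omega - lam)
    return g_u, g_lam, g_omega


def multi_stage_gda(alphas=(1.0, 10.0, 100.0), steps=3000):
    """Run simultaneous GDA in stages, warm-starting each stage from the last.

    The increasing alpha schedule and the step size 0.5 / (1 + alpha) are
    assumptions for this toy problem, not the paper's prescribed settings.
    """
    u, lam, omega = 0.0, 0.0, 0.0
    for alpha in alphas:
        eta = 0.5 / (1.0 + alpha)  # shrink the step as the penalty grows
        for _ in range(steps):
            g_u, g_lam, g_omega = grads(u, lam, omega, alpha)
            # Descent on (u, lam), ascent on omega, using the same gradients.
            u, lam, omega = u - eta * g_u, lam - eta * g_lam, omega + eta * g_omega
    return u, lam, omega


if __name__ == "__main__":
    u, lam, omega = multi_stage_gda()
    print(f"u={u:.4f}, lambda={lam:.4f}, omega={omega:.4f}")
```

For this toy problem the stationary point at a given $\alpha$ is $\lambda = \alpha / (1 + 2\alpha)$ and $u = (1 + \alpha) / (1 + 2\alpha)$, so the iterates approach the bilevel solution $\lambda^{\ast} = u^{\ast} = 0.5$ as $\alpha$ grows. No inner optimization loop is run to obtain a hyper-gradient; all three variables are updated together.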

Paper Structure

This paper contains 22 sections, 17 theorems, 149 equations, 4 figures, 4 tables, and 2 algorithms.

Key Result

Theorem 1

Let $\lambda^{\ast}$ denote the solution of the bilevel problem and $u^{\ast} = u(\lambda^{\ast})$ be the corresponding minimizer of the inner problem. We let $(\hat{u}, \hat{\omega}, \hat{\lambda})$ denote the optimal solution of the minimax problem (P:min:max). Suppose that Denote $L_1^{\ast} \triangleq L_1(u^{\ast}, \lambda^{\ast})$, then for any fixed $\alpha > 0$, the following statements hold ...

Figures (4)

  • Figure 1: Hyper-parameter optimization results on a synthetic dataset
  • Figure 2: Hyper-parameter optimization results on the 20newsgroups dataset
  • Figure 3: Test accuracy on CIFAR10
  • Figure 4: Data cleaning results on MNIST

Theorems & Definitions (33)

  • Theorem 1
  • Theorem 2: Stronger Equivalence to Bilevel optimization
  • Remark 1
  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Lemma 1
  • Proposition 1
  • Theorem 3
  • ...and 23 more