qNBO: quasi-Newton Meets Bilevel Optimization

Sheng Fang, Yong-Jin Liu, Wei Yao, Chengming Yu, Jin Zhang

TL;DR

This paper tackles bilevel optimization in hierarchical ML problems by jointly accelerating the lower-level (LL) solve and approximating the inverse Hessian–vector product needed for the hypergradient. It introduces qNBO, a general framework using quasi-Newton recursions, with two concrete instantiations: qNBO (BFGS) and qNBO (SR1), plus a dedicated subroutine to avoid incorrect inversions in the critical hypergradient direction. The authors provide non-asymptotic convergence guarantees for the BFGS variant under standard BLO assumptions and demonstrate favorable gradient and Jacobian-vector complexities, along with competitive empirical performance across hyperparameter optimization, data hyper-cleaning, and meta-learning. The work offers a practical, flexible approach to BLO that leverages superlinear convergence properties of quasi-Newton methods to achieve robust, fast convergence in real-world learning tasks.

Abstract

Bilevel optimization, addressing challenges in hierarchical learning tasks, has gained significant interest in machine learning. The practical implementation of the gradient descent method to bilevel optimization encounters computational hurdles, notably the computation of the exact lower-level solution and the inverse Hessian of the lower-level objective. Although these two aspects are inherently connected, existing methods typically handle them separately by solving the lower-level problem and a linear system for the inverse Hessian-vector product. In this paper, we introduce a general framework to address these computational challenges in a coordinated manner. Specifically, we leverage quasi-Newton algorithms to accelerate the resolution of the lower-level problem while efficiently approximating the inverse Hessian-vector product. Furthermore, by exploiting the superlinear convergence properties of BFGS, we establish the non-asymptotic convergence analysis of the BFGS adaptation within our framework. Numerical experiments demonstrate the comparable or superior performance of the proposed algorithms in real-world learning tasks, including hyperparameter optimization, data hyper-cleaning, and few-shot meta-learning.
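
To make the coupling described above concrete, the following is a minimal sketch on a toy quadratic problem, not the authors' algorithm. It runs BFGS on the lower-level problem and then reuses the resulting inverse-Hessian approximation `H` to form the inverse Hessian-vector product in the hypergradient $\nabla_x F - \nabla^2_{xy} f\,[\nabla^2_{yy} f]^{-1}\nabla_y F$. Everything specific here is an assumption made for illustration: the objectives $f(x,y)=\tfrac12 y^\top A y - x^\top y$ and $F(x,y)=\tfrac12\|y-\text{target}\|^2$, the exact line search (convenient only because the problem is quadratic), and the hypothetical names `bfgs_lower_level` and `run_demo`. The paper's additional subroutine for correcting the approximation along the hypergradient direction is not reproduced.

```python
import numpy as np


def bfgs_lower_level(A, x, y0, num_iters):
    """Minimize f(x, y) = 0.5*y^T A y - x^T y over y with BFGS.

    Returns the final iterate and the inverse-Hessian approximation H.
    Exact line search is an assumption made for this quadratic toy problem.
    """
    n = y0.size
    eye = np.eye(n)
    y = y0.astype(float).copy()
    H = eye.copy()                     # initial inverse-Hessian approximation
    g = A @ y - x                      # gradient of f with respect to y
    for _ in range(num_iters):
        p = -H @ g                     # quasi-Newton search direction
        curvature = p @ A @ p
        if curvature <= 1e-30:         # gradient is numerically zero: stop
            break
        alpha = -(g @ p) / curvature   # exact minimizer along p (quadratic only)
        s = alpha * p                  # step taken in y
        y = y + s
        g_new = A @ y - x
        u = g_new - g                  # gradient difference (equals A s here)
        su = s @ u
        if su <= 1e-30:                # curvature condition fails: stop updating
            break
        rho = 1.0 / su
        V = eye - rho * np.outer(s, u)
        H = V @ H @ V.T + rho * np.outer(s, s)   # BFGS inverse-Hessian update
        g = g_new
    return y, H


def run_demo(n=5, seed=0):
    """Compare the quasi-Newton hypergradient with the exact one."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n, n))
    A = M.T @ M / n + np.eye(n)        # symmetric positive definite LL Hessian
    x = rng.standard_normal(n)         # upper-level variable
    target = rng.standard_normal(n)    # F(x, y) = 0.5*||y - target||^2

    # One BFGS run yields both the LL solution and the inverse-Hessian estimate.
    y, H = bfgs_lower_level(A, x, np.zeros(n), num_iters=n)
    v = y - target                     # grad_y F at the approximate LL solution
    # For this toy problem grad_x F = 0 and grad^2_{xy} f = -I,
    # so the hypergradient reduces to H @ v.
    hypergrad_qn = H @ v

    # Reference values from direct linear solves.
    y_star = np.linalg.solve(A, x)
    hypergrad_exact = np.linalg.solve(A, y_star - target)
    return hypergrad_qn, hypergrad_exact, y, y_star


if __name__ == "__main__":
    hg_qn, hg_exact, _, _ = run_demo()
    print("max abs hypergradient error:", np.max(np.abs(hg_qn - hg_exact)))
```

On a strongly convex quadratic with exact line search, $n$ BFGS updates recover the inverse Hessian, which is why no separate linear solve is needed in this sketch. For general lower-level objectives the approximation is only accurate along the directions explored by the iterates, which is the kind of gap a dedicated correction step is meant to address.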

Paper Structure

This paper contains 72 sections, 24 theorems, 136 equations, 39 figures, 2 tables, 5 algorithms.

Key Result

Theorem 3.4

Suppose that $f$ in (blp) takes the following quadratic form: where $\mu I\preceq A \preceq LI$. Assume that Assumptions ass:F and ass:phi hold. Set $Q_k=k+1$ and $H_0=LI$. Let $\kappa:=L/\mu$, $t_b:=4n\ln\kappa$, $c_t:=2t_b^{\frac{T}{2}}$, and $\omega:=c_1\left(1+\frac{1}{\varepsilon}\right)c_t^2 \kappa^3\left(\frac{1}{T}\right)^{T}$, where $c_1$ is a positive constant given i with the initial error $\delta_0=3c_

Figures (39)

  • Figure 3: Data hyper-cleaning results on two datasets (left: MNIST; right: FashionMNIST).
  • ...and 38 more figures

Theorems & Definitions (44)

  • Theorem 3.4: quadratic case
  • Remark 3.5
  • Remark 3.6
  • Theorem 3.7: general case
  • Lemma D.2
  • Lemma D.3
  • Lemma D.4
  • Lemma D.5
  • Lemma D.6
  • proof
  • ...and 34 more