
Optimal Hessian/Jacobian-Free Nonconvex-PL Bilevel Optimization

Feihu Huang

TL;DR

We address nonconvex-PL bilevel optimization, where the upper-level problem is possibly nonconvex and the lower-level problem is possibly nonconvex but satisfies the PL condition. The proposed method, HJFBiO, is Hessian/Jacobian-free: it relies on finite-difference estimators and a novel projection-based hypergradient surrogate, achieving a convergence rate of $O\left(\frac{1}{T}\right)$ and a gradient complexity of $O(\varepsilon^{-1})$ for finding an $\varepsilon$-stationary point. The framework supports both global and local PL lower-level structures, has an $O(p+d)$ per-iteration cost, and is proven optimal in gradient complexity. It is validated on bilevel PL games and hyper-representation learning tasks. Because the approach never forms Hessians or their inverses, it enables scalable bilevel optimization in practical ML settings.
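To illustrate what "Hessian/Jacobian-free" means in practice, the sketch below approximates a Hessian-vector product and a Jacobian-vector product by central finite differences of gradients, so only gradient evaluations and vectors of size $p$ or $d$ are needed. It is a minimal illustration, not the estimator used by HJFBiO: the toy lower-level objective $g(x,y)=\frac{1}{2}\|Ay-x\|^2$ (PL in $y$ but not strongly convex when $A$ is wide), the step size `delta`, and all function names are assumptions made for this example.

```python
import numpy as np

# Hypothetical toy lower-level objective g(x, y) = 0.5 * ||A y - x||^2.
# With a wide matrix A (d < p), g is PL in y but not strongly convex in y.

def grad_y(x, y, A):
    """Gradient of the toy g with respect to y."""
    return A.T @ (A @ y - x)

def grad_x(x, y, A):
    """Gradient of the toy g with respect to x."""
    return x - A @ y

def fd_hessian_vec(x, y, v, A, delta=1e-4):
    """Approximate (d^2 g / dy^2) v by a central difference of grad_y along v."""
    return (grad_y(x, y + delta * v, A) - grad_y(x, y - delta * v, A)) / (2.0 * delta)

def fd_jacobian_vec(x, y, v, A, delta=1e-4):
    """Approximate (d^2 g / dx dy) v by a central difference of grad_x along v."""
    return (grad_x(x, y + delta * v, A) - grad_x(x, y - delta * v, A)) / (2.0 * delta)

rng = np.random.default_rng(0)
d, p = 3, 5                      # x lives in R^d, y lives in R^p
A = rng.standard_normal((d, p))
x = rng.standard_normal(d)
y = rng.standard_normal(p)
v = rng.standard_normal(p)

hv = fd_hessian_vec(x, y, v, A)  # compare with the exact product A^T A v
jv = fd_jacobian_vec(x, y, v, A) # compare with the exact product -A v

print("Hessian-vector error:", np.linalg.norm(hv - A.T @ A @ v))
print("Jacobian-vector error:", np.linalg.norm(jv + A @ v))
```

Each product costs two extra gradient evaluations and never materializes a $p \times p$ or $d \times p$ matrix, which is the general reason finite-difference schemes of this kind scale to large models.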

Abstract

Bilevel optimization is widely applied in many machine learning tasks such as hyper-parameter learning, meta learning and reinforcement learning. Although many algorithms have recently been developed to solve bilevel optimization problems, they generally rely on (strongly) convex lower-level problems. More recently, some methods have been proposed to solve nonconvex-PL bilevel optimization problems, where the upper-level problems are possibly nonconvex, and the lower-level problems are also possibly nonconvex while satisfying the Polyak-Łojasiewicz (PL) condition. However, these methods still have a high convergence complexity or a high computation complexity, such as requiring the computation of expensive Hessian/Jacobian matrices and their inverses. In this paper, we therefore propose an efficient Hessian/Jacobian-free method (i.e., HJFBiO) with the optimal convergence complexity to solve nonconvex-PL bilevel problems. Theoretically, under some mild conditions, we prove that our HJFBiO method obtains an optimal convergence rate of $O(\frac{1}{T})$, where $T$ denotes the number of iterations, and has an optimal gradient complexity of $O(\epsilon^{-1})$ in finding an $\epsilon$-stationary solution. We conduct numerical experiments on the bilevel PL game and hyper-representation learning task to demonstrate the efficiency of our proposed method.
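For orientation, the setting described above can be written in the standard form below. This is a sketch in commonly used notation, not the paper's exact formulation: the symbols $f$, $g$, $F$, $y^*(x)$ and the PL constant $\mu$ are assumptions, as is the reading of $\epsilon$-stationarity as a bound on the squared gradient norm.

$$
\min_{x \in \mathbb{R}^d} \; F(x) := f\big(x, y^*(x)\big), \qquad y^*(x) \in \arg\min_{y \in \mathbb{R}^p} \; g(x, y)
$$

The lower-level function $g(x,\cdot)$ satisfies the PL condition with constant $\mu > 0$ if, for all $x$ and $y$,

$$
\|\nabla_y g(x, y)\|^2 \;\geq\; 2\mu \Big( g(x, y) - \min_{y'} g(x, y') \Big).
$$

Under the squared-norm reading of stationarity, a rate of the form $\frac{1}{T}\sum_{t=1}^{T} \|\nabla F(x_t)\|^2 \leq O\left(\frac{1}{T}\right)$ gives $\|\nabla F(x_t)\|^2 \leq \epsilon$ for some iterate once $T = O(\epsilon^{-1})$, which is how the $O(\frac{1}{T})$ rate and the $O(\epsilon^{-1})$ gradient complexity fit together when each iteration uses a constant number of gradient evaluations.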

Paper Structure

This paper contains 16 sections, 17 theorems, 109 equations, 3 figures, 1 table, 1 algorithm.

Key Result

Lemma 2.6

(huang2023momentum) Under the above Assumption ass:2, we have, for any $x\in \mathbb{R}^d$,

Figures (3)

  • Figure 1: PL Game: norm of gradient vs number of iterations under $d=100$ (Left) and $d=200$ (Right).
  • Figure 2: Distances of the algorithms under the case of $d=100$ (Left) and $d=200$ (Right).
  • Figure 3: Losses of the algorithms under the case of $d=100$ (Left) and $d=200$ (Right).

Theorems & Definitions (27)

  • Lemma 2.6
  • Lemma 2.7
  • Lemma 2.8
  • Definition 3.1
  • Lemma 4.1
  • Lemma 4.2
  • Lemma 4.3
  • Theorem 4.4
  • Remark 4.5
  • Definition 4.6
  • ...and 17 more