Achieving $O(\epsilon^{-1.5})$ Complexity in Hessian/Jacobian-free Stochastic Bilevel Optimization

Yifan Yang, Peiyao Xiao, Kaiyi Ji

TL;DR

This work tackles stochastic bilevel optimization with a nonconvex upper level and a strongly convex lower level, aiming to achieve $O(\epsilon^{-1.5})$ sample complexity using only first-order information. It introduces FdeHBO, a Hessian/Jacobian-free, fully single-loop optimizer that employs a projection-aided finite-difference scheme to approximate Hessian/Jacobian actions and momentum-based updates for $y$, $v$, and $x$. Theoretical guarantees show $\mathbb{E}\|\nabla \Phi(x)\|^2$ decays at a rate $\tilde O(1/T^{2/3})$ with $\tilde O(\epsilon^{-1.5})$ samples needed to reach an $\epsilon$-accurate stationary point, representing the first such result without second-order computations. A small-dimension variant, FMBO, preserves the same complexity with a simpler per-iteration Hessian-vector computation. Experiments on MNIST hyper-representation and hyper-cleaning corroborate the theory, demonstrating faster convergence and competitive accuracy against state-of-the-art Hessian/Jacobian-free and fully first-order methods.
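
The step from the convergence rate to the sample complexity can be spelled out as follows. This is only a sketch: it assumes that an $\epsilon$-accurate stationary point means $\mathbb{E}\|\nabla \Phi(x)\|^2 \le \epsilon$, and it uses a hypothetical constant $C$ that absorbs all problem-dependent constants and logarithmic factors.

$$
\frac{C}{T^{2/3}} \le \epsilon
\quad\Longleftrightarrow\quad
T \ge \left(\frac{C}{\epsilon}\right)^{3/2} = O(\epsilon^{-1.5}).
$$

If each iteration uses $O(1)$ samples, the total number of samples is of the same order as the number of iterations $T$.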

Abstract

In this paper, we revisit the bilevel optimization problem, in which the upper-level objective function is generally nonconvex and the lower-level objective function is strongly convex. Although this type of problem has been studied extensively, it still remains an open question how to achieve an $O(\epsilon^{-1.5})$ sample complexity in Hessian/Jacobian-free stochastic bilevel optimization without any second-order derivative computation. To fill this gap, we propose a novel Hessian/Jacobian-free bilevel optimizer named FdeHBO, which features a simple fully single-loop structure, a projection-aided finite-difference Hessian/Jacobian-vector approximation, and momentum-based updates. Theoretically, we show that FdeHBO requires $O(\epsilon^{-1.5})$ iterations (each using $O(1)$ samples and only first-order gradient information) to find an $\epsilon$-accurate stationary point. As far as we know, this is the first Hessian/Jacobian-free method with an $O(\epsilon^{-1.5})$ sample complexity for nonconvex-strongly-convex stochastic bilevel optimization.
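
To make the idea of a projection-aided finite-difference Hessian-vector approximation concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the central-difference form, the Euclidean-ball projection with radius `r_v`, the step size `delta`, the helper names `project_ball` and `fd_hvp`, and the toy quadratic lower-level objective are all assumptions made for illustration.

```python
import numpy as np

def project_ball(v, r_v):
    """Project v onto the Euclidean ball of radius r_v (assumed projection set)."""
    norm = np.linalg.norm(v)
    return v if norm <= r_v else v * (r_v / norm)

def fd_hvp(grad_y, x, y, v, delta=1e-4):
    """Approximate (d^2 g / dy^2)(x, y) @ v using only two gradient calls.

    Central difference (an assumed form):
        [grad_y g(x, y + delta*v) - grad_y g(x, y - delta*v)] / (2*delta)
    """
    return (grad_y(x, y + delta * v) - grad_y(x, y - delta * v)) / (2.0 * delta)

# Hypothetical strongly convex lower-level objective:
#   g(x, y) = 0.5 * y^T A y - x^T y, so grad_y g = A y - x and the Hessian is A.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])

def grad_y(x, y):
    return A @ y - x

x = np.array([1.0, -1.0])
y = np.array([0.3, 0.7])

# Keep the auxiliary vector bounded before forming the finite difference.
v = project_ball(np.array([3.0, 4.0]), r_v=1.0)

hvp_fd = fd_hvp(grad_y, x, y, v)   # first-order-only approximation
hvp_exact = A @ v                  # exact product, available only in this toy case

print("finite-difference HVP:", hvp_fd)
print("exact HVP:            ", hvp_exact)
```

In this toy case the lower-level objective is quadratic, so the central difference matches the exact product up to floating-point rounding. For a general smooth objective the approximation error depends on `delta` and on the norm of `v`, which is one reason to keep `v` bounded.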

Paper Structure (36 sections, 26 theorems, 157 equations, 2 figures, 1 table, 2 algorithms)

Key Result

Proposition 1

Under Assumption as:sf, the iterates of the outer problem by alg:main_free satisfy for all $t \in \{0, \ldots, T-1\}$ with $L_F^2 = 2(L_{f_x}^2 + L^2_{g_{xy}}r_v^2)$.

Figures (2)

  • Figure 1: Comparison on hyper-representation with the LeNet neural network. Left plot: outer loss vs. running time; right plot: accuracy vs. running time.
  • Figure 2: (a) Comparison of different algorithms on data hyper-cleaning with noise $p=0.1$. Left plot: test loss vs. running time; right plot: train loss vs. running time. (b) Comparison among different single-loop algorithms: training loss vs. running time.

Theorems & Definitions (46)

  • Definition 1
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Theorem 1
  • Corollary 1
  • Theorem 2
  • Corollary 2
  • Lemma 1: Boundedness of $v^*$
  • proof
  • ...and 36 more