Achieving $O(\epsilon^{-1.5})$ Complexity in Hessian/Jacobian-free Stochastic Bilevel Optimization
Yifan Yang, Peiyao Xiao, Kaiyi Ji
TL;DR
This work tackles stochastic bilevel optimization with a nonconvex upper level and a strongly convex lower level, aiming to achieve $O(\epsilon^{-1.5})$ sample complexity using only first-order information. It introduces FdeHBO, a Hessian/Jacobian-free, fully single-loop optimizer that uses a projection-aided finite-difference scheme to approximate Hessian- and Jacobian-vector products, together with momentum-based updates for $y$, $v$, and $x$. Theoretical guarantees show that $\mathbb{E}\|\nabla \Phi(x)\|^2$ decays at a rate of $\tilde{O}(1/T^{2/3})$, so $\tilde{O}(\epsilon^{-1.5})$ samples suffice to reach an $\epsilon$-accurate stationary point; this is the first such result without second-order computations. A small-dimension variant, FMBO, preserves the same complexity with a simpler per-iteration Hessian-vector computation. Experiments on MNIST hyper-representation and hyper-cleaning corroborate the theory, showing faster convergence and competitive accuracy against state-of-the-art Hessian/Jacobian-free and fully first-order methods.
Abstract
In this paper, we revisit the bilevel optimization problem in which the upper-level objective function is generally nonconvex and the lower-level objective function is strongly convex. Although this type of problem has been studied extensively, it remains an open question how to achieve an $O(\epsilon^{-1.5})$ sample complexity in Hessian/Jacobian-free stochastic bilevel optimization without any second-order derivative computation. To fill this gap, we propose a novel Hessian/Jacobian-free bilevel optimizer named FdeHBO, which features a simple fully single-loop structure, a projection-aided finite-difference Hessian/Jacobian-vector approximation, and momentum-based updates. Theoretically, we show that FdeHBO requires $O(\epsilon^{-1.5})$ iterations (each using $O(1)$ samples and only first-order gradient information) to find an $\epsilon$-accurate stationary point. As far as we know, this is the first Hessian/Jacobian-free method with an $O(\epsilon^{-1.5})$ sample complexity for nonconvex-strongly-convex stochastic bilevel optimization.
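To make the idea of a projection-aided finite-difference Hessian/Jacobian-vector approximation concrete, the following is a minimal sketch, not the FdeHBO algorithm itself. It assumes a central-difference form, a Euclidean-ball projection of the auxiliary vector $v$, and a toy quadratic lower-level objective $g(x, y) = \tfrac{1}{2} y^\top A y - x^\top B y$. The function names, the perturbation size `delta`, and the projection `radius` are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def project_ball(v, radius):
    """Project v onto the Euclidean ball of the given radius (assumed projection)."""
    norm = np.linalg.norm(v)
    return v if norm <= radius else v * (radius / norm)

def fd_hvp(grad_y, x, y, v, delta=1e-5):
    """Central-difference estimate of (d^2 g / dy^2)(x, y) @ v.

    Uses two evaluations of the lower-level gradient in y; no Hessian is formed.
    """
    return (grad_y(x, y + delta * v) - grad_y(x, y - delta * v)) / (2.0 * delta)

def fd_jvp(grad_x, x, y, v, delta=1e-5):
    """Central-difference estimate of (d/dy of grad_x g)(x, y) @ v.

    Perturbs y along v and differences the gradient in x; no Jacobian is formed.
    """
    return (grad_x(x, y + delta * v) - grad_x(x, y - delta * v)) / (2.0 * delta)

# Toy lower-level objective: g(x, y) = 0.5 * y^T A y - x^T B y (strongly convex in y).
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + np.eye(4)           # symmetric positive definite Hessian in y
B = rng.standard_normal((3, 4))   # coupling between x (dim 3) and y (dim 4)

def grad_y(x, y):
    return A @ y - B.T @ x        # gradient of g with respect to y

def grad_x(x, y):
    return -B @ y                 # gradient of g with respect to x

x = rng.standard_normal(3)
y = rng.standard_normal(4)

# Keep the auxiliary vector bounded before differencing (hypothetical radius).
v = project_ball(10.0 * rng.standard_normal(4), radius=2.0)

hvp_estimate = fd_hvp(grad_y, x, y, v)   # should be close to A @ v
jvp_estimate = fd_jvp(grad_x, x, y, v)   # should be close to -B @ v
```

For this quadratic example the central difference is exact up to floating-point rounding, so the estimates can be compared directly against `A @ v` and `-B @ v`. Bounding $v$ by projection keeps the perturbation `delta * v` small, which is one plausible reason a projection step helps control the finite-difference error for non-quadratic objectives.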
