Optimal Zeroth-Order Bilevel Optimization
Alireza Aghasi, Jeongyeol Kwon, Saeed Ghadimi
TL;DR
This work addresses stochastic bilevel optimization in a fully zeroth-order (derivative-free) setting by developing two algorithms with nearly optimal and optimal sample complexities. The first algorithm uses Gaussian smoothing and a nested mini-batch SGD scheme to approximate hypergradients, with sample complexity $\mathcal{O}\left( m(m+n)^2 \log(1/\epsilon)/\epsilon^2 \right)$. The second algorithm uses a penalty formulation to obtain a linear dimension dependence, $\mathcal{O}\left( (n+m)/\epsilon^2 \right)$, without requiring Hessian inverses. The analyses rely on Gaussian smoothing via Stein's identities, careful bias/variance control of the zeroth-order estimators, and a two-loop scheme that aligns inner and outer updates. Together, these results provide provably optimal (up to logarithmic factors) sample complexity guarantees for zeroth-order stochastic bilevel optimization and offer scalable, derivative-free tools for meta-learning, AutoML, and hyperparameter search in high-dimensional settings.
Abstract
In this paper, we develop zeroth-order algorithms with provably (nearly) optimal sample complexity for stochastic bilevel optimization, where only noisy function evaluations are available. We propose two distinct algorithms: the first is inspired by Jacobian/Hessian-based approaches, and the second builds on a penalty-function reformulation. The Jacobian/Hessian-based method achieves a sample complexity of $\mathcal{O}(d^3/\epsilon^2)$, which is optimal in terms of the accuracy $\epsilon$, albeit with cubic dependence on the problem dimension $d$. In contrast, the penalty-based method sharpens this guarantee to $\mathcal{O}(d/\epsilon^2)$, reducing the dimension dependence to linear, which is optimal, while preserving the optimal accuracy scaling. Our analysis is built upon Gaussian smoothing techniques, and we rigorously establish their validity under the stochastic bilevel settings considered in the existing literature. To the best of our knowledge, this is the first work to provide provably optimal sample complexity guarantees for a zeroth-order stochastic approximation method in bilevel optimization.
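To make the Gaussian smoothing idea concrete, the following is a minimal sketch of the standard two-point Gaussian-smoothing gradient estimator (in the style of Nesterov and Spokoiny) for a single-level function. It is not either of the paper's bilevel algorithms; the function name `zo_gradient` and the parameters `mu` (smoothing radius) and `num_samples` (mini-batch size) are hypothetical names chosen for illustration.

```python
import numpy as np


def zo_gradient(f, x, mu=1e-4, num_samples=1, rng=None):
    """Estimate the gradient of f at x from function values only.

    Averages (f(x + mu * u) - f(x)) / mu * u over Gaussian directions u.
    Each term is an unbiased estimate of the gradient of the smoothed
    function f_mu(x) = E_u[f(x + mu * u)], with u ~ N(0, I).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    fx = f(x)  # one evaluation at the base point, reused for every direction
    grad = np.zeros_like(x)
    for _ in range(num_samples):
        u = rng.standard_normal(x.shape)  # random Gaussian direction
        grad += (f(x + mu * u) - fx) / mu * u  # forward-difference estimate
    return grad / num_samples


def quadratic(x):
    """Toy objective with known gradient x (an assumption for the demo)."""
    return 0.5 * float(x @ x)
```

For the toy quadratic above, the smoothed function differs from the original only by a constant, so the averaged estimate approaches the true gradient `x` as `num_samples` grows. The variance of a single estimate scales with the dimension, which is one reason dimension dependence appears in zeroth-order sample complexity bounds.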
