Optimal Zeroth-Order Bilevel Optimization

Alireza Aghasi, Jeongyeol Kwon, Saeed Ghadimi

TL;DR

This work addresses stochastic bilevel optimization in a fully zeroth-order (derivative-free) setting and develops two algorithms with nearly optimal and optimal sample complexities, respectively. The first algorithm leverages Gaussian smoothing and a nested mini-batch SGD scheme to approximate hypergradients, achieving a sample complexity of $\mathcal{O}\left( m(m+n)^2 \log(1/\epsilon)/\epsilon^2 \right)$, i.e., cubic in the problem dimensions; the second uses a penalty formulation to reduce the dimension dependence to linear, $\mathcal{O}\left( (n+m)/\epsilon^2 \right)$, without requiring Hessian inverses. The analyses rely on Gaussian smoothing via Stein's identities, careful bias and variance control of the zeroth-order estimators, and a two-loop scheme that aligns inner and outer updates. Together, these results give provable sample complexity guarantees for zeroth-order stochastic bilevel optimization that are optimal up to logarithmic factors, and offer scalable, derivative-free tools for meta-learning, AutoML, and hyperparameter search in high-dimensional settings.
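Gaussian smoothing turns function values into gradient information: for $u\sim\mathcal{N}(0,I_d)$ and a smoothing radius $\mu>0$, the quantity $\frac{f(x+\mu u)-f(x)}{\mu}\,u$ is an unbiased estimate of the gradient of the smoothed function $f_\mu(x)=\mathbb{E}[f(x+\mu u)]$. The Python sketch below shows this generic two-point estimator on a deterministic function. The names `gaussian_smoothing_grad`, `mu`, and `num_dirs` are illustrative assumptions, and the paper's estimators (which work with noisy evaluations, mini-batches, and separate upper- and lower-level variables) may differ in their details.

```python
import random
from typing import Callable, List, Optional

Vector = List[float]


def gaussian_smoothing_grad(f: Callable[[Vector], float], x: Vector, mu: float = 1e-4,
                            num_dirs: int = 1000,
                            rng: Optional[random.Random] = None) -> Vector:
    """Two-point Gaussian-smoothing gradient estimate using only values of f.

    Averages (f(x + mu*u) - f(x)) / mu * u over num_dirs draws u ~ N(0, I_d).
    Each term is an unbiased estimate of the gradient of the smoothed function
    f_mu(x) = E[f(x + mu*u)], which approaches grad f(x) as mu -> 0 for smooth f.
    """
    rng = rng if rng is not None else random.Random()
    d = len(x)
    fx = f(x)
    g = [0.0] * d
    for _ in range(num_dirs):
        # One random Gaussian direction per term (hypothetical sketch, not the paper's sampler)
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        coef = (f([xi + mu * ui for xi, ui in zip(x, u)]) - fx) / mu
        for i in range(d):
            g[i] += coef * u[i]
    return [gi / num_dirs for gi in g]


def half_sq_norm(v: Vector) -> float:
    return 0.5 * sum(vi * vi for vi in v)


est = gaussian_smoothing_grad(half_sq_norm, [1.0, -2.0, 0.5], num_dirs=20_000,
                              rng=random.Random(0))
# est is a Monte Carlo approximation of the true gradient [1.0, -2.0, 0.5]
```

With noisy evaluations $F(x,\xi)$, a common choice is to reuse the same sample $\xi$ in both evaluations of the difference, so that much of the noise cancels.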

Abstract

In this paper, we develop zeroth-order algorithms with provably (nearly) optimal sample complexity for stochastic bilevel optimization, where only noisy function evaluations are available. We propose two distinct algorithms: the first is inspired by Jacobian/Hessian-based approaches, and the second builds on a penalty function reformulation. The Jacobian/Hessian-based method achieves a sample complexity of $\mathcal{O}(d^3/\epsilon^2)$, which is optimal in terms of the accuracy $\epsilon$, albeit with cubic dependence on the problem dimension $d$. In contrast, the penalty-based method sharpens this guarantee to $\mathcal{O}(d/\epsilon^2)$, reducing the dimension dependence to the optimal linear rate while preserving the optimal accuracy scaling. Our analysis is built upon Gaussian smoothing techniques, and we rigorously establish their validity under the stochastic bilevel settings considered in the existing literature. To the best of our knowledge, this is the first work to provide provably optimal sample complexity guarantees for a zeroth-order stochastic approximation method in bilevel optimization.
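The penalty route avoids Hessians entirely. One common penalty form for $\min_x f(x,y^*(x))$ with $y^*(x)\in\arg\min_y g(x,y)$ is $\mathcal{L}_\lambda(x,y)=f(x,y)+\lambda\big(g(x,y)-\min_z g(x,z)\big)$, whose gradients involve only first-order information about $f$ and $g$; the abstract does not spell out the exact reformulation, so treat this form as an assumption. The Python sketch below runs a zeroth-order version on a hypothetical one-dimensional toy problem with $f(x,y)=\tfrac12(y-1)^2+\tfrac12x^2$ and $g(x,y)=\tfrac12(y-x)^2$, whose bilevel solution is $x=0.5$; for fixed $\lambda$, the penalized stationary point is $x=\lambda/(1+2\lambda)$. All function names, step sizes, and the update order are illustrative choices, not the paper's algorithm.

```python
import random
from typing import Callable, Tuple


def zo_grad_1d(h: Callable[[float], float], v: float, rng: random.Random,
               mu: float = 1e-4, batch: int = 20) -> float:
    """Mini-batch two-point Gaussian-smoothing estimate of h'(v) for scalar v."""
    hv = h(v)
    total = 0.0
    for _ in range(batch):
        u = rng.gauss(0.0, 1.0)
        total += (h(v + mu * u) - hv) / mu * u
    return total / batch


# Hypothetical toy bilevel problem (not from the paper):
#   upper level f(x, y) = 0.5*(y - 1)^2 + 0.5*x^2
#   lower level g(x, y) = 0.5*(y - x)^2, so y*(x) = x and the bilevel solution is x = 0.5
def f(x: float, y: float) -> float:
    return 0.5 * (y - 1.0) ** 2 + 0.5 * x ** 2


def g(x: float, y: float) -> float:
    return 0.5 * (y - x) ** 2


def zo_penalty_bilevel(lam: float = 20.0, outer_iters: int = 300, inner_iters: int = 10,
                       eta_x: float = 0.05, eta_y: float = 0.5, eta_z: float = 0.5,
                       seed: int = 0) -> Tuple[float, float, float]:
    """Zeroth-order descent on L(x, y) = f(x, y) + lam * (g(x, y) - min_z g(x, z))."""
    rng = random.Random(seed)
    x = y = z = 0.0
    for _ in range(outer_iters):
        for _ in range(inner_iters):
            # z approximates argmin_z g(x, z), which defines the value function min_z g(x, z)
            z -= eta_z * zo_grad_1d(lambda t: g(x, t), z, rng)
            # y approximates argmin_y f(x, y)/lam + g(x, y), i.e. argmin_y L(x, y) rescaled by 1/lam
            y -= eta_y * zo_grad_1d(lambda t: f(x, t) / lam + g(x, t), y, rng)
        # x step on L(x, y) with y and z frozen; only function values of f and g are used
        x -= eta_x * zo_grad_1d(lambda t: f(t, y) + lam * (g(t, y) - g(t, z)), x, rng)
    return x, y, z


x_hat, _, _ = zo_penalty_bilevel()
# x_hat should be close to 20/41 (about 0.488), the penalized solution for lam = 20
```

With `lam=20.0` the iterate settles near $20/41\approx 0.488$, within $O(1/\lambda)$ of the bilevel solution. Increasing $\lambda$ shrinks this gap but generally calls for more accurate inner solves, since errors in `y` and `z` are amplified by $\lambda$ in the `x` update.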

Paper Structure

This paper contains 29 sections, 24 theorems, 146 equations, 5 figures, 3 algorithms.

Key Result

Theorem 1

Let $u\sim\mathcal{N}(0,I_d)$ be a standard Gaussian random vector, and let $q:\mathbb{R}^d\to\mathbb{R}$ be an almost-differentiable function with $\mathbb{E}[\|\nabla q(u)\|]<\infty$. Then $\mathbb{E}[u\,q(u)] = \mathbb{E}[\nabla q(u)]$. Furthermore, when the function $q$ has a twice continuously
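Stein's identity in Theorem 1 can be checked numerically. The Python sketch below estimates both sides of $\mathbb{E}[u\,q(u)]=\mathbb{E}[\nabla q(u)]$ by Monte Carlo for the illustrative choice $q(u)=u_1^2u_2+\sin(u_3)$ (our assumption, not an example from the paper), for which both sides equal $(0,\,1,\,e^{-1/2})$ exactly. The helper name `stein_check` is hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def stein_check(q: Callable[[Vector], float], grad_q: Callable[[Vector], Vector],
                d: int, n_samples: int = 100_000, seed: int = 0) -> Tuple[Vector, Vector]:
    """Monte Carlo estimates of E[u q(u)] (lhs) and E[grad q(u)] (rhs) for u ~ N(0, I_d)."""
    rng = random.Random(seed)
    lhs = [0.0] * d
    rhs = [0.0] * d
    for _ in range(n_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        qu = q(u)
        gu = grad_q(u)
        for i in range(d):
            lhs[i] += u[i] * qu
            rhs[i] += gu[i]
    return [v / n_samples for v in lhs], [v / n_samples for v in rhs]


# Illustrative smooth test function and its exact gradient
def q(u: Vector) -> float:
    return u[0] ** 2 * u[1] + math.sin(u[2])


def grad_q(u: Vector) -> Vector:
    return [2.0 * u[0] * u[1], u[0] ** 2, math.cos(u[2])]


lhs, rhs = stein_check(q, grad_q, d=3)
exact = [0.0, 1.0, math.exp(-0.5)]  # E[cos(u_3)] = exp(-1/2) for a standard normal u_3
```

This identity is what makes Gaussian-smoothing estimators unbiased for the gradient of the smoothed function: applying it to $u\mapsto q(x+\mu u)$ gives $\nabla q_\mu(x)=\frac{1}{\mu}\mathbb{E}[u\,q(x+\mu u)]$.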

Figures (5)

  • Figure 1: Comparing Algorithm alg_ZBSA with the ZDSBAF framework in AghaGhad25
  • Figure 2: Comparing Algorithm alg_ZBSA for different $t_k$ with the ZDSBAF framework in AghaGhad25
  • Figure 3: The performance of Algorithm alg_ZBSA for different batch sizes
  • Figure 4: Comparing Algorithm alg_ZBSA with Algorithm alg_ZBSA2 for identical and different $t_k$
  • Figure 5: Comparing Algorithm alg_ZBSA with Algorithm alg_ZBSA2 as $T$ decreases

Theorems & Definitions (49)

  • Theorem 1
  • Proposition 1
  • proof
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Theorem 2
  • proof
  • Corollary 1
  • ...and 39 more