Fully Zeroth-Order Bilevel Programming via Gaussian Smoothing

Alireza Aghasi, Saeed Ghadimi

TL;DR

The paper addresses stochastic bilevel optimization when neither objective nor gradient information is available in closed form. It develops a fully zeroth-order framework by extending Gaussian smoothing via Stein's identity to functions with two independent variable blocks, providing gradient and Hessian estimators based solely on function evaluations. A two-loop algorithm (ZDSBA) combines inner zeroth-order SGD for the lower problem with outer zeroth-order steps for the upper problem, backed by a zeroth-order Hessian-inverse routine (SZHIA); the authors prove non-asymptotic convergence and derive explicit sample complexity bounds for various convexity regimes. This work enables practical zeroth-order solutions for large-scale bilevel problems and offers foundational tools for derivative-free optimization in hierarchical learning and decision-making tasks, while highlighting open questions on dimensionality dependence and potential improvements when first-order information becomes available.

Abstract

In this paper, we study and analyze zeroth-order stochastic approximation algorithms for solving bilevel problems, when neither the upper/lower objective values, nor their unbiased gradient estimates are available. In particular, exploiting Stein's identity, we first use Gaussian smoothing to estimate first- and second-order partial derivatives of functions with two independent blocks of variables. We then use these estimates in the framework of a stochastic approximation algorithm for solving bilevel optimization problems and establish its non-asymptotic convergence analysis. To the best of our knowledge, this is the first time that sample complexity bounds are established for a fully stochastic zeroth-order bilevel optimization algorithm.

Paper Structure

This paper contains 32 sections, 19 theorems, 194 equations, 2 algorithms.

Key Result

Theorem 2.1

Let $w\sim\mathcal{N}(0,I_d)$ be a standard Gaussian random vector, and let $q:\mathbb{R}^d\to\mathbb{R}$ be an almost-differentiable function (see stein1981estimation for a definition of almost differentiable) with $\mathbb{E}[\|\nabla q(w)\|]<\infty$. Then

$$\mathbb{E}[w\,q(w)] = \mathbb{E}[\nabla q(w)].$$

Furthermore, when the function $q$ has a twice continuously differentiable Hessian, we have

$$\mathbb{E}[(ww^\top - I_d)\,q(w)] = \mathbb{E}[\nabla^2 q(w)].$$
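The first- and second-order Stein identities behind this theorem can be checked numerically. The following is a minimal Monte Carlo sketch, not the paper's estimator: the quadratic test function, the matrix `A`, the vector `b`, the sample size, and the name `stein_check` are all illustrative assumptions. A quadratic is chosen because its expected gradient and Hessian are known exactly.

```python
import numpy as np

def stein_check(n_samples=200_000, seed=0):
    """Monte Carlo check of Stein's identities on a quadratic test function.

    Assumed (hypothetical) test function: q(w) = 0.5 * w^T A w + b^T w,
    so that grad q(w) = A w + b and Hessian q(w) = A.
    Identities checked, in their standard form:
        E[w q(w)]              = E[grad q(w)]    (= b here, since E[w] = 0)
        E[(w w^T - I) q(w)]    = E[Hessian q(w)] (= A here)
    """
    rng = np.random.default_rng(seed)
    A = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])  # symmetric, illustrative values
    b = np.array([1.0, -2.0, 0.5])   # illustrative values
    d = b.shape[0]

    # Draw w ~ N(0, I_d), one sample per row.
    w = rng.standard_normal((n_samples, d))

    # Function values only: no derivative of q is evaluated below.
    q = 0.5 * np.einsum("ni,ij,nj->n", w, A, w) + w @ b

    # First-order identity: sample average of w * q(w).
    grad_mc = (w * q[:, None]).mean(axis=0)

    # Second-order identity: sample average of (w w^T - I) * q(w).
    outer = np.einsum("ni,nj->nij", w, w) - np.eye(d)
    hess_mc = (outer * q[:, None, None]).mean(axis=0)

    return grad_mc, b, hess_mc, A

if __name__ == "__main__":
    grad_mc, grad_true, hess_mc, hess_true = stein_check()
    print("max gradient error:", np.abs(grad_mc - grad_true).max())
    print("max Hessian error: ", np.abs(hess_mc - hess_true).max())
```

Both averages use only evaluations of `q`, which is what makes identities of this kind usable as zeroth-order derivative estimators. The Hessian average is noticeably noisier than the gradient average at the same sample size, so it needs more samples for comparable accuracy.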

Theorems & Definitions (39)

  • Theorem 2.1
  • Proposition 2.1
  • proof
  • Lemma 2.1
  • Proposition 2.2
  • proof
  • Remark 2.1
  • Remark 2.2
  • Proposition 2.3
  • proof
  • ...and 29 more