Fully Zeroth-Order Bilevel Programming via Gaussian Smoothing

Alireza Aghasi; Saeed Ghadimi

TL;DR

The paper addresses stochastic bilevel optimization when neither exact objective values nor unbiased gradient estimates are available for the upper or lower problem. It develops a fully zeroth-order framework by extending Gaussian smoothing, via Stein's identity, to functions with two independent variable blocks, which yields gradient and Hessian estimators based solely on function evaluations. A two-loop algorithm (ZDSBA) combines inner zeroth-order SGD steps for the lower problem with outer zeroth-order steps for the upper problem, supported by a zeroth-order Hessian-inverse routine (SZHIA). The authors prove non-asymptotic convergence and derive explicit sample complexity bounds for various convexity regimes. The resulting tools apply to derivative-free optimization in hierarchical learning and decision-making tasks, and the paper highlights open questions on dimensionality dependence and on potential improvements when first-order information becomes available.

Abstract

In this paper, we study and analyze zeroth-order stochastic approximation algorithms for solving bilevel problems when neither the upper/lower objective values nor their unbiased gradient estimates are available. In particular, exploiting Stein's identity, we first use Gaussian smoothing to estimate first- and second-order partial derivatives of functions with two independent blocks of variables. We then use these estimates in the framework of a stochastic approximation algorithm for solving bilevel optimization problems and establish its non-asymptotic convergence analysis. To the best of our knowledge, this is the first time that sample complexity bounds are established for a fully stochastic zeroth-order bilevel optimization algorithm.
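To make the idea of estimating partial derivatives from function values concrete, the following is a minimal sketch of a standard forward-difference Gaussian smoothing estimator applied to a function of two variable blocks. It is an illustration under stated assumptions, not the paper's exact estimator: the function name zo_partial_grads, the smoothing parameters eta and mu, the sample count, and the test function f_demo are all hypothetical choices made here.

```python
import numpy as np

def zo_partial_grads(f, x, y, eta=1e-3, mu=1e-3, n_samples=20_000, seed=0):
    """Estimate the partial gradients of f(x, y) from function values only.

    Assumption: this is the textbook forward-difference Gaussian smoothing
    estimator, with independent Gaussian perturbations on each block.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((n_samples, x.size))  # perturbations for block x
    v = rng.standard_normal((n_samples, y.size))  # perturbations for block y
    base = f(x, y)
    # One extra function evaluation per sample, at the jointly perturbed point.
    vals = np.array([f(x + eta * ui, y + mu * vi) for ui, vi in zip(u, v)])
    diff = vals - base
    gx = (diff[:, None] * u).mean(axis=0) / eta  # estimate of the x-gradient
    gy = (diff[:, None] * v).mean(axis=0) / mu   # estimate of the y-gradient
    return gx, gy

# Hypothetical test function with known partial gradients:
# grad_x f = A y + x and grad_y f = A^T x.
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0]])

def f_demo(x, y):
    return x @ A @ y + 0.5 * x @ x

x0 = np.array([1.0, -2.0])
y0 = np.array([0.5, 1.0, -1.0])
gx, gy = zo_partial_grads(f_demo, x0, y0)
print("estimated grad_x:", gx, "exact:", A @ y0 + x0)
print("estimated grad_y:", gy, "exact:", A.T @ x0)
```

The estimates are noisy because each sample mixes the perturbations of both blocks; averaging more samples reduces the error at the cost of more function evaluations.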

Paper Structure (32 sections, 19 theorems, 194 equations, 2 algorithms)

Introduction
BLP in Machine Learning.
Zeroth-Order Methods in Machine Learning.
Related Works and Our Solution Strategy.
Main Assumptions
Notation and Definitions
Smoothness.
Strong Convexity.
Stochastic Moments.
Gaussian Smoothing for Functions of Two Block-Variables
Stein's Identity and Zeroth-Order Smooth Approximation
Stochastic Gradient of Approximate Smooth Functions
Zeroth-Order Bilevel Formulation
Zeroth-Order Hessian Inverse Operation
Solution Method
...and 17 more sections

Key Result

Theorem 2.1

Let $w\sim\mathcal{N}(0,I_d)$ be a standard Gaussian random vector, and let $q:\mathbb{R}^d\to\mathbb{R}$ be an almost-differentiable function (see stein1981estimation for a definition of almost differentiable) with $\mathbb{E}[\|\nabla q(w)\|]<\infty$. Then
$$\mathbb{E}[w\,q(w)] = \mathbb{E}[\nabla q(w)].$$
Furthermore, when the function $q$ has a twice continuously differentiable Hessian, we have
$$\mathbb{E}\big[(ww^\top - I_d)\,q(w)\big] = \mathbb{E}[\nabla^2 q(w)].$$
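As a numerical sanity check, the sketch below compares Monte Carlo averages against the standard forms of Stein's identity, E[w q(w)] = E[∇q(w)] and E[(ww^T − I_d) q(w)] = E[∇²q(w)]. The test function q(w) = sin(a·w) + cos(a·w), the vector a, and the sample size are assumptions chosen here because both expectations then have closed forms; none of them come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 200_000
a = np.array([0.5, -1.0, 0.25])  # hypothetical direction vector

# Samples of w ~ N(0, I_d) and the test function q(w) = sin(a.w) + cos(a.w).
w = rng.standard_normal((n, d))
s = w @ a
q = np.sin(s) + np.cos(s)

# Closed forms: a.w ~ N(0, |a|^2), so E[cos(a.w)] = exp(-|a|^2 / 2), E[sin(a.w)] = 0.
decay = np.exp(-0.5 * a @ a)
grad_exact = a * decay               # E[grad q(w)]
hess_exact = -np.outer(a, a) * decay  # E[Hessian of q at w]

# Function-value-only estimates given by the two identities.
grad_mc = (w * q[:, None]).mean(axis=0)             # E[w q(w)]
hess_mc = (w.T * q) @ w / n - np.eye(d) * q.mean()  # E[(w w^T - I) q(w)]

print("max gradient error:", np.abs(grad_mc - grad_exact).max())
print("max Hessian error:", np.abs(hess_mc - hess_exact).max())
```

Only values of q enter the two Monte Carlo averages, which is what makes identities of this kind useful for building derivative-free gradient and Hessian estimators.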

Theorems & Definitions (39)

Theorem 2.1
Proposition 2.1
proof
Lemma 2.1
Proposition 2.2
proof
Remark 2.1
Remark 2.2
Proposition 2.3
proof
...and 29 more
