Faster Gradient Methods for Highly-Smooth Stochastic Bilevel Optimization

Lesi Chen; Junru Li; El Mahdi Chayti; Jingzhao Zhang

TL;DR

This work shows that faster rates are achievable for higher-order smooth stochastic bilevel problems, and that the upper bound of F${}^2$SA-$p$ is nearly optimal in the highly smooth region.

Abstract

This paper studies the complexity of finding an $ε$-stationary point for stochastic bilevel optimization when the upper-level problem is nonconvex and the lower-level problem is strongly convex. Recent work proposed the first-order method F${}^2$SA, which achieves an $\tilde{\mathcal{O}}(ε^{-6})$ upper complexity bound for first-order smooth problems. This is slower than the optimal $Ω(ε^{-4})$ complexity lower bound for the single-level counterpart. In this work, we show that faster rates are achievable for higher-order smooth problems. We first reformulate F${}^2$SA as approximating the hyper-gradient with a forward difference. Based on this observation, we propose a class of methods, F${}^2$SA-$p$, that use $p$th-order finite differences to approximate the hyper-gradient and improve the upper bound to $\tilde{\mathcal{O}}(p\, ε^{-4-2/p})$ for $p$th-order smooth problems, which recovers the $\tilde{\mathcal{O}}(ε^{-6})$ bound at $p = 1$. Finally, we show that the $Ω(ε^{-4})$ lower bound also holds for stochastic bilevel problems when high-order smoothness holds with respect to the lower-level variable, indicating that the upper bound of F${}^2$SA-$p$ is nearly optimal in the highly smooth region $p = Ω( \log ε^{-1} / \log \log ε^{-1})$.
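The forward-difference view can be illustrated numerically. This sketch assumes the penalty view of F${}^2$SA with $\nu = 1/\lambda$ and the penalized value function $\ell(x,\nu) = \min_y\, g(x,y) + \nu f(x,y)$. When $g$ is strongly convex in $y$, $\frac{d}{d\nu}\nabla_x \ell(x,\nu)\big|_{\nu=0}$ equals the hyper-gradient, so $(\nabla_x \ell(x,\nu) - \nabla_x \ell(x,0))/\nu$ is a forward difference. The toy quadratic instance, the helper names (`grad_x_value`, `true_hypergradient`, `one_sided_weights`, `fd_hypergradient`), and the one-sided $p$th-order stencil are hypothetical illustrative choices. They are not the paper's F${}^2$SA-$p$ algorithm, which works with stochastic gradients and approximate inner solves and may use different stencils.

```python
import numpy as np

# Hypothetical toy instance (not from the paper), chosen so that the exact
# hyper-gradient has a closed form:
#   lower level  g(x, y) = 0.5 * y^T A y - y^T B x      (strongly convex in y)
#   upper level  f(x, y) = 0.5 * ||y - c||^2 + 0.5 * ||x||^2
rng = np.random.default_rng(0)
n, d = 4, 3
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)  # symmetric, eigenvalues >= n, so g is n-strongly convex in y
B = rng.standard_normal((n, d))
c = rng.standard_normal(n)


def grad_x_value(x, nu):
    """Gradient in x of l(x, nu) = min_y g(x, y) + nu * f(x, y), via Danskin's theorem."""
    y_nu = np.linalg.solve(A + nu * np.eye(n), B @ x + nu * c)  # exact minimizer in y
    return -B.T @ y_nu + nu * x  # grad_x g(x, y_nu) + nu * grad_x f(x, y_nu)


def true_hypergradient(x):
    """Exact gradient of phi(x) = f(x, y*(x)) from the implicit function theorem."""
    y_star = np.linalg.solve(A, B @ x)
    return x + B.T @ np.linalg.solve(A, y_star - c)


def one_sided_weights(p):
    """Weights a_j (j = 0..p, unit spacing) with sum_j a_j * j**k == (k == 1) for k = 0..p."""
    V = np.array([[float(j) ** k for j in range(p + 1)] for k in range(p + 1)])
    rhs = np.zeros(p + 1)
    rhs[1] = 1.0
    return np.linalg.solve(V, rhs)


def fd_hypergradient(x, nu, p):
    """p-th order one-sided finite difference of nu -> grad_x l(x, nu) at nu = 0.

    For p = 1 the weights are [-1, 1], i.e. the forward difference
    (grad_x l(x, nu) - grad_x l(x, 0)) / nu. All nodes j * nu are >= 0, so the
    penalized inner problem stays strongly convex.
    """
    a = one_sided_weights(p)
    return sum(a_j * grad_x_value(x, j * nu) for j, a_j in enumerate(a)) / nu


x = np.ones(d)
for p in (1, 2, 3):
    err = np.linalg.norm(fd_hypergradient(x, 1e-2, p) - true_hypergradient(x))
    print(f"p={p}: hyper-gradient error {err:.2e}")
```

With exact inner solves, the printed errors should shrink roughly like $\nu^p$ as $p$ grows. This smaller approximation bias is what a higher-order scheme buys.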

Paper structure: 2 figures; 29 theorems and definitions.