On the Convergence of Single-Loop Stochastic Bilevel Optimization with Approximate Implicit Differentiation

Yubo Zhou, Luo Luo, Guang Dai, Haishan Ye

TL;DR

SSAID is proved to reach an $\epsilon$-stationary point with an oracle complexity of $\mathcal{O}(\kappa^7 \epsilon^{-2})$. The analysis also gives the first explicit, fine-grained characterization of the $\kappa$-dependence for stochastic AID-based single-loop methods.

Abstract

Stochastic Bilevel Optimization has emerged as a fundamental framework for meta-learning and hyperparameter optimization. Despite the practical prevalence of single-loop algorithms--which update lower and upper variables concurrently--their theoretical understanding, particularly in the stochastic regime, remains significantly underdeveloped compared to their multi-loop counterparts. Existing analyses often yield suboptimal convergence rates or obscure the critical dependence on the lower-level condition number $κ$, frequently burying it within generic Lipschitz constants. In this paper, we bridge this gap by providing a refined convergence analysis of the Single-loop Stochastic Approximate Implicit Differentiation (SSAID) algorithm. We prove that SSAID achieves an $ε$-stationary point with an oracle complexity of $\mathcal{O}(κ^7 ε^{-2})$. Our result is noteworthy in two aspects: (i) it matches the optimal $\mathcal{O}(ε^{-2})$ rate of state-of-the-art multi-loop methods (e.g., stocBiO) while maintaining the computational efficiency of a single-loop update; and (ii) it provides the first explicit, fine-grained characterization of the $κ$-dependence for stochastic AID-based single-loop methods. This work demonstrates that SSAID is not merely a heuristic approach, but admits a rigorous theoretical foundation with convergence guarantees competitive with mainstream multi-loop frameworks.
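
To make the single-loop idea concrete, the following is a minimal sketch of a stochastic AID-style update on a toy quadratic problem. It is not the paper's SSAID algorithm: the toy objectives, the function name `single_loop_aid`, the step sizes `alpha`, `beta`, `gamma`, and the Gaussian noise model are all assumptions chosen for illustration. The lower level is $g(x,y) = \tfrac{1}{2} y^\top A y - x^\top y$ and the upper level is $f(x,y) = \tfrac{1}{2}\|y - b\|^2$, so $y^*(x) = A^{-1}x$ and the AID hypergradient $\nabla_x f - \nabla_{xy}^2 g \, [\nabla_y^2 g]^{-1} \nabla_y f$ reduces to $A^{-1}(y^*(x) - b)$. Each iteration takes one noisy step on the lower variable $y$, one noisy step on the linear-system variable $v \approx [\nabla_y^2 g]^{-1} \nabla_y f$, and one step on the upper variable $x$, all computed from the same current iterates.

```python
import numpy as np


def single_loop_aid(A, b, steps=5000, alpha=0.02, beta=0.1, gamma=0.1,
                    sigma=0.05, seed=0):
    """Toy single-loop stochastic AID-style iteration (illustrative only).

    Lower level: g(x, y) = 0.5 * y^T A y - x^T y   (A symmetric positive definite)
    Upper level: f(x, y) = 0.5 * ||y - b||^2
    """
    rng = np.random.default_rng(seed)
    d = b.shape[0]
    x = np.zeros(d)  # upper-level variable
    y = np.zeros(d)  # lower-level variable, tracks y*(x) = A^{-1} x
    v = np.zeros(d)  # tracks the solution of  (grad_yy g) v = grad_y f

    for _ in range(steps):
        # Assumed noise model: additive Gaussian noise on the oracles.
        noise_y, noise_v = sigma * rng.standard_normal((2, d))

        # Stochastic lower-level gradient: grad_y g = A y - x.
        grad_y_g = A @ y - x + noise_y

        # Stochastic residual of the linear system A v = grad_y f = y - b.
        lin_resid = A @ v - (y - b) + noise_v

        # Hypergradient estimate: grad_x f - grad_xy g @ v.
        # Here grad_x f = 0 and grad_xy g = -I, so the estimate is simply v.
        hypergrad = v

        # Single loop: all three variables move once per iteration,
        # each using the iterates from the start of the iteration.
        y, v, x = (y - beta * grad_y_g,
                   v - gamma * lin_resid,
                   x - alpha * hypergrad)

    return x, y, v


if __name__ == "__main__":
    A = np.diag([1.0, 2.0])
    b = np.array([1.0, -1.0])
    x, y, v = single_loop_aid(A, b)
    # The minimizer of Phi(x) = 0.5 * ||A^{-1} x - b||^2 is x = A b.
    print("x:", x, "target:", A @ b)
```

On this toy problem the upper-level objective $\Phi(x) = f(x, y^*(x))$ is minimized at $x = Ab$, so the distance from the returned `x` to `A @ b` is a quick sanity check. The step sizes here were picked by hand for this example and carry no relation to the step-size conditions in the paper's analysis.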

Paper Structure (28 sections, 15 theorems, 94 equations, 1 algorithm)


Key Result

Lemma 1

Suppose Assumptions 2-3 hold. Then, the stochastic derivatives $\nabla F(z;\xi)$, $\nabla G(z;\zeta)$, $\nabla_{xy}^2 G(z;\zeta)$ and $\nabla_y^2 G(z;\zeta)$ have bounded variances, i.e., for any $z$, $\xi$ and $\zeta$,

Theorems & Definitions (37)

  • Definition 1
  • Definition 2: Oracle Complexity
  • Definition 3: Filtration and Conditional Expectation
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4: Tracking Error of Lower-Level Variables
  • Remark 1: Stability of Lower-Level Tracking
  • Lemma 5: Boundedness of $\hat{v}_k$
  • Remark 2
  • ...and 27 more