On the Convergence of Single-Loop Stochastic Bilevel Optimization with Approximate Implicit Differentiation

Yubo Zhou; Luo Luo; Guang Dai; Haishan Ye

TL;DR

The paper proves that SSAID reaches an $\epsilon$-stationary point with an oracle complexity of $\mathcal{O}(\kappa^7 \epsilon^{-2})$, and gives the first explicit, fine-grained characterization of the $\kappa$-dependence for stochastic AID-based single-loop methods.

Abstract

Stochastic bilevel optimization has emerged as a fundamental framework for meta-learning and hyperparameter optimization. Single-loop algorithms, which update the lower-level and upper-level variables concurrently, are prevalent in practice, yet their theoretical understanding, particularly in the stochastic regime, remains significantly underdeveloped compared to their multi-loop counterparts. Existing analyses often yield suboptimal convergence rates or obscure the critical dependence on the lower-level condition number $\kappa$, frequently burying it within generic Lipschitz constants. In this paper, we bridge this gap by providing a refined convergence analysis of the Single-loop Stochastic Approximate Implicit Differentiation (SSAID) algorithm. We prove that SSAID achieves an $\epsilon$-stationary point with an oracle complexity of $\mathcal{O}(\kappa^7 \epsilon^{-2})$. Our result is noteworthy in two aspects: (i) it matches the optimal $\mathcal{O}(\epsilon^{-2})$ rate of state-of-the-art multi-loop methods (e.g., stocBiO) while maintaining the computational efficiency of a single-loop update; and (ii) it provides the first explicit, fine-grained characterization of the $\kappa$-dependence for stochastic AID-based single-loop methods. This work demonstrates that SSAID is not merely a heuristic approach, but admits a rigorous theoretical foundation with convergence guarantees competitive with mainstream multi-loop frameworks.
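To make the phrase "single-loop update" concrete, the following is a minimal sketch of a generic single-loop scheme based on approximate implicit differentiation, run on a toy quadratic bilevel problem. It is not the paper's SSAID algorithm, whose exact update rule is not reproduced on this page. The problem data (`H`, `A`, `b`, `LAM`), the step sizes, the Gaussian noise model, and the function names are all assumptions chosen for illustration. Each iteration takes one noisy gradient step on the lower-level variable `y`, one noisy step on an auxiliary variable `v` that tracks the solution of the linear system $\nabla_y^2 g \, v = \nabla_y f$, and one noisy step on `x` along the estimate $\nabla_x f - \nabla_{xy}^2 g \, v$.

```python
import numpy as np

# Toy problem (all values are illustrative assumptions):
#   lower level: g(x, y) = 0.5 * y^T H y - y^T A x, so y*(x) = H^{-1} A x
#   upper level: f(x, y) = 0.5 * ||y - b||^2 + 0.5 * LAM * ||x||^2
H = np.diag([1.0, 2.0, 3.0, 4.0])  # lower-level Hessian, condition number 4
A = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5],
              [0.5, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
b = np.array([1.0, -1.0, 0.5, 2.0])
LAM = 0.5


def exact_hypergrad(x):
    """Closed-form gradient of Phi(x) = f(x, y*(x)) for this toy problem."""
    y_star = np.linalg.solve(H, A @ x)
    return LAM * x + A.T @ np.linalg.solve(H, y_star - b)


def single_loop_aid(alpha=0.01, beta=0.2, gamma=0.2, sigma=0.01,
                    iters=5000, seed=0):
    """Run a generic single-loop AID iteration with additive Gaussian noise."""
    rng = np.random.default_rng(seed)
    x, y, v = np.zeros(3), np.zeros(4), np.zeros(4)
    for _ in range(iters):
        # Noisy lower-level gradient: grad_y g = H y - A x.
        grad_y_g = H @ y - A @ x + sigma * rng.standard_normal(4)
        # Noisy residual of the linear system H v = grad_y f = y - b.
        lin_res = H @ v - (y - b) + sigma * rng.standard_normal(4)
        # Noisy hypergradient estimate: grad_x f - (grad_xy g) v, where
        # grad_xy g = -A^T here, giving LAM * x + A^T v.
        hypergrad = LAM * x + A.T @ v + sigma * rng.standard_normal(3)
        # All three variables move once per iteration, using the old iterates.
        y = y - beta * grad_y_g
        v = v - gamma * lin_res
        x = x - alpha * hypergrad
    return x, y, v


if __name__ == "__main__":
    x, y, v = single_loop_aid()
    print("initial ||grad Phi||:", np.linalg.norm(exact_hypergrad(np.zeros(3))))
    print("final   ||grad Phi||:", np.linalg.norm(exact_hypergrad(x)))
```

In this sketch the upper-level step size `alpha` is deliberately smaller than `beta` and `gamma`, so that `y` and `v` can track their targets while `x` moves; how the step sizes must scale with $\kappa$ is exactly the kind of dependence the paper's analysis makes explicit.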

Paper Structure (28 sections, 15 theorems, 94 equations, 1 algorithm)


Introduction
Contributions.
Related Work
Comparison with Our Result.
Background and Algorithm
Analysis
Definitions and Assumptions
Useful Lemmas
Convergence Analysis
Bounding Lower-level Error
Bounding Linear System Error
Controlling Hypergradient Estimation Quality
Convergence Rate to Stationarity
Conclusion
Proof of Lemma \ref{lem:seq}
...and 13 more sections

Key Result

Lemma 1

Suppose Assumptions \ref{assum-2}-\ref{assum-3} hold. Then, the stochastic derivatives $\nabla F(z;\xi)$, $\nabla G(z;\zeta)$, $\nabla_{xy}^2 G(z;\zeta)$ and $\nabla_y^2 G(z;\zeta)$ have bounded variances, i.e., for any $z$, $\xi$ and $\zeta$,

Theorems & Definitions (37)

Definition 1
Definition 2: Oracle Complexity
Definition 3: Filtration and Conditional Expectation
Lemma 1
Lemma 2
Lemma 3
Lemma 4: Tracking Error of Lower-Level Variables
Remark 1: Stability of Lower-Level Tracking
Lemma 5: Boundedness of $\hat{v}_k$
Remark 2
...and 27 more
