Decentralized Bilevel Optimization: A Perspective from Transient Iteration Complexity

Boao Kong, Shuchen Zhu, Songtao Lu, Xinmeng Huang, Kun Yuan

TL;DR

This work advances decentralized stochastic bilevel optimization (SBO) by introducing D-SOBA, a single-loop framework with two variants: D-SOBA-SO, which uses second-order Hessian/Jacobian information, and D-SOBA-FO, which relies only on first-order gradients via finite differences. Its non-asymptotic analysis yields the first transient-iteration complexity bounds for decentralized SBO, revealing how network topology, data heterogeneity, and the nested bilevel structure jointly influence the non-asymptotic phase. The analysis shows that both variants achieve the same asymptotic rate $O\left(\frac{1}{N\epsilon^2}\right)$ through a two-stage convergence, a transient phase followed by linear speedup, and it derives consensus-error bounds that quantify the impact of the network. Experiments on synthetic linear regression, FashionMNIST data cleaning, and decentralized meta-learning corroborate the theoretical findings and demonstrate practical gains, especially with improved connectivity and reduced heterogeneity.

Abstract

Stochastic bilevel optimization (SBO) is becoming increasingly essential in machine learning due to its versatility in handling nested structures. To address large-scale SBO, decentralized approaches have emerged as effective paradigms in which nodes communicate with immediate neighbors without a central server, thereby improving communication efficiency and enhancing algorithmic robustness. However, most decentralized SBO algorithms focus solely on asymptotic convergence rates and overlook transient iteration complexity, that is, the number of iterations required before the asymptotic rate dominates. As a result, there is limited understanding of how network topology, data heterogeneity, and the nested bilevel algorithmic structure influence these algorithms. To address this issue, this paper introduces D-SOBA, a Decentralized Stochastic One-loop Bilevel Algorithm framework. D-SOBA comprises two variants: D-SOBA-SO, which incorporates second-order Hessian and Jacobian matrices, and D-SOBA-FO, which relies entirely on first-order gradients. We provide a comprehensive non-asymptotic convergence analysis and establish the transient iteration complexity of D-SOBA. This provides the first theoretical understanding of how network topology, data heterogeneity, and nested bilevel structures influence decentralized SBO. Extensive experimental results demonstrate the efficiency and theoretical advantages of D-SOBA.
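
To make the role of network topology concrete, the following is a minimal sketch of neighbor-only (gossip) averaging on a ring, not code from the paper. The function names, the ring topology, and the uniform 1/3 weights are illustrative assumptions. It shows that one round of averaging preserves the network mean and shrinks the consensus error by at least a factor `rho`, the second-largest eigenvalue magnitude of the mixing matrix; `rho` approaches 1 as the ring grows, which is the kind of connectivity effect a transient-phase analysis has to account for.

```python
import math

def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic mixing matrix for a ring (assumed topology).

    Each node puts weight 1/3 on itself and on each of its two neighbors.
    """
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W

def gossip_step(W, x):
    """One communication round: every node averages its neighbors' values."""
    n = len(x)
    return [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]

def consensus_error(x):
    """Euclidean distance between the node values and their network average."""
    mean = sum(x) / len(x)
    return math.sqrt(sum((v - mean) ** 2 for v in x))

def ring_rho(n):
    """Second-largest eigenvalue magnitude of the ring matrix above (n >= 3).

    The matrix is circulant, so its eigenvalues are (1 + 2*cos(2*pi*k/n)) / 3.
    """
    return max(abs(1.0 + 2.0 * math.cos(2.0 * math.pi * k / n)) / 3.0
               for k in range(1, n))

if __name__ == "__main__":
    n = 8
    W = ring_mixing_matrix(n)
    x = [float(i) for i in range(n)]  # hypothetical local values, one per node
    for t in range(5):
        print(t, consensus_error(x))
        x = gossip_step(W, x)
    print("rho for n=8:", ring_rho(8), "rho for n=32:", ring_rho(32))
```

A sparser or larger network gives a `rho` closer to 1, so more communication rounds are needed before the nodes agree.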

Paper Structure

This paper contains 27 sections, 19 theorems, 151 equations, 8 figures, 1 table, 1 algorithm.

Key Result

Proposition 1

Suppose the smoothness and unbiasedness assumptions (assumption:smooth and assumption:unbiased) hold. Then the terms $p_{H,i}^{(t)}$ and $p_{J,i}^{(t)}$ obtained from D-SOBA-FO satisfy (see the proof in Lemma desjifen): where $\iota^2=\dfrac{1}{3}L^2_{\nabla^2 g}\,\delta_t^2\left\Vert z_i^{(t)}\right\Vert^4$.
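
As a rough illustration of how a first-order method can stand in for second-order information, the sketch below approximates a Hessian-vector product by a forward finite difference of gradients. It is a generic textbook construction under stated assumptions, not necessarily the exact scheme used in D-SOBA-FO: the test function `grad_g`, the helper names, and the forward-difference form are all hypothetical. The step `delta` plays the role of $\delta_t$ and `z` the role of $z_i^{(t)}$; the approximation error of such a scheme is of order $\delta\Vert z\Vert^2$, which is consistent with the $\delta_t^2\Vert z_i^{(t)}\Vert^4$ scaling of $\iota^2$ once squared.

```python
import math

def grad_g(y):
    """Gradient of the illustrative function g(y) = sum(y_i^4 / 12 + y_i^2 / 2)."""
    return [v ** 3 / 3.0 + v for v in y]

def hvp_exact(y, z):
    """Exact Hessian-vector product for g above; the Hessian is diag(y_i^2 + 1)."""
    return [(v * v + 1.0) * w for v, w in zip(y, z)]

def hvp_finite_difference(grad, y, z, delta):
    """Approximate (Hessian of g at y) @ z using two gradient evaluations only."""
    y_shifted = [v + delta * w for v, w in zip(y, z)]
    g_shifted = grad(y_shifted)
    g_base = grad(y)
    return [(a - b) / delta for a, b in zip(g_shifted, g_base)]

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

if __name__ == "__main__":
    y = [1.0, -2.0, 0.5]   # hypothetical lower-level iterate
    z = [0.3, 1.0, -2.0]   # hypothetical auxiliary direction
    exact = hvp_exact(y, z)
    for delta in (1e-1, 1e-2, 1e-3):
        approx = hvp_finite_difference(grad_g, y, z, delta)
        print(delta, l2_distance(approx, exact))
```

Shrinking `delta` reduces the bias of the approximation, at the cost of greater sensitivity to gradient noise in a stochastic setting.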

Figures (8)

  • Figure 1: A decentralized algorithm (with a stage-wise decaying learning rate) must go through a sufficiently large number of transient iterations before it achieves the same asymptotic rate as the centralized approach.
  • Figure 2: Convergence performance of D-SOBA-SO over various networks under different data heterogeneity.
  • Figure 3: The upper-level loss (left) and test accuracy (right) of different decentralized stochastic bilevel optimization algorithms.
  • Figure 4: The upper-level loss (left) and test accuracy (right) of D-SOBA with different communication topologies.
  • Figure 5: The test accuracy of D-SOBA-SO with different moving-average parameter $\theta_t$.
  • ...and 3 more figures

Theorems & Definitions (22)

  • Remark 1: spectral gap
  • Remark 2: Non-symmetric mixing matrix
  • Proposition 1
  • Theorem 1
  • Corollary 2: transient iteration complexity
  • Corollary 3: consensus error
  • Corollary 4: deterministic convergence
  • Lemma 1
  • Lemma 2
  • Remark 3
  • ...and 12 more