A Single-Loop Algorithm for Decentralized Bilevel Optimization

Youran Dong, Shiqian Ma, Junfeng Yang, Chao Yin

TL;DR

This work tackles decentralized bilevel optimization with a strongly convex lower level by introducing SLDBO, a fully single-loop algorithm that uses only two matrix-vector multiplications per iteration. It combines gradient tracking with a projection step to remove the need for gradient-heterogeneity assumptions, and provides an $O(1/K)$ convergence rate for stationarity along with consensus guarantees. The authors validate the approach on synthetic and MNIST-based hyperparameter problems, showing faster convergence and reduced communication overhead compared to baselines. The method enhances scalability of distributed bilevel optimization and broadens its applicability in privacy-preserving, networked settings.

Abstract

Bilevel optimization has gained significant attention in recent years due to its broad applications in machine learning. This paper focuses on bilevel optimization in decentralized networks and proposes a novel single-loop algorithm for solving decentralized bilevel optimization with a strongly convex lower-level problem. Our approach is a fully single-loop method that approximates the hypergradient using only two matrix-vector multiplications per iteration. Importantly, our algorithm does not require any gradient heterogeneity assumption, distinguishing it from existing methods for decentralized bilevel optimization and federated bilevel optimization. Our analysis demonstrates that the proposed algorithm achieves the best-known convergence rate for bilevel optimization algorithms. We also present experimental results on hyperparameter optimization problems using both synthetic and MNIST datasets, which demonstrate the efficiency of our proposed algorithm.
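
To make the "two matrix-vector multiplications per iteration" idea concrete, here is a minimal single-agent sketch on a toy quadratic bilevel problem. It is not the authors' SLDBO (there is no network, gradient tracking, or projection step), and the problem data `A`, `B`, `c`, `lam`, the step sizes, and the helper names `hvp_yy` and `jvp_xy` are all assumptions made for illustration. The auxiliary variable `v` tracks the solution of the linear system $\nabla^2_{yy} g \, v = \nabla_y f$, so each iteration needs only one Hessian-vector product and one cross-derivative product instead of an inner solve.

```python
import numpy as np

# Toy bilevel problem (all data here is made up for illustration):
#   lower level: g(x, y) = 0.5 * y^T A y - y^T B x   (strongly convex in y)
#   upper level: f(x, y) = 0.5 * ||y - c||^2 + 0.5 * lam * ||x||^2
A = np.array([[2.0, 0.5], [0.5, 1.0]])            # Hessian of g in y (symmetric positive definite)
B = np.array([[1.0, 0.0, 0.5], [0.5, 1.0, 0.0]])  # couples x (dim 3) to y (dim 2)
c = np.array([1.0, -1.0])
lam = 1.0


def hvp_yy(v):
    """Hessian-vector product grad^2_yy g @ v (for this toy problem: A @ v)."""
    return A @ v


def jvp_xy(v):
    """Cross-derivative product grad^2_xy g @ v (for this toy problem: -B^T @ v)."""
    return -B.T @ v


def single_loop(num_iters=5000, alpha=0.01, beta=0.4, eta=0.4):
    """Update x, y and the auxiliary variable v once per iteration (no inner loops)."""
    x, y, v = np.zeros(3), np.zeros(2), np.zeros(2)
    for _ in range(num_iters):
        grad_y_g = A @ y - B @ x      # first-order information of the lower level
        grad_y_f = y - c              # first-order information of the upper level
        grad_x_f = lam * x
        # Second-order product 1: hypergradient estimate grad_x f - grad^2_xy g @ v.
        h = grad_x_f - jvp_xy(v)
        # Second-order product 2: one gradient step on the linear system A v = grad_y f.
        v_next = v - eta * (hvp_yy(v) - grad_y_f)
        y = y - beta * grad_y_g       # lower-level step
        x = x - alpha * h             # upper-level step
        v = v_next
    return x, y, v


def exact_solution():
    """Closed-form minimizer of Phi(x) = f(x, y*(x)) with y*(x) = A^{-1} B x."""
    M = np.linalg.solve(A, B)         # A^{-1} B
    return np.linalg.solve(lam * np.eye(3) + M.T @ M, M.T @ c)


x, y, v = single_loop()
print("single-loop x:", x)
print("closed-form x:", exact_solution())
```

For this quadratic toy the first-order gradients also happen to be matrix products; the two products counted above are the ones involving second derivatives of the lower-level objective.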

Paper Structure

This paper contains 15 sections, 12 theorems, 86 equations, 4 figures, 1 algorithm.

Key Result

Theorem 3.1

For any integer $K\geq 1$ and every $0\leq k\leq K$, define the network averages $\bar{x}^k = \frac{1}{n}\sum_{i=1}^{n}x^k_i$, $\bar{y}^k = \frac{1}{n}\sum_{i=1}^{n}y^k_i$ and $\bar{v}^k = \frac{1}{n}\sum_{i=1}^{n}v^k_i$. The convergence rate results for Algorithm 1 (SLDBO) are stated in terms of these averages.
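
As an illustration of the averaged iterate $\bar{x}^k$ and of the consensus it is measured against, the sketch below runs plain gradient tracking on a single-level toy problem over a ring of $n=4$ agents. This is a generic textbook scheme under assumed data, not Algorithm 1 itself: the local objectives $f_i(x) = \frac{1}{2}\|x - b_i\|^2$, the ring mixing matrix, the step size, and the function names are all hypothetical choices.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring: each agent averages with its two neighbors."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i - 1) % n] += 0.25
        W[i, (i + 1) % n] += 0.25
    return W


def gradient_tracking(b, W, alpha=0.2, num_iters=200):
    """Gradient tracking for local objectives f_i(x) = 0.5 * ||x - b_i||^2 (assumed toy data).

    Row i of X is agent i's iterate x_i; row i of S is its tracker of the average gradient.
    """
    X = np.zeros_like(b)
    G = X - b                        # local gradients at the current iterates
    S = G.copy()                     # trackers are initialized with the local gradients
    consensus_errors = []
    for _ in range(num_iters):
        X_next = W @ X - alpha * S   # mix with neighbors, then step along the tracker
        G_next = X_next - b
        S = W @ S + G_next - G       # tracker update keeps mean(S) equal to mean(G)
        X, G = X_next, G_next
        x_bar = X.mean(axis=0)       # network average, x_bar = (1/n) * sum_i x_i
        consensus_errors.append(np.linalg.norm(X - x_bar) ** 2 / len(X))
    return X, consensus_errors


b = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0], [4.0, -3.0]])  # one row per agent
W = ring_mixing_matrix(len(b))
X, consensus_errors = gradient_tracking(b, W)
print("x_bar:", X.mean(axis=0))
print("final consensus error:", consensus_errors[-1])
```

Every agent only ever uses its own `b_i` and its neighbors' iterates, yet all rows of `X` approach the minimizer of the average objective, which is the mean of the `b_i`.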

Figures (4)

  • Figure 1: Comparison between MA-DSBO and SLDBO on synthetic data ($p=50$).
  • Figure 2: Comparison between MA-DSBO and SLDBO on synthetic data ($p=200$).
  • Figure 3: Comparison between MA-DSBO, SLDBO (w/o proj.) and SLDBO on synthetic data. Dimension: $p=50$. Heterogeneity rate: $r=1$ (left), $r=40$ (right).
  • Figure 4: Comparison of test loss, train loss, and classification accuracy between MA-DSBO and SLDBO on real-world MNIST dataset.

Theorems & Definitions (27)

  • Remark 2.1
  • Theorem 3.1
  • Lemma A.1
  • proof
  • Lemma A.2
  • Lemma A.3
  • proof
  • Remark A.1
  • Lemma A.4
  • proof
  • ...and 17 more