
Problem-Parameter-Free Decentralized Bilevel Optimization

Zhiwei Zhai, Wenjing Yan, Ying-Jun Angela Zhang

TL;DR

The paper tackles decentralized bilevel optimization when problem-specific parameters are unavailable. It proposes AdaSDBO, a parameter-free single-loop algorithm that jointly updates primal, dual, and auxiliary variables using hierarchical, gradient-norm-based adaptive stepsizes and a lightweight stepsize-tracking mechanism to maintain consensus. Theoretically, AdaSDBO achieves a finite-time convergence rate of $\widetilde{\mathcal{O}}\left(\frac{1}{T}\right)$ (up to $\log^4(T)$ factors) with gradient complexity $\widetilde{\mathcal{O}}\left(\frac{1}{\epsilon}\right)$ and is shown to be robust to network topology and initial stepsizes. Empirically, it delivers competitive performance across synthetic and real data, including decentralized hyperparameter optimization and decentralized meta-learning, while demonstrating strong resilience to stepsize choices and network heterogeneity. These results indicate a practical pathway to deploy decentralized bilevel optimization at scale without tedious hyperparameter tuning.
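The summary does not spell out the stepsize-tracking update. As a rough illustration only, the Python sketch below assumes a gradient-tracking-style rule: each agent $i$ keeps a local accumulator $m_i$ of its own squared gradient norms and a tracker $z_i \leftarrow \sum_j W_{ij} z_j + (m_i^{\text{new}} - m_i^{\text{old}})$, then uses the stepsize $\eta/\sqrt{z_i}$. The ring mixing matrix, the names `ring_mixing_matrix` and `track_stepsizes`, and the update rule itself are hypothetical and are not taken from the paper.

```python
import math


def ring_mixing_matrix(n):
    """Hypothetical doubly stochastic mixing matrix for a ring of n >= 2 agents.

    Each agent puts weight 1/3 on itself and on each of its two neighbours.
    """
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def track_stepsizes(local_sq_norms, W, eta=1.0, eps=1e-12):
    """Gradient-tracking-style consensus on stepsize accumulators (illustrative only).

    local_sq_norms[k][i] is the squared gradient norm agent i observes at step k.
    Returns the local accumulators m, the trackers z, and per-agent stepsizes.
    """
    n = len(W)
    m = [0.0] * n  # each agent's own cumulative squared gradient norm
    z = [0.0] * n  # each agent's estimate of the network-average accumulator
    for sq in local_sq_norms:
        new_m = [m[i] + sq[i] for i in range(n)]
        # Mix neighbours' trackers, then add the agent's own accumulator increment.
        z = [
            sum(W[i][j] * z[j] for j in range(n)) + (new_m[i] - m[i])
            for i in range(n)
        ]
        m = new_m
    # eps guards against division by zero for an agent whose tracker is still 0.
    stepsizes = [eta / math.sqrt(max(zi, eps)) for zi in z]
    return m, z, stepsizes
```

Because $W$ is doubly stochastic, the trackers in this sketch always sum to the sum of the local accumulators. An agent that has seen no gradients itself therefore still gets a stepsize close to the network average, instead of the huge $\eta/\sqrt{\epsilon}$ it would compute from its own accumulator alone.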

Abstract

Decentralized bilevel optimization has garnered significant attention due to its critical role in solving large-scale machine learning problems. However, existing methods often rely on prior knowledge of problem parameters (such as smoothness, convexity, or the communication network topology) to determine appropriate stepsizes. In practice, these problem parameters are typically unavailable, leading to substantial manual effort for hyperparameter tuning. In this paper, we propose AdaSDBO, a fully problem-parameter-free algorithm for decentralized bilevel optimization with a single-loop structure. AdaSDBO leverages adaptive stepsizes based on cumulative gradient norms to update all variables simultaneously, dynamically adjusting its progress and eliminating the need for problem-specific hyperparameter tuning. Through rigorous theoretical analysis, we establish that AdaSDBO achieves a convergence rate of $\widetilde{\mathcal{O}}\left(\frac{1}{T}\right)$, matching the performance of well-tuned state-of-the-art methods up to polylogarithmic factors. Extensive numerical experiments demonstrate that AdaSDBO delivers competitive performance compared to existing decentralized bilevel optimization methods while exhibiting remarkable robustness across diverse stepsize configurations.
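To make "adaptive stepsizes based on cumulative gradient norms" concrete, here is a minimal single-agent Python sketch on a hypothetical scalar bilevel problem. It uses the lower level $g(x, y) = \frac{c}{2}(y - ax)^2$, the upper level $f(x, y) = \frac{1}{2}(y - b)^2 + \frac{\lambda}{2}x^2$, and an auxiliary objective $r(x, y, v) = \frac{c}{2}v^2 - v(y - b)$. The minimizer $v^* = (y - b)/c$ of $r$ enters the hypergradient estimate $\nabla_x f - \nabla_{xy} g \cdot v$. All three variables are updated in one loop with AdaGrad-norm stepsizes. The toy problem, the function name, and the choice to divide the upper-level stepsize by the sum of all accumulators are assumptions for illustration. This sketch is neither decentralized nor the AdaSDBO update.

```python
import math


def run_single_loop_adagrad(eta, iters=20000, a=2.0, b=1.0, c=1.0, lam=1.0):
    """Toy single-loop bilevel solver with AdaGrad-norm stepsizes (hypothetical sketch)."""
    x = y = v = 0.0
    eps = 1e-8  # tiny initial accumulator value to avoid division by zero
    m_x = m_y = m_v = eps
    for _ in range(iters):
        # All gradients are evaluated at the current iterate (simultaneous update).
        g_y = c * (y - a * x)      # grad_y of the lower-level g
        g_v = c * v - (y - b)      # grad_v of r(x, y, v) = 0.5*c*v^2 - v*(y - b)
        g_x = lam * x + c * a * v  # hypergradient estimate: grad_x f - grad_xy g * v
        # Accumulate squared gradient norms (AdaGrad-norm style).
        m_y += g_y ** 2
        m_v += g_v ** 2
        m_x += g_x ** 2
        # Lower-level and auxiliary variables use their own accumulators ...
        y -= eta / math.sqrt(m_y) * g_y
        v -= eta / math.sqrt(m_v) * g_v
        # ... while the upper level divides by the sum of all accumulators, so its
        # stepsize never exceeds those of y and v (hypothetical hierarchy).
        x -= eta / math.sqrt(m_x + m_y + m_v) * g_x
    return x, y, v


for eta in (0.1, 1.0, 10.0):
    print(eta, run_single_loop_adagrad(eta))
```

With $a = 2$, $b = 1$, and $\lambda = 1$, the exact solution is $x^* = ab/(a^2 + \lambda) = 0.4$, $y^* = 0.8$, and $v^* = -0.2$. In this toy setting the iterates approach it for initial stepsizes from 0.1 to 10 without retuning. Much smaller $\eta$ needs many more iterations, because the accumulators then shrink the effective stepsize further.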


Paper Structure

This paper contains 39 sections, 21 theorems, 162 equations, 10 figures, 3 tables, 1 algorithm.

Key Result

Lemma 1

Under Assumptions 1 and 2, for any integer $k_0 \in [0, t)$, we have $\sum_{k=k_0}^t \frac{\|\nabla_y l(\bar{x}_k, \bar{y}_k)\|^2}{\bar{m}^y_{k+1}} \leq a_5 \log(t + 1) + b_5$ and $\sum_{k=k_0}^t \frac{\|\nabla_v r(\bar{x}_k, \bar{y}_k, \bar{v}_k)\|^2}{\bar{z}_{k+1}} \leq a_6 \log \ldots$
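The bounds in this lemma have the shape of a standard AdaGrad-norm inequality. Suppose an accumulator satisfies $m_{k+1} = m_k + a_k$ with $a_k \ge 0$ and $m_0 > 0$. Then $a_k / m_{k+1} = 1 - m_k / m_{k+1} \le \log(m_{k+1}/m_k)$, so $\sum_{k=0}^{T-1} a_k / m_{k+1}$ telescopes to at most $\log(m_T / m_0)$. If also $a_k \le G^2$, the sum is at most $\log(1 + T G^2 / m_0)$, which grows logarithmically in the number of iterations. The Python check below assumes $\bar{m}^y_{k+1}$ is such a running sum of $\|\nabla_y l\|^2$. This is an assumption, since the paper's accumulators may be defined differently (for example, averaged over agents). The name `adagrad_ratio_sum` is hypothetical.

```python
import math


def adagrad_ratio_sum(sq_norms, m0):
    """Sum of a_k / m_{k+1} for a running accumulator m_{k+1} = m_k + a_k.

    sq_norms plays the role of the squared gradient norms a_k (hypothetical inputs);
    m0 > 0 is the initial accumulator value. Returns (ratio_sum, final_accumulator).
    """
    if m0 <= 0:
        raise ValueError("m0 must be positive")
    m = m0
    total = 0.0
    for a in sq_norms:
        m += a  # the accumulator already includes the current term
        total += a / m
    return total, m


# Constant unit gradients: the sum is 1/2 + 1/3 + ... + 1/(T+1) <= log(T + 1).
T = 100
s, m_T = adagrad_ratio_sum([1.0] * T, 1.0)
print(f"sum = {s:.4f}, log(T + 1) = {math.log(T + 1):.4f}")
```

With constant unit gradients and $m_0 = 1$, the final accumulator is $T + 1$, so the telescoping bound reduces exactly to the $\log(t + 1)$ form that appears in the lemma.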

Figures (10)

  • Figure 1: Test Accuracy on different datasets.
  • Figure 2: Test accuracy versus stepsize on different datasets.
  • Figure 3: Structure of the proof
  • Figure 4: Test accuracy on synthetic dataset with $r=5$.
  • Figure 5: Test accuracy and upper-level loss for synthetic dataset with different $\rho_W$.
  • ...and 5 more figures

Theorems & Definitions (39)

  • Remark 1
  • Definition 1
  • Definition 1
  • Lemma 1: Approximation Errors
  • Lemma 2: Accumulated Gradients
  • Lemma 3: Consensus Errors
  • Lemma 4: Stepsize Inconsistencies
  • Theorem 3.1
  • Remark 2
  • Corollary 1
  • ...and 29 more