Problem-Parameter-Free Decentralized Bilevel Optimization

Zhiwei Zhai; Wenjing Yan; Ying-Jun Angela Zhang

TL;DR

The paper tackles decentralized bilevel optimization when problem-specific parameters are unavailable. It proposes AdaSDBO, a parameter-free single-loop algorithm that jointly updates primal, dual, and auxiliary variables using hierarchical, gradient-norm-based adaptive stepsizes and a lightweight stepsize-tracking mechanism that keeps the agents in consensus. Theoretically, AdaSDBO achieves a finite-time convergence rate of $\widetilde{\mathcal{O}}(\frac{1}{T})$ (up to $\log^4(T)$ factors) with gradient complexity $\widetilde{\mathcal{O}}(\frac{1}{\epsilon})$, and it is shown to be robust to the network topology and the initial stepsizes. Empirically, it delivers competitive performance on synthetic and real data, including decentralized hyperparameter optimization and decentralized meta-learning, while remaining resilient to stepsize choices and network heterogeneity. These results suggest a practical pathway to deploying decentralized bilevel optimization at scale without tedious hyperparameter tuning.
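
The stepsize-tracking idea can be illustrated with a small sketch. It assumes, for illustration only, that each agent keeps a local accumulator of squared gradient norms, mixes it with its neighbours' accumulators through a doubly stochastic matrix W, and then sets its stepsize to eta0 / sqrt(v). The ring topology and the names ring_mixing_matrix, track_accumulators, and eta0 are hypothetical; the exact AdaSDBO update may differ.

```python
import numpy as np


def ring_mixing_matrix(n: int) -> np.ndarray:
    """Hypothetical doubly stochastic mixing matrix for a ring of n >= 3 agents.

    Each agent puts weight 1/3 on itself and on each of its two neighbours,
    so every row and every column sums to one.
    """
    if n < 3:
        raise ValueError("a ring needs at least 3 agents")
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W


def track_accumulators(W, v, sq_grad_norms, eta0=0.1):
    """One round of (hypothetical) stepsize tracking.

    Agents average neighbouring accumulators of squared gradient norms,
    add their own new squared gradient norm, and use eta0 / sqrt(v) as stepsize.
    Because W is doubly stochastic, the network-wide sum of accumulators grows
    by exactly the sum of the local increments.
    """
    v_new = W @ v + sq_grad_norms
    return v_new, eta0 / np.sqrt(v_new)


if __name__ == "__main__":
    W = ring_mixing_matrix(5)
    v = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # initially disagreeing accumulators
    for _ in range(50):
        # Equal local increments, so only the mixing changes the disagreement.
        v, eta = track_accumulators(W, v, np.full(5, 0.1))
    print("spread of per-agent stepsizes:", eta.max() - eta.min())
```

Mixing with a doubly stochastic W preserves the network-wide sum of accumulators while shrinking disagreement between agents, so the per-agent stepsizes drift toward a common value instead of diverging across the network.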

Abstract

Decentralized bilevel optimization has garnered significant attention due to its critical role in solving large-scale machine learning problems. However, existing methods often rely on prior knowledge of problem parameters (e.g., smoothness and convexity constants or the topology of the communication network) to determine appropriate stepsizes. In practice, these problem parameters are typically unavailable, leading to substantial manual effort for hyperparameter tuning. In this paper, we propose AdaSDBO, a fully problem-parameter-free algorithm for decentralized bilevel optimization with a single-loop structure. AdaSDBO leverages adaptive stepsizes based on cumulative gradient norms to update all variables simultaneously, dynamically adjusting its progress and eliminating the need for problem-specific hyperparameter tuning. Through rigorous theoretical analysis, we establish that AdaSDBO achieves a convergence rate of $\widetilde{\mathcal{O}}\left(\frac{1}{T}\right)$, matching the rate of well-tuned state-of-the-art methods up to polylogarithmic factors. Extensive numerical experiments demonstrate that AdaSDBO delivers competitive performance compared to existing decentralized bilevel optimization methods while exhibiting strong robustness across diverse stepsize configurations.
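
As a minimal sketch of what a stepsize "based on cumulative gradient norms" means, the snippet below runs AdaGrad-Norm-style gradient descent on a one-dimensional quadratic f(x) = 0.5 * L * x**2. This is a generic single-agent, single-level illustration, not AdaSDBO itself; the function adagrad_norm_descent and the constants v0 and eta0 are assumptions chosen for the example.

```python
import math


def adagrad_norm_descent(grad, x0, eta0, num_iters, v0=1e-12):
    """Illustrative gradient descent with an AdaGrad-Norm-style stepsize.

    The stepsize eta0 / sqrt(v) shrinks as the running sum v of squared
    gradient norms grows; no smoothness constant is ever supplied.
    """
    x, v = x0, v0
    for _ in range(num_iters):
        g = grad(x)
        v += g * g  # cumulative squared gradient norm (scalar case)
        x -= eta0 / math.sqrt(v) * g
    return x


if __name__ == "__main__":
    # f(x) = 0.5 * L * x**2 with L spanning six orders of magnitude;
    # the optimizer never sees L, and eta0 is not tuned to it.
    for L in (1e-3, 1.0, 1e3):
        for eta0 in (0.1, 1.0, 10.0, 100.0):
            x_final = adagrad_norm_descent(lambda x, L=L: L * x, 1.0, eta0, 5000)
            print(f"L={L:g}  eta0={eta0:g}  |x_T|={abs(x_final):.1e}")
```

When v0 is negligible, the step eta0 * L * x / sqrt(v0 + L**2 * sum_k x_k**2) reduces to eta0 * x / sqrt(sum_k x_k**2), which does not depend on L. The iterates are therefore essentially the same for every curvature, and an overly large eta0 is automatically damped once the accumulated gradient norms grow. This is one intuition for why such stepsizes remove the need to know the smoothness constant.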

Paper Structure

Figures (10)

Theorems & Definitions (39)