Problem-Parameter-Free Decentralized Bilevel Optimization
Zhiwei Zhai, Wenjing Yan, Ying-Jun Angela Zhang
TL;DR
The paper tackles decentralized bilevel optimization when problem-specific parameters are unavailable. It proposes AdaSDBO, a parameter-free single-loop algorithm that jointly updates primal, dual, and auxiliary variables using hierarchical, gradient-norm-based adaptive stepsizes and a lightweight stepsize-tracking mechanism to maintain consensus. Theoretically, AdaSDBO achieves a finite-time convergence rate of $\widetilde{\mathcal{O}}\left(\frac{1}{T}\right)$ (up to $\log^4(T)$ factors) with gradient complexity $\widetilde{\mathcal{O}}\left(\frac{1}{\epsilon}\right)$ and is shown to be robust to network topology and initial stepsizes. Empirically, it delivers competitive performance across synthetic and real data, including decentralized hyperparameter optimization and decentralized meta-learning, while demonstrating strong resilience to stepsize choices and network heterogeneity. These results indicate a practical pathway to deploy decentralized bilevel optimization at scale without tedious hyperparameter tuning.
Abstract
Decentralized bilevel optimization has garnered significant attention due to its critical role in solving large-scale machine learning problems. However, existing methods often rely on prior knowledge of problem parameters, such as smoothness, convexity, or communication network topologies, to determine appropriate stepsizes. In practice, these problem parameters are typically unavailable, leading to substantial manual effort for hyperparameter tuning. In this paper, we propose AdaSDBO, a fully problem-parameter-free algorithm for decentralized bilevel optimization with a single-loop structure. AdaSDBO leverages adaptive stepsizes based on cumulative gradient norms to update all variables simultaneously, dynamically adjusting its progress and eliminating the need for problem-specific hyperparameter tuning. Through rigorous theoretical analysis, we establish that AdaSDBO achieves a convergence rate of $\widetilde{\mathcal{O}}\left(\frac{1}{T}\right)$, matching the performance of well-tuned state-of-the-art methods up to polylogarithmic factors. Extensive numerical experiments demonstrate that AdaSDBO delivers competitive performance compared to existing decentralized bilevel optimization methods while exhibiting remarkable robustness across diverse stepsize configurations.
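To make the idea of a stepsize driven by cumulative gradient norms concrete, the following is a minimal sketch of a generic AdaGrad-Norm style update on a single-agent, single-level quadratic. It is not the AdaSDBO algorithm: the hierarchical stepsizes, the stepsize-tracking mechanism, and the bilevel structure are all omitted. The names `adagrad_norm`, `objective`, `quad_grad`, and the constants `eta` and `b0` are assumptions introduced only for illustration.

```python
import math


def objective(x):
    # Illustrative objective (an assumption): f(x) = 0.5 * (x0^2 + 10 * x1^2).
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def quad_grad(x):
    # Gradient of the illustrative objective above.
    return [x[0], 10.0 * x[1]]


def adagrad_norm(grad, x0, steps=200, eta=1.0, b0=1e-2):
    """Gradient descent with stepsize eta / sqrt(b0^2 + sum_k ||g_k||^2).

    No smoothness or convexity constant is supplied: the stepsize shrinks
    automatically as squared gradient norms accumulate.
    """
    x = list(x0)
    acc = b0 ** 2  # running sum of squared gradient norms
    for _ in range(steps):
        g = grad(x)
        acc += sum(gi * gi for gi in g)
        step = eta / math.sqrt(acc)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


x_start = [5.0, -3.0]
x_final = adagrad_norm(quad_grad, x_start)
```

In this sketch the stepsize is never set from the curvature of the objective; it is determined by the gradients observed so far, which is the sense in which such schemes avoid problem-specific tuning.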
