SPARKLE: A Unified Single-Loop Primal-Dual Framework for Decentralized Bilevel Optimization
Shuchen Zhu, Boao Kong, Songtao Lu, Xinmeng Huang, Kun Yuan
TL;DR
SPARKLE is a unified, single-loop primal-dual framework for decentralized bilevel optimization. It addresses data heterogeneity by supporting several heterogeneity-correction strategies (Exact Diffusion (ED), EXTRA, and gradient tracking (GT)) and by allowing different update schemes for the upper-level, lower-level, and auxiliary-level problems. The authors provide a unified convergence analysis with state-of-the-art rates, demonstrate linear speedup, and show that mixing heterogeneity-correction schemes across levels improves on using GT alone. In experiments on hyper-cleaning, distributed reinforcement learning, and decentralized meta-learning, SPARKLE performs robustly and often outperforms existing decentralized bilevel methods. Its flexibility in network topology and level-specific updates is practically useful for large-scale distributed learning, although the framework is currently limited to strongly convex lower-level problems and is sensitive to problem conditioning.
Abstract
This paper studies decentralized bilevel optimization, in which multiple agents collaborate to solve problems involving nested optimization structures with neighborhood communications. Most existing literature primarily utilizes gradient tracking to mitigate the influence of data heterogeneity, without exploring other well-known heterogeneity-correction techniques such as EXTRA or Exact Diffusion. Additionally, these studies often employ identical decentralized strategies for both upper- and lower-level problems, neglecting to leverage distinct mechanisms across different levels. To address these limitations, this paper proposes SPARKLE, a unified Single-loop Primal-dual AlgoRithm frameworK for decentraLized bilEvel optimization. SPARKLE offers the flexibility to incorporate various heterogeneity-correction strategies into the algorithm. Moreover, SPARKLE allows for different strategies to solve upper- and lower-level problems. We present a unified convergence analysis for SPARKLE, applicable to all its variants, with state-of-the-art convergence rates compared to existing decentralized bilevel algorithms. Our results further reveal that EXTRA and Exact Diffusion are more suitable for decentralized bilevel optimization, and using mixed strategies in bilevel algorithms brings more benefits than relying solely on gradient tracking.
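To make the role of heterogeneity correction concrete, the following is a minimal sketch on a single-level toy problem, not the SPARKLE algorithm itself. Everything in it is an illustrative assumption: five agents on a ring, hypothetical local objectives f_i(x) = 0.5 (x - b_i)^2 with different b_i (the data heterogeneity), uniform mixing weights of 1/3, and a constant step size. It compares plain decentralized gradient descent, which settles at a biased point under a constant step size, with a standard gradient tracking update, which reaches the minimizer of the average objective.

```python
# Toy comparison: decentralized gradient descent (DGD) vs. gradient tracking (GT).
# Assumed setup (not from the paper): scalar quadratics f_i(x) = 0.5 * (x - b_i)^2
# on a 5-agent ring with a doubly stochastic mixing matrix (weights 1/3).

def ring_mix(values):
    """Average each agent's value with its two ring neighbours."""
    n = len(values)
    return [(values[i - 1] + values[i] + values[(i + 1) % n]) / 3.0
            for i in range(n)]

def local_grad(x, b):
    """Gradient of f_i(x_i) = 0.5 * (x_i - b_i)^2 for every agent i."""
    return [xi - bi for xi, bi in zip(x, b)]

def run_dgd(b, step, iters):
    """DGD: mix the iterates, then step along the local gradient."""
    x = [0.0] * len(b)
    for _ in range(iters):
        g = local_grad(x, b)
        x = [m - step * gi for m, gi in zip(ring_mix(x), g)]
    return x

def run_gradient_tracking(b, step, iters):
    """GT: each agent keeps a tracker y_i of the network-average gradient."""
    x = [0.0] * len(b)
    g = local_grad(x, b)
    y = list(g)  # tracker initialised to the local gradients
    for _ in range(iters):
        x_new = [m - step * yi for m, yi in zip(ring_mix(x), y)]
        g_new = local_grad(x_new, b)
        # Mix the trackers and add the change in the local gradient.
        y = [m + gn - go for m, gn, go in zip(ring_mix(y), g_new, g)]
        x, g = x_new, g_new
    return x

def max_error(x, target):
    return max(abs(xi - target) for xi in x)

b = [0.0, 1.0, 2.0, 3.0, 10.0]   # heterogeneous local targets
optimum = sum(b) / len(b)         # minimizer of the average objective
dgd_err = max_error(run_dgd(b, 0.1, 500), optimum)
gt_err = max_error(run_gradient_tracking(b, 0.1, 500), optimum)
print("DGD max error:", dgd_err)
print("GT  max error:", gt_err)
```

In this toy setting the DGD error stays bounded away from zero while the gradient tracking error shrinks to numerical precision. EXTRA and Exact Diffusion are alternative corrections with the same goal; they are not shown here.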
