Decentralized Multi-Level Compositional Optimization Algorithms with Level-Independent Convergence Rate
Hongchang Gao
TL;DR
This work tackles decentralized stochastic multi-level compositional optimization, where nested functions held by distributed devices define the objective $F(x)=\frac{1}{N}\sum_{n=1}^{N}F_n(x)$. It introduces two algorithms, DSMCGDM and DSMCVRG, that achieve level-independent convergence rates in nonconvex settings by combining momentum, gradient tracking, and STORM-like variance reduction for inner levels (with a practical alternative for outer gradients in the second method). The theory gives convergence rates of $O((1-\lambda)^{-2}\epsilon^{-4})$ for the momentum-based method and $O((1-\lambda)^{-2}\epsilon^{-3})$ for the variance-reduced variant, with sample and communication costs scaling as $O((1-\lambda)^{-2}\epsilon^{-4})$ under unit mini-batch sizes. Experiments on multi-step model-agnostic meta-learning tasks support these results: the proposed decentralized methods converge faster than standard DSGD and scale better across graph topologies and with additional levels.
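To make the building blocks named above concrete, here is a minimal Python sketch of a STORM-like recursive estimator and a gradient-tracking communication step. It is not the paper's DSMCGDM or DSMCVRG. The ring topology, the 1/3 mixing weights, the reading of $\lambda$ as the second-largest eigenvalue magnitude of the mixing matrix, and all function names are assumptions made for illustration.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic mixing matrix for a ring of n nodes.

    Each node puts weight 1/3 on itself and on each ring neighbor
    (an assumed topology and weighting, chosen only for illustration).
    """
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W


def second_largest_eigenvalue_magnitude(W):
    """Second-largest |eigenvalue| of a symmetric W (assumed meaning of lambda)."""
    magnitudes = np.sort(np.abs(np.linalg.eigvalsh(W)))
    return magnitudes[-2]


def storm_update(u_prev, g_new, g_old, alpha):
    """STORM-like recursive estimate.

    u_t = g(x_t; xi_t) + (1 - alpha) * (u_{t-1} - g(x_{t-1}; xi_t)),
    where g_new and g_old are evaluated on the same sample xi_t.
    """
    return g_new + (1.0 - alpha) * (u_prev - g_old)


def gradient_tracking_step(W, y, m_new, m_old):
    """Gradient tracking: y_t = W y_{t-1} + m_t - m_{t-1}.

    Rows of y, m_new, m_old are per-node vectors. With a doubly stochastic W
    and y initialized to the initial local estimates, the node-average of y
    equals the node-average of the current local estimates.
    """
    return W @ y + m_new - m_old


# Small illustration on 5 nodes with 3-dimensional local estimates.
W = ring_mixing_matrix(5)
lam = second_largest_eigenvalue_magnitude(W)
rng = np.random.default_rng(0)
m_old = rng.normal(size=(5, 3))   # stand-in for local estimates at step t-1
m_new = rng.normal(size=(5, 3))   # stand-in for local estimates at step t
y = gradient_tracking_step(W, m_old, m_new, m_old)  # y_{t-1} initialized to m_old
print("lambda:", lam, "spectral-gap factor (1 - lambda)^-2:", (1.0 - lam) ** -2)
print("tracking error of the average:", np.abs(y.mean(axis=0) - m_new.mean(axis=0)).max())
```

In this sketch, a poorly connected graph has $\lambda$ close to 1, which inflates the $(1-\lambda)^{-2}$ factor in the rates quoted above.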
Abstract
Stochastic multi-level compositional optimization problems cover many new machine learning paradigms, e.g., multi-step model-agnostic meta-learning, which require efficient optimization algorithms for large-scale data. This paper studies decentralized stochastic multi-level compositional optimization, which is challenging because the multi-level structure and the decentralized communication scheme can cause the number of levels to significantly affect the order of the convergence rate. To this end, we develop two novel decentralized algorithms for the multi-level compositional optimization problem. Our theoretical results show that both algorithms achieve a level-independent convergence rate for nonconvex problems under much milder conditions than existing single-machine algorithms. To the best of our knowledge, this is the first work to achieve a level-independent convergence rate in the decentralized setting. Moreover, extensive experiments confirm the efficacy of the proposed algorithms.
