Problem-Parameter-Free Decentralized Nonconvex Stochastic Optimization
Jiaxiang Li, Xuxing Chen, Shiqian Ma, Mingyi Hong
TL;DR
This paper addresses decentralized nonconvex stochastic optimization without relying on problem parameters like Lipschitz constants or network spectrum. It introduces D-NASA, a parameter-free algorithm that uses normalized gradient directions and moving-average tracking to control consensus error and enable convergence without prior problem information. Theoretical results show that D-NASA achieves optimal nonconvex stochastic convergence rates and linear speedup in the number of nodes, matching lower bounds under standard assumptions. Empirical evaluations on synthetic and real datasets demonstrate robust performance and superior generalization without hyperparameter tuning across diverse network topologies, underscoring practical impact for large-scale distributed learning. Overall, the work closes a key gap between theory and practice in decentralized optimization by delivering a parameter-free, scalable, and provably efficient algorithm.
Abstract
Existing decentralized algorithms usually require knowledge of problem parameters to update local iterates. For example, setting hyperparameters such as the learning rate typically requires the Lipschitz constant of the global gradient or topological information about the communication network, neither of which is usually accessible in practice. In this paper, we propose D-NASA, the first algorithm for decentralized nonconvex stochastic optimization that requires no prior knowledge of any problem parameters. We show that D-NASA attains the optimal rate of convergence for nonconvex objectives under very mild conditions and enjoys the linear-speedup effect, i.e., the computation becomes faster as the number of nodes in the system increases. Extensive numerical experiments are conducted to support our findings.
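
To make the ingredients named in the TL;DR concrete (normalized directions, a moving average of stochastic gradients, and neighbor averaging over a network), the following is a minimal toy sketch. It is not the D-NASA algorithm as specified in the paper. The ring topology, the quadratic local losses, the helper names `ring_mixing_matrix` and `run_toy`, and the schedules `alpha = 1/sqrt(t+1)` and `gamma = 1/(t+1)**0.75` are all assumptions chosen for illustration. The only point it demonstrates is that the step sizes depend on the iteration counter alone, not on a Lipschitz constant or on the spectrum of the mixing matrix.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring (assumed topology):
    each node averages itself and its two neighbors with weight 1/3."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W


def run_toy(n=8, dim=5, T=2000, noise=0.1, seed=0):
    """Toy decentralized loop with normalized, moving-average directions.

    Node i holds the hypothetical local loss f_i(x) = 0.5 * ||x - b_i||^2,
    so the global minimizer is the mean of the b_i.
    Returns (distance of the averaged iterate to the minimizer,
             largest distance of any node to the averaged iterate).
    """
    rng = np.random.default_rng(seed)
    W = ring_mixing_matrix(n)
    b = 5.0 + 0.5 * rng.standard_normal((n, dim))  # heterogeneous local targets
    x = np.zeros((n, dim))  # row i is node i's iterate
    z = np.zeros((n, dim))  # row i is node i's averaged gradient estimate

    for t in range(T):
        # Schedules use only the iteration counter (illustrative assumption).
        alpha = 1.0 / np.sqrt(t + 1)
        gamma = 1.0 / (t + 1) ** 0.75

        # Noisy local gradients of the quadratic losses.
        g = (x - b) + noise * rng.standard_normal((n, dim))

        # Moving average of stochastic gradients, then one gossip round.
        z = W @ ((1.0 - alpha) * z + alpha * g)

        # Step along the normalized direction, then one gossip round.
        norms = np.maximum(np.linalg.norm(z, axis=1, keepdims=True), 1e-12)
        x = W @ (x - gamma * z / norms)

    x_bar = x.mean(axis=0)
    opt_gap = np.linalg.norm(x_bar - b.mean(axis=0))
    consensus_err = np.max(np.linalg.norm(x - x_bar, axis=1))
    return opt_gap, consensus_err


if __name__ == "__main__":
    gap, cons = run_toy()
    print(f"optimality gap: {gap:.3f}, consensus error: {cons:.3f}")
```

Because every step has length `gamma` regardless of the gradient's scale, the update never needs a smoothness constant to stay stable. The price in this simplified sketch is a small residual error driven by heterogeneity across nodes, which the paper's method is designed to control.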
