Historical Information Accelerates Decentralized Optimization: A Proximal Bundle Method
Zhao Zhu, Yu-Ping Tian, Xuyang Wu
TL;DR
This work introduces the Decentralized Proximal Bundle Method (DPBM), which uses historical information (past function values and gradients) within a proximal bundle framework to accelerate decentralized optimization. The authors embed bundle minorants into Prox-DGD and extend the approach to asynchronous and stochastic settings, where convergence holds under mild assumptions with fixed step-sizes that do not depend on delays. They provide an efficient dual-based subproblem solver and offer convergence guarantees for both deterministic and stochastic variants, including non-smooth and non-quadratic objectives. Numerical experiments on decentralized logistic regression demonstrate faster convergence and greater step-size robustness compared with established baselines. Overall, the paper offers a principled, practical path to exploiting historical information to improve distributed optimization performance.
Abstract
Historical information, such as past function values or gradients, has significant potential to enhance decentralized optimization methods for two key reasons. First, it provides richer information about the objective function, which also explains its established success in centralized optimization. Second, unlike second-order derivatives or their approximations, historical information has already been computed or communicated and requires no additional cost to acquire. Despite this potential, it remains underexploited. In this work, we employ a proximal bundle framework to incorporate the function values and gradients at historical iterates and adapt the framework to the proximal decentralized gradient descent method, resulting in a Decentralized Proximal Bundle Method (DPBM). To broaden its applicability, we further extend DPBM to the asynchronous and stochastic settings. We theoretically analyse the convergence of the proposed methods. Notably, both the asynchronous DPBM and its stochastic variant can converge with fixed step-sizes that are independent of delays. Such step-sizes are easier to determine, and often lead to faster convergence, than the delay-dependent step-sizes required by most existing asynchronous optimization methods. Numerical experiments on classification problems demonstrate that, by using historical information, our methods converge faster and are more robust to the choice of step-size.
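To make the idea of reusing past function values and gradients concrete, the following is a minimal sketch of a standard cutting-plane (bundle) minorant for a convex function. It is an illustration only, not the paper's exact model: the scalar logistic loss, the data values `a` and `b`, the chosen past iterates, and the helper names `logistic_loss`, `logistic_grad`, and `bundle_minorant` are all assumptions introduced here.

```python
import math

def logistic_loss(x, a, b):
    """Toy convex loss log(1 + exp(-b * a * x)) in a scalar variable x."""
    return math.log1p(math.exp(-b * a * x))

def logistic_grad(x, a, b):
    """Derivative of logistic_loss with respect to x."""
    return -b * a / (1.0 + math.exp(b * a * x))

def bundle_minorant(history):
    """Build m(x) = max_k [ f_k + g_k * (x - x_k) ] from stored triples.

    Each triple (x_k, f_k, g_k) holds a past iterate, the function value
    there, and the gradient there. For a convex f, every linearization
    lies below f, so their pointwise maximum is also a lower bound on f.
    """
    def m(x):
        return max(f_k + g_k * (x - x_k) for x_k, f_k, g_k in history)
    return m

# Hypothetical data and past iterates (assumed values for illustration).
a, b = 2.0, 1.0
past_iterates = [-1.0, 0.0, 1.5]
history = [(x, logistic_loss(x, a, b), logistic_grad(x, a, b))
           for x in past_iterates]

model = bundle_minorant(history)
```

Under these assumptions, `model` agrees with the loss at every stored iterate and never exceeds it elsewhere, and it costs nothing extra to build because the values and gradients were already computed. A proximal bundle step would then minimize such a model plus a quadratic proximal term instead of a single linearization.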
