Historical Information Accelerates Decentralized Optimization: A Proximal Bundle Method

Zhao Zhu, Yu-Ping Tian, Xuyang Wu

TL;DR

This work introduces the Decentralized Proximal Bundle Method (DPBM), which leverages historical information (past function values and gradients) within a proximal bundle framework to accelerate decentralized optimization. By embedding bundle minorants into Prox-DGD, and extending the approach to asynchronous and stochastic settings, the authors achieve delay-independent, robust convergence under mild assumptions. They provide a dual-based, efficient subproblem solver and offer convergence guarantees for both deterministic and stochastic variants, including non-smooth and non-quadratic objectives. Numerical experiments on decentralized logistic regression demonstrate faster convergence and greater step-size robustness compared with established baselines. Overall, the paper offers a principled, practical path to exploiting historical data to improve distributed optimization performance.

Abstract

Historical information, such as past function values or gradients, has significant potential to enhance decentralized optimization methods for two key reasons. First, it provides richer information about the objective function, which also explains its established success in centralized optimization. Second, unlike second-order derivatives or their approximations, historical information has already been computed or communicated and costs nothing extra to acquire. Despite this potential, it remains underexploited. In this work, we employ a proximal bundle framework to incorporate the function values and gradients at historical iterates and adapt the framework to the proximal decentralized gradient descent method, resulting in a Decentralized Proximal Bundle Method (DPBM). To broaden its applicability, we further extend DPBM to the asynchronous and stochastic settings. We theoretically analyse the convergence of the proposed methods. Notably, both the asynchronous DPBM and its stochastic variant can converge with fixed step-sizes that are independent of delays. Such step-sizes are preferable to the delay-dependent step-sizes required by most existing asynchronous optimization methods, as they are easier to determine and often lead to faster convergence. Numerical experiments on classification problems demonstrate that, by using historical information, our methods converge faster and are more robust to the choice of step-size.
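
To make the idea of reusing historical function values and gradients concrete, the following minimal sketch builds a textbook cutting-plane model from a small bundle of past iterates and checks that it lower-bounds a convex function. It is an illustration under stated assumptions, not the authors' implementation: the one-dimensional objective `f`, the helper names `make_bundle` and `cutting_plane_model`, and the chosen history points are all hypothetical.

```python
import numpy as np

def f(x):
    """Convex, smooth toy objective (hypothetical): logistic loss plus a quadratic."""
    return np.logaddexp(0.0, x) + 0.5 * x ** 2

def grad_f(x):
    """Derivative of f: sigmoid(x) + x."""
    return 1.0 / (1.0 + np.exp(-x)) + x

def make_bundle(points):
    """Store (iterate, function value, gradient) for each historical iterate."""
    return [(x, f(x), grad_f(x)) for x in points]

def cutting_plane_model(x, bundle):
    """Pointwise maximum of the linearizations stored in the bundle.

    For convex f, every linearization lies below f, so the maximum does too.
    """
    return max(fx + g * (x - xj) for xj, fx, g in bundle)

# Hypothetical history of three past iterates.
history = [-2.0, 0.5, 1.5]
bundle = make_bundle(history)

# The model should never exceed f, and should match f at the bundle points.
grid = np.linspace(-3.0, 3.0, 61)
gaps = np.array([f(x) - cutting_plane_model(x, bundle) for x in grid])
print("smallest gap f - model on the grid:", gaps.min())
```

Every quantity in the bundle was already computed at an earlier iterate, which is why a model of this kind adds no extra function or gradient evaluations.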

Paper Structure

This paper contains 27 sections, 6 theorems, 152 equations, 5 figures, 1 algorithm.

Key Result

Lemma 1

Suppose that Assumption \ref{asm:convex} holds and each $f_i$ is $\beta_i$-smooth. Then Assumption \ref{asm:tilde_f} holds for the models \eqref{eq:pol}--\eqref{eq:two_cut}.

Figures (5)

  • Figure 1: The cutting-plane model (the black curve represents $f$ and the red curve is the model).
  • Figure 2: Surrogate functions in the deterministic Polyak model \eqref{eq:pol}, the cutting-plane model \eqref{eq:cp}, and the Polyak cutting-plane model \eqref{eq:pol_cp}. The black curve represents $f$ and the red curves are the models.
  • Figure 3: Convergence of synchronous optimization methods.
  • Figure 4: Convergence of asynchronous optimization methods.
  • Figure 5: Robustness of DPBM (with different cut number $M$) in the step-size $\gamma$.
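
The surrogate models named in the captions of Figures 1 and 2 can be sketched as follows. The paper's exact definitions are not reproduced in this excerpt, so the formulas below are the commonly used forms and should be read as assumptions: the Polyak model truncates a single linearization at a known lower bound of $f$, the cutting-plane model takes the maximum over several linearizations, and the Polyak cutting-plane model combines both. The toy objective, the lower bound `F_LOWER`, and all function names are hypothetical.

```python
import numpy as np

def f(x):
    """Toy convex objective (hypothetical) with known lower bound 0."""
    return x ** 2

def grad_f(x):
    return 2.0 * x

F_LOWER = 0.0  # assumed known lower bound of f

def linearization(x, xk):
    """First-order Taylor expansion of f at xk, evaluated at x."""
    return f(xk) + grad_f(xk) * (x - xk)

def polyak_model(x, xk):
    """One linearization truncated from below at the lower bound."""
    return max(linearization(x, xk), F_LOWER)

def cutting_plane_model(x, history):
    """Maximum of the linearizations at all historical iterates."""
    return max(linearization(x, xk) for xk in history)

def polyak_cutting_plane_model(x, history):
    """Cutting-plane model truncated from below at the lower bound."""
    return max(cutting_plane_model(x, history), F_LOWER)

history = [-1.5, 1.0]   # hypothetical past iterates
current = history[-1]   # the most recent iterate
grid = np.linspace(-2.0, 2.0, 41)
rows = [(polyak_model(x, current),
         cutting_plane_model(x, history),
         polyak_cutting_plane_model(x, history),
         f(x)) for x in grid]
```

Under these assumed definitions, each model is a minorant of $f$, and the combined model is at least as tight as either of the other two at every point.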

Theorems & Definitions (16)

  • Definition 1
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Theorem 1
  • proof
  • Remark 1
  • Remark 2
  • Theorem 2
  • ...and 6 more