
On the Optimal Communication Weights in Distributed Optimization Algorithms

Sebastien Colla, Julien M. Hendrickx

TL;DR

The paper addresses the problem of selecting communication weights in decentralized optimization. It relies on the Performance Estimation Problem (PEP) framework, which computes the exact worst-case performance of a given algorithm on a given topology. It shows that minimizing the second-largest eigenvalue modulus (SLEM) of the averaging matrix is not optimal in general, and it uses PEP to optimize the edge weights $W$ and the step-size $\alpha$ with respect to this exact worst-case performance. Numerical results across common topologies and algorithms (e.g., DIGing, ATC-DIGing, EXTRA, Acc-DNGD) show that the optimal weights $W^*$ can yield faster convergence, i.e., fewer iterations to reach a given accuracy, than SLEM-minimizing weights, and that the spectrum of the averaging matrix, including the signs of its eigenvalues, plays a crucial role. The study also compares several weight-selection heuristics and finds no universal winner, suggesting that network performance in decentralized optimization depends on the full eigenvalue spectrum, not only on the SLEM.
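
The algorithms listed above are parameterized by the averaging matrix $W$ and the step-size $\alpha$. As a minimal sketch, assuming the standard gradient-tracking form of DIGing, the code below runs it on scalar quadratic local functions over a 6-node ring where every edge shares one weight. The ring, the quadratic data, and the helper names `ring_matrix`, `matvec` and `diging` are hypothetical illustrations, not the paper's setup or code; the sketch simulates one problem instance, whereas PEP computes the worst case over a whole function class.

```python
def ring_matrix(n, w):
    """Symmetric, doubly stochastic averaging matrix on an n-cycle (n >= 3):
    weight w on every edge and 1 - 2w on the diagonal."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 1.0 - 2.0 * w
        W[i][(i + 1) % n] = w
        W[i][(i - 1) % n] = w
    return W


def matvec(W, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def diging(W, alpha, a, b, iters):
    """DIGing on scalar local quadratics f_i(x) = a_i / 2 * (x - b_i)**2,
    starting from x^0 = 0:
        x^{k+1} = W x^k - alpha * y^k
        y^{k+1} = W y^k + grad f(x^{k+1}) - grad f(x^k),  y^0 = grad f(x^0).
    Returns the local iterates x_i after `iters` steps."""
    def grad(x):
        return [a_i * (x_i - b_i) for a_i, x_i, b_i in zip(a, x, b)]

    x = [0.0] * len(a)
    g = grad(x)
    y = list(g)  # y tracks the network-average gradient
    for _ in range(iters):
        x_new = [wx - alpha * y_i for wx, y_i in zip(matvec(W, x), y)]
        g_new = grad(x_new)
        y = [wy + gn - go for wy, gn, go in zip(matvec(W, y), g_new, g)]
        x, g = x_new, g_new
    return x


# Usage: 6 agents on a ring with weight 1/3 on each edge.
a = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]  # local curvatures (here mu = 1, L = 3)
b = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # local minimizers
x_star = sum(a_i * b_i for a_i, b_i in zip(a, b)) / sum(a)  # global minimizer
x = diging(ring_matrix(6, 1 / 3), alpha=0.02, a=a, b=b, iters=2000)
print("max |x_i - x*| =", max(abs(x_i - x_star) for x_i in x))
```

Changing the edge weight passed to `ring_matrix` or the step-size `alpha` changes how fast the local iterates agree on the global minimizer, which is the dependence the paper quantifies exactly with PEP.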

Abstract

We establish that in distributed optimization, the prevalent strategy of minimizing the second-largest eigenvalue modulus (SLEM) of the averaging matrix for selecting communication weights, while optimal for existing theoretical performance bounds, is generally not optimal regarding the exact worst-case performance of the algorithms. This exact performance can be computed using the Performance Estimation Problem (PEP) approach. We thus rely on PEP to formulate an optimization problem that determines the optimal communication weights for a distributed optimization algorithm deployed on a specified undirected graph. Our results show that the optimal weights can outperform the weights minimizing the SLEM of the averaging matrix. This suggests that the SLEM is not the best characterization of weighted network performance for decentralized optimization. Additionally, we explore and compare alternative heuristics for weight selection in distributed optimization.
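
The SLEM-minimizing baseline can be made concrete on a cycle graph. Assuming one weight $w$ shared by every edge, with $1 - 2w$ on the diagonal, the averaging matrix is circulant with eigenvalues $\lambda_k = 1 - 2w + 2w\cos(2\pi k/n)$ for $k = 0, \dots, n-1$, and the SLEM is $\max_{k \neq 0} |\lambda_k|$. The minimal sketch below minimizes it by grid search over $w \in (0, 1/2]$; the function names are hypothetical. Minimizing the SLEM over all admissible weights of a general graph is a convex problem; here the single shared weight is an assumption, although for the cycle the convexity of the SLEM and the symmetry of the graph mean that nothing is lost by it.

```python
import math


def ring_spectrum(n, w):
    """Eigenvalues of the n-cycle averaging matrix with weight w on every edge.
    The matrix is circulant, so lambda_k = 1 - 2w + 2w cos(2 pi k / n)."""
    return [1.0 - 2.0 * w + 2.0 * w * math.cos(2.0 * math.pi * k / n)
            for k in range(n)]


def slem(n, w):
    """Second-largest eigenvalue modulus: max |lambda_k| over k != 0,
    since k = 0 gives the consensus eigenvalue lambda_1 = 1."""
    return max(abs(lam) for lam in ring_spectrum(n, w)[1:])


def slem_optimal_weight(n, grid=1000):
    """Grid search over the shared edge weight w in (0, 1/2]."""
    candidates = [i / (2 * grid) for i in range(1, grid + 1)]
    return min(candidates, key=lambda w: slem(n, w))


w_best = slem_optimal_weight(6)
print("w* =", w_best, "SLEM =", slem(6, w_best))
```

For $n = 6$ the nontrivial eigenvalues are $1 - w$ (twice), $1 - 3w$ (twice) and $1 - 4w$, so the SLEM is minimized at $w = 0.4$, where it equals $0.6$; the grid search recovers this value.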

Paper Structure (8 sections, 27 equations, 1 figure)

Figures (1)

  • Figure 1: These plots show the error criterion $E_{\S}(W,\alpha)$ on the vertical axis for the optimal averaging matrix $W^*$ and for the averaging-matrix heuristics described in the section on heuristics. They also show, on the horizontal axis, the eigenvalue distribution of each matrix, excluding $\lambda_1 = 1$: each marker is one distinct eigenvalue, and its size is proportional to that eigenvalue's multiplicity. For a fair comparison, the step-size $\alpha$ is tuned separately for each averaging matrix. Each plot corresponds to a different topology or algorithm. The local functions are $\mu$-strongly convex and $L$-smooth.
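
Figure 1 places each matrix's eigenvalues (other than $\lambda_1 = 1$) on the horizontal axis, with marker size proportional to multiplicity. A minimal sketch of that bookkeeping, assuming a hypothetical 6-node ring with one shared edge weight $w$ (not one of the paper's plots), compares $w = 1/3$, which the Metropolis-Hastings rule $w_{ij} = 1/(1 + \max(d_i, d_j))$ gives on a ring, with $w = 0.4$, the shared weight that minimizes the SLEM on this ring. The two spectra are $\{2/3\ (\times 2),\ 0\ (\times 2),\ -1/3\}$ and $\{0.6\ (\times 2),\ -0.2\ (\times 2),\ -0.6\}$: the second has the smaller SLEM but also a negative eigenvalue of the same modulus, the kind of sign information the summary flags as important. The function name is hypothetical.

```python
import math
from collections import Counter


def ring_spectrum_multiplicities(n, w, digits=9):
    """Nontrivial eigenvalues (k = 1..n-1, i.e. all except lambda_1 = 1) of the
    n-cycle averaging matrix with shared edge weight w, mapped to multiplicities.
    Rounding to `digits` decimals merges values that differ only by float error;
    adding 0.0 turns a possible -0.0 into 0.0."""
    eigs = (1.0 - 2.0 * w + 2.0 * w * math.cos(2.0 * math.pi * k / n)
            for k in range(1, n))
    return dict(Counter(round(lam, digits) + 0.0 for lam in eigs))


# Metropolis-Hastings weights on a ring (w = 1/3) vs. the SLEM-minimizing
# shared weight for n = 6 (w = 0.4).
for label, w in [("Metropolis, w = 1/3", 1 / 3), ("min-SLEM, w = 0.4", 0.4)]:
    spectrum = ring_spectrum_multiplicities(6, w)
    print(label, spectrum, "SLEM =", max(abs(lam) for lam in spectrum))
```

To mimic one panel of the figure, each key would give a marker's horizontal position and each value its size.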

Theorems & Definitions (1)

  • Definition 1: Equivalent edges (gross2018graph)