Energy-efficient Decentralized Learning via Graph Sparsification

Xusheng Zhang; Cho-Chun Chiu; Ting He

TL;DR

This work tackles energy efficiency in decentralized learning by jointly optimizing the mixing matrix and the communication topology. It formulates a bi-level problem in which the lower level uses graph sparsification to choose link activations that accelerate convergence, and the upper level selects per-node budgets to minimize total workload. The authors provide a Ramanujan-graph-based design with guaranteed performance for a fully connected base topology and a scalable greedy heuristic for general graphs. In simulations on a real topology and dataset, the proposed design lowers energy consumption at the busiest node by 54%–76% without sacrificing model quality. The results suggest that principled spectral design of the mixing matrix, which balances the communication load across nodes, is a practical path to greener distributed learning systems.

Abstract

This work aims to improve the energy efficiency of decentralized learning by optimizing the mixing matrix, which controls the communication demands during the learning process. Through rigorous analysis based on a state-of-the-art decentralized learning algorithm, the problem is formulated as a bi-level optimization, with the lower level solved by graph sparsification. A solution with guaranteed performance is proposed for the special case of a fully connected base topology, and a greedy heuristic is proposed for the general case. Simulations based on a real topology and dataset show that the proposed solution can lower the energy consumption at the busiest node by 54%–76% while maintaining the quality of the trained model.

Paper Structure (22 sections, 5 theorems, 20 equations, 2 figures, 1 table)

Introduction
Related Work
Summary of Contributions
Background and Problem Formulation
Decentralized Learning Algorithm
Mixing Matrix
Cost Model
Optimization Framework
Mixing Matrix Design via Graph Sparsification
Simplified Objective
Idea on Leveraging Graph Sparsification
Algorithm Design
Ramanujan-Graph-based Design for a Special Case
Intractability for General Case
Greedy Heuristic for General Case
...and 7 more sections

Key Result

Theorem 2.1

Let $\bm{J}:=\frac{1}{m}\bm{1}\bm{1}^\top$ and let $\bm{W}$ be a random symmetric matrix such that each row/column in $\bm{W}$ sums to one. Let $\rho:= \|\mathbb{E}[\bm{W}^\top\bm{W}]-\bm{J}\|$. Under assumptions (1)--(3), if each mixing matrix $\bm{W}^{(k)}$ is an i.i.d. copy of $\bm{W}$ and $\rho <$ …
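The quantities $\bm{J}$ and $\rho$ defined in this statement can be computed numerically for a concrete mixing matrix. The following is a minimal sketch, not the paper's design: it assumes a deterministic $\bm{W}$ (so the expectation is just $\bm{W}^\top\bm{W}$), assumes $\|\cdot\|$ is the spectral norm, and builds $\bm{W}$ with Metropolis weights, a standard construction chosen here only for illustration. The helper names `metropolis_mixing_matrix`, `rho`, and `ring_adjacency` are hypothetical.

```python
import numpy as np

def metropolis_mixing_matrix(adj):
    """Symmetric matrix with unit row/column sums, built from Metropolis weights.

    Assumption: this weighting is for illustration only, not the paper's design.
    """
    adj = np.asarray(adj, dtype=float)
    m = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            if i != j and adj[i, j] > 0:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
    # Put the remaining mass on the diagonal so every row sums to one.
    W[np.diag_indices(m)] = 1.0 - W.sum(axis=1)
    return W

def rho(W):
    """rho = ||W^T W - J|| for a deterministic W (spectral norm assumed)."""
    m = W.shape[0]
    J = np.ones((m, m)) / m
    return np.linalg.norm(W.T @ W - J, ord=2)

def ring_adjacency(m):
    """Adjacency matrix of a ring on m nodes (each node has 2 neighbors)."""
    adj = np.zeros((m, m))
    for i in range(m):
        adj[i, (i + 1) % m] = adj[(i + 1) % m, i] = 1.0
    return adj

m = 8
complete = np.ones((m, m)) - np.eye(m)   # fully connected base topology
ring = ring_adjacency(m)                 # a much sparser topology

W_complete = metropolis_mixing_matrix(complete)
W_ring = metropolis_mixing_matrix(ring)
rho_complete = rho(W_complete)
rho_ring = rho(W_ring)

print("links per node (complete, ring):", int(complete[0].sum()), int(ring[0].sum()))
print("rho (complete, ring):", rho_complete, rho_ring)
```

On the complete graph every node uses 7 links and $\bm{W}$ equals $\bm{J}$, so $\rho$ is zero; on the ring every node uses only 2 links but $\rho$ is larger. This illustrates the tension between per-node communication and the spectral quantity $\rho$ that the mixing matrix design has to balance.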

Figures (2)

Figure 1: Training loss and testing accuracy for decentralized learning over Roofnet.
Figure 2: Training loss and testing accuracy for decentralized learning over a complete graph.

Theorems & Definitions (8)

Theorem 2.1
Lemma 3.1
Theorem 3.2
Theorem A.1
Lemma A.2
Proof of Lemma (unresolved reference: lem:p=1-rho)
Proof of Lemma (unresolved reference: lem:rho upper-bound)
Proof of Theorem (unresolved reference: thm:nphard)
