Energy-Aware Decentralized Learning with Intermittent Model Training

Akash Dhasade; Paolo Dini; Elia Guerra; Anne-Marie Kermarrec; Marco Miozzo; Rafael Pires; Rishi Sharma; Martijn de Vos

TL;DR

SkipTrain addresses the energy cost of decentralized learning by interleaving training rounds with coordinated synchronization rounds, achieving substantial energy savings while preserving or improving accuracy on non-IID data. The paper formalizes an energy model, implements the approach in DecentralizePy, and evaluates it on CIFAR-10 and FEMNIST with 256 nodes across multiple topologies, using realistic energy traces. The results show about a 50% reduction in training energy and accuracy gains of up to 12 percentage points over D-PSGD, while a variant, SkipTrain-constrained, achieves similar gains under per-node energy budgets. The work also discusses practical considerations, including fairness and the accuracy of the energy traces, and points to future extensions such as asynchronous coordination for scalable energy-aware DL.
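
SkipTrain-constrained is only named here, so the following is a hedged sketch of one way a per-node energy budget could shape participation, not the paper's rule. It assumes a budget that covers training energy only, a fixed cost per training round (`train_cost`), a small per-round communication cost (`comm_cost`) reflecting that sharing and merging are cheap, and a periodic schedule of `train_rounds` training rounds followed by `sync_rounds` synchronization-only rounds. A node trains in each scheduled training round with probability equal to the fraction of scheduled rounds it can afford, and it never overspends. All function and parameter names are hypothetical.

```python
import random


def planned_training_rounds(total_rounds, train_rounds, sync_rounds):
    # Number of training rounds in a periodic schedule of train_rounds
    # training rounds followed by sync_rounds synchronization-only rounds.
    cycle = train_rounds + sync_rounds
    full_cycles, remainder = divmod(total_rounds, cycle)
    return full_cycles * train_rounds + min(remainder, train_rounds)


def training_probability(budget, train_cost, planned):
    # Fraction of scheduled training rounds the node can afford, capped at 1.
    if planned == 0:
        return 0.0
    affordable = budget // train_cost
    return min(1.0, affordable / planned)


def simulate_node(budget, train_cost, comm_cost, total_rounds,
                  train_rounds, sync_rounds, seed=0):
    # Returns (training energy, communication energy) spent by one node.
    rng = random.Random(seed)
    planned = planned_training_rounds(total_rounds, train_rounds, sync_rounds)
    p = training_probability(budget, train_cost, planned)
    spent_train = 0.0
    spent_comm = 0.0
    for t in range(total_rounds):
        scheduled = t % (train_rounds + sync_rounds) < train_rounds
        # Train only in scheduled rounds, with probability p, within budget.
        if scheduled and rng.random() < p and spent_train + train_cost <= budget:
            spent_train += train_cost
        # Sharing and aggregation happen in every round.
        spent_comm += comm_cost
    return spent_train, spent_comm


# A node whose budget covers half of the 100 scheduled training rounds.
train_e, comm_e = simulate_node(budget=50, train_cost=1, comm_cost=0.01,
                                total_rounds=200, train_rounds=4, sync_rounds=4)
print(train_e <= 50)  # True
```

Spreading the affordable training rounds randomly over the whole schedule, rather than spending the budget in the first rounds, keeps a low-budget node contributing to training throughout the run; the hard budget check guarantees it never exceeds its allowance.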

Abstract

Decentralized learning (DL) offers a powerful framework in which nodes collaboratively train models without sharing raw data and without coordination by a central server. In each iterative round of DL, models are trained locally, shared with neighbors in the topology, and aggregated with the models received from those neighbors. Sharing and merging models drives convergence towards a consensus model that generalizes better across the collective data available at training time. Moreover, the energy spent sharing and merging model parameters is negligible compared to the energy spent in the training phase. Leveraging this fact, we present SkipTrain, a novel DL algorithm that reduces energy consumption in decentralized learning by strategically skipping some training rounds and substituting them with synchronization rounds. Besides saving energy, these training-silent periods let models mix better, ultimately producing models with higher accuracy than typical DL algorithms that train in every round. Our empirical evaluations with 256 nodes demonstrate that SkipTrain reduces energy consumption by 50% and increases model accuracy by up to 12% compared to D-PSGD, the conventional DL algorithm.
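
To make the round structure concrete, here is a toy sketch, not the authors' implementation: 8 nodes on a ring, each holding a scalar model and a non-IID quadratic objective f_i(x) = 0.5 * (x - c_i)^2, with Metropolis-Hastings averaging weights (an assumed choice). Every round shares and aggregates; only scheduled rounds also train. The names `run`, `train_rounds`, `sync_rounds`, and the fixed periodic schedule are assumptions for illustration; D-PSGD corresponds to `sync_rounds=0`. Counting local training steps stands in for training energy, since sharing and merging cost comparatively little.

```python
def ring_neighbors(n):
    # Ring topology (n >= 3): node i talks to nodes i-1 and i+1.
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]


def metropolis_weights(neighbors):
    # Metropolis-Hastings mixing weights (assumed choice):
    # W[i][j] = 1 / (1 + max(deg_i, deg_j)) for each neighbor j,
    # and the self-weight W[i][i] takes the rest so every row sums to 1.
    deg = [len(nb) for nb in neighbors]
    weights = []
    for i, nb in enumerate(neighbors):
        row = {j: 1.0 / (1 + max(deg[i], deg[j])) for j in nb}
        row[i] = 1.0 - sum(row.values())
        weights.append(row)
    return weights


def is_training_round(t, train_rounds, sync_rounds):
    # Hypothetical coordinated schedule shared by all nodes:
    # train_rounds training rounds, then sync_rounds synchronization-only rounds.
    return t % (train_rounds + sync_rounds) < train_rounds


def run(targets, rounds, lr, train_rounds, sync_rounds):
    # Toy non-IID setup: node i minimizes f_i(x) = 0.5 * (x - targets[i]) ** 2.
    n = len(targets)
    weights = metropolis_weights(ring_neighbors(n))
    x = [0.0] * n
    training_steps = 0  # proxy for training energy
    for t in range(rounds):
        if is_training_round(t, train_rounds, sync_rounds):
            # Local gradient step; the gradient of f_i at x_i is x_i - c_i.
            x = [xi - lr * (xi - ci) for xi, ci in zip(x, targets)]
            training_steps += n
        # Every round: share models with neighbors and aggregate them.
        x = [sum(w * x[j] for j, w in weights[i].items()) for i in range(n)]
    return x, training_steps


def disagreement(x):
    # Sum of squared distances to the average model (0 means consensus).
    mean = sum(x) / len(x)
    return sum((xi - mean) ** 2 for xi in x)


targets = [float(i) for i in range(8)]
x_dpsgd, steps_dpsgd = run(targets, rounds=200, lr=0.1, train_rounds=1, sync_rounds=0)
x_skip, steps_skip = run(targets, rounds=200, lr=0.1, train_rounds=4, sync_rounds=4)
print(steps_dpsgd, steps_skip)  # 1600 800
print(disagreement(x_dpsgd), disagreement(x_skip))
```

In this toy setting, the SkipTrain-style run performs half as many training steps as D-PSGD and ends with lower disagreement between node models, because the synchronization-only rounds keep averaging without reintroducing the pull of each node's local objective. Since the averaging weights are doubly stochastic, the average model in both runs still approaches the global optimum, the mean of the c_i.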

Paper Structure

This paper contains 27 sections, 6 equations, 8 figures, and 4 tables.

Introduction
Background
Decentralized learning
D-PSGD algorithm
Energy model
SkipTrain
Coordinated synchronization rounds
Partial client participation
The SkipTrain algorithm
Evaluation
Implementation
Experimental setup
Cluster and network
Datasets and hyperparameters
Metrics
(12 further sections not listed)

Figures (8)

Figure 1: Comparison between D-PSGD (mean accuracy across nodes) and D-PSGD with all-reduce (accuracy of the global average of the models) on 256 nodes in a 6-regular topology. All-reduce significantly boosts model performance.
Figure 2: The operations performed by D-PSGD, SkipTrain, and SkipTrain-constrained during multiple rounds, for four nodes.
Figure 3: SkipTrain, Node $i$
Figure 4: Grid of the average validation accuracy and energy consumption of SkipTrain on the CIFAR-10 dataset. Darker shades indicate better values.
Figure 5: Average test accuracy of SkipTrain on CIFAR-10, evaluated every 2 rounds. The shaded area around each curve indicates the standard deviation.
(3 further figures not listed)
