Averaging Rate Scheduler for Decentralized Learning on Heterogeneous Data

Sai Aparna Aketi, Sakshi Choudhary, Kaushik Roy

TL;DR

The paper proposes averaging rate scheduling as a simple yet effective way to reduce the impact of data heterogeneity in decentralized learning, with experiments showing roughly a 3% improvement in test accuracy over a constant averaging rate.

Abstract

State-of-the-art decentralized learning algorithms typically require the data distribution to be Independent and Identically Distributed (IID). However, in practical scenarios, the data distribution across the agents can have significant heterogeneity. In this work, we propose averaging rate scheduling as a simple yet effective way to reduce the impact of heterogeneity in decentralized learning. Our experiments illustrate the superiority of the proposed method (~3% improvement in test accuracy) compared to the conventional approach of employing a constant averaging rate.
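To make the term "averaging rate" concrete, the sketch below shows one common form of a gossip averaging step, in which each agent moves a fraction `gamma` of the way toward the weighted average of its neighbors' parameters. The excerpt does not give the update rule or the shape of the schedule, so everything here is an assumption for illustration: the uniform ring weights, the function names, the geometric ramp from `gamma_start` to `gamma_end`, and the consensus error definition are hypothetical and not taken from the paper.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring of n >= 3 agents.

    Assumption: each agent weights itself and its two neighbors by 1/3.
    """
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W


def averaging_rate(step, total_steps, gamma_start=0.1, gamma_end=1.0):
    """Hypothetical schedule: geometric ramp from gamma_start to gamma_end."""
    frac = step / max(total_steps - 1, 1)
    return gamma_start * (gamma_end / gamma_start) ** frac


def gossip_step(X, W, gamma):
    """One averaging step: x_i <- x_i + gamma * (sum_j W_ij x_j - x_i).

    X holds one agent's parameters per row.
    """
    return X + gamma * (W @ X - X)


def consensus_error(X):
    """Mean squared distance of each agent's parameters from the average."""
    return float(np.mean(np.sum((X - X.mean(axis=0)) ** 2, axis=1)))


if __name__ == "__main__":
    n_agents, dim, total_steps = 16, 4, 50
    W = ring_mixing_matrix(n_agents)
    X = np.random.default_rng(0).normal(size=(n_agents, dim))
    for step in range(total_steps):
        X = gossip_step(X, W, averaging_rate(step, total_steps))
    print("consensus error after averaging:", consensus_error(X))
```

A constant averaging rate corresponds to passing the same `gamma` at every step. In a real training loop, each averaging step would be interleaved with local gradient updates on each agent's data.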

Paper Structure

This paper contains 13 sections, 2 equations, 3 figures, 4 tables, and 1 algorithm.

Figures (3)

  • Figure 1: Variation of test accuracy with the averaging rate (held constant during training) for ResNet-20 trained on CIFAR-10 over a ring graph of 16 agents with $\alpha=0.01$.
  • Figure 2: Average consensus error during training for various datasets trained over a ring topology of 16 agents. The graph is plotted for one seed; the solid line represents the average consensus error across the agents, and the shaded region represents the variation of the consensus error across agents.
  • Figure 3: Average validation loss during training for various datasets trained over a ring topology of 16 agents. The graph is plotted for one seed; the solid line represents the average loss across the agents, and the shaded region represents the variation of the loss across agents.