A review on different techniques used to combat the non-IID and heterogeneous nature of data in FL

Venkataraman Natarajan Iyer

TL;DR

This paper surveys the challenges posed by heterogeneous and non-IID data in Federated Learning and reviews three representative mitigation strategies. It introduces FedDF, a knowledge-distillation-based ensemble fusion approach; FedLbl, a label-aware robust aggregation method; and Def-KT, a decentralized mutual knowledge transfer framework, detailing their mechanisms, datasets, and key empirical findings. Across image and NLP tasks, these methods improve global accuracy, accelerate convergence, and stabilize training under non-IID conditions, illustrating the practical potential of combining distillation, weighted aggregation, and decentralized collaboration. The work highlights ongoing directions such as adaptive learning, personalization, fairness, and domain knowledge incorporation as essential for robust, privacy-preserving learning in heterogeneous FL environments.

Abstract

Federated Learning (FL) is a machine-learning approach that enables collaborative model training across multiple decentralized edge devices holding local data samples, without exchanging those samples. Training is coordinated either by a central server or over a peer-to-peer network. FL is particularly significant in industries such as healthcare and finance, where data privacy is paramount. However, training a model in the federated setting raises several challenges, one of the most prominent being the heterogeneity of data distribution across edge devices. The data is typically not independent and identically distributed (non-IID), which hinders model convergence. This report examines the issues arising from non-IID and heterogeneous data and explores current algorithms designed to address them.
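
To make the notion of non-IID client data concrete, the following is a minimal sketch of how label and quantity skew are often simulated in FL experiments: each class is split across clients using proportions drawn from a symmetric Dirichlet distribution. This is an illustrative assumption, not a procedure taken from the paper; the function names `dirichlet` and `partition_by_label_skew`, the concentration parameter `alpha`, and the toy label list are all hypothetical.

```python
import random
from collections import Counter


def dirichlet(alpha, k, rng):
    """Sample a k-dimensional symmetric Dirichlet(alpha) vector
    by normalizing independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def partition_by_label_skew(labels, num_clients, alpha, seed=0):
    """Assign every sample index to exactly one client.

    Each class is divided among clients by Dirichlet proportions.
    A small alpha concentrates a class on few clients (strong skew);
    a large alpha spreads it roughly evenly (close to IID).
    """
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]

    # Group sample indices by class label.
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        props = dirichlet(alpha, num_clients, rng)
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += props[c]
            # The last client takes whatever remains, so nothing is dropped.
            end = len(idxs) if c == num_clients - 1 else round(cum * len(idxs))
            clients[c].extend(idxs[start:end])
            start = end
    return clients


if __name__ == "__main__":
    # Toy dataset: 1000 samples, 10 balanced classes (hypothetical).
    labels = [i % 10 for i in range(1000)]
    for alpha in (0.1, 100.0):
        parts = partition_by_label_skew(labels, num_clients=5, alpha=alpha)
        print(f"alpha={alpha}")
        for c, part in enumerate(parts):
            counts = Counter(labels[i] for i in part)
            print(f"  client {c}: n={len(part)} labels={dict(sorted(counts.items()))}")
```

Running the script prints per-client label counts for a skewed and a near-uniform setting. With a small `alpha`, clients tend to end up with different class mixes and different dataset sizes, which corresponds to the label distribution skew and quantity skew that this review discusses.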

Paper Structure (7 sections, 5 figures)

Figures (5)

  • Figure 1: Federated learning
  • Figure 2: Heterogeneous data
  • Figure 3: IID vs non-IID data
  • Figure 4: Label distribution skew amongst participants
  • Figure 5: Quantity distribution skew amongst participants