A Survey on Heterogeneous Federated Learning

Dashan Gao, Xin Yao, Qiang Yang

TL;DR

The paper surveys heterogeneous federated learning (FL) across the data-space, statistical, system, and model dimensions of heterogeneity. It introduces a taxonomy that distinguishes data-space homogeneous from data-space heterogeneous FL, covering vertical FL (VFL) and heterogeneous and homogeneous federated transfer learning (Hetero-FTL and Homo-FTL). It consolidates transfer-learning methods for addressing heterogeneity (representation learning, knowledge distillation, data augmentation, and collaborative filtering). It also analyzes privacy and security techniques, namely secure multi-party computation (MPC), homomorphic encryption (HE), differential privacy (DP), and trusted execution environments (TEEs), and reviews applications in recommender systems, finance, and healthcare. The authors note that data-space heterogeneity, especially Hetero-FTL, remains under-explored, and they propose future directions in framework design, adaptability to partially aligned data, efficiency, and trustworthy, privacy-preserving solutions. Overall, the work offers a roadmap for researchers and practitioners building robust, privacy-preserving heterogeneous FL systems across industries.

Abstract

Federated learning (FL) has been proposed to protect data privacy and virtually assemble the isolated data silos by cooperatively training models among organizations without breaching privacy and security. However, FL faces heterogeneity from various aspects, including data space, statistical, and system heterogeneity. For example, collaborative organizations without conflict of interest often come from different areas and have heterogeneous data from different feature spaces. Participants may also want to train heterogeneous personalized local models due to non-IID and imbalanced data distribution and various resource-constrained devices. Therefore, heterogeneous FL is proposed to address the problem of heterogeneity in FL. In this survey, we comprehensively investigate the domain of heterogeneous FL in terms of data space, statistical, system, and model heterogeneity. We first give an overview of FL, including its definition and categorization. Then, we propose a precise taxonomy of heterogeneous FL settings for each type of heterogeneity according to the problem setting and learning objective. We also investigate the transfer learning methodologies to tackle the heterogeneity in FL. We further present the applications of heterogeneous FL. Finally, we highlight the challenges and opportunities and envision promising future research directions toward new framework design and trustworthy approaches.

Paper Structure

This paper contains 70 sections, 2 theorems, 21 equations, 10 figures, and 8 tables.

Key Result

Theorem A.5

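(Laplace mechanism). Given a function $\mathcal{M}: \mathcal{D} \rightarrow \mathcal{R}^d$ over an arbitrary domain $\mathcal{D}$, for any input $X$, the function

$$\mathcal{M}_L(X) = \mathcal{M}(X) + (Y_1, \dots, Y_d), \qquad Y_i \overset{\text{i.i.d.}}{\sim} \mathrm{Lap}\!\left(\frac{\Delta \mathcal{M}}{\epsilon}\right),$$

where $\Delta \mathcal{M}$ is the $\ell_1$-sensitivity of $\mathcal{M}$, provides $\epsilon$-differential privacy.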
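To make the Laplace mechanism concrete, here is a minimal Python sketch, assuming the caller already knows the $\ell_1$-sensitivity of the query. The names `laplace_noise`, `laplace_mechanism`, and `count_over_40`, and the toy age data, are hypothetical illustrations, not code from the survey. Noise is drawn with the standard library only, using the identity that the difference of two i.i.d. Exp(1) variables is Laplace(0, 1).

```python
import random
from typing import Callable, List, Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. Exp(1) draws is Laplace(0, 1);
    # multiplying by `scale` gives Laplace(0, scale).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def laplace_mechanism(
    query: Callable[[Sequence[float]], List[float]],
    data: Sequence[float],
    sensitivity: float,
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Release query(data) plus i.i.d. Lap(sensitivity / epsilon) noise per coordinate.

    Assumption: `sensitivity` is the l1-sensitivity of `query` over
    neighbouring datasets; the caller is responsible for computing it.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if sensitivity < 0:
        raise ValueError("sensitivity must be non-negative")
    rng = rng if rng is not None else random.Random()
    scale = sensitivity / epsilon
    return [value + laplace_noise(scale, rng) for value in query(data)]


def count_over_40(ages: Sequence[float]) -> List[float]:
    # Hypothetical counting query (d = 1). Adding or removing one record
    # changes the count by at most 1, so its l1-sensitivity is 1.
    return [float(sum(1 for a in ages if a > 40))]


if __name__ == "__main__":
    ages = [23, 45, 67, 34, 52, 41, 38]  # toy data, not from the survey
    noisy = laplace_mechanism(count_over_40, ages, sensitivity=1.0,
                              epsilon=0.5, rng=random.Random(0))
    print(f"true count = {count_over_40(ages)[0]}, noisy count = {noisy[0]:.2f}")
```

A smaller $\epsilon$ gives a larger noise scale $\Delta \mathcal{M}/\epsilon$, and therefore stronger privacy at the cost of accuracy. In an FL setting, the same pattern could in principle be applied to a vector-valued release, but bounding its sensitivity (for example by clipping) is a separate step not shown here.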

Figures (10)

  • Figure 1: Organization of this survey.
  • Figure 2: Two strategies to improve FL algorithms in the trade-off space: 1) reduce system redundancy; 2) accept a small decline in one objective in exchange for a significant benefit in another. Smaller values are preferred. Each red dot indicates an FL algorithm.
  • Figure 3: Categorization of FL in terms of data space heterogeneity.
  • Figure 4: Data distribution in VFL
  • Figure 5: Data distribution in instance-sharing Hetero-FTL
  • ...and 5 more figures
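Figure 2 places each FL algorithm as a point in a multi-objective trade-off space where smaller values are preferred. The following minimal Python sketch finds the non-dominated (Pareto-optimal) algorithms in such a space. It assumes each algorithm is summarized by a tuple of objective costs. The function names `dominates` and `pareto_front`, the example objectives, and all numbers are hypothetical and not results from the survey. One reading of the figure is that strategy 1 moves a dominated point, such as C below, toward the front, while strategy 2 moves along it.

```python
from typing import Dict, List, Tuple

# An algorithm is summarized by a tuple of objective costs; smaller is
# better in every coordinate, matching the convention in Figure 2.
Costs = Tuple[float, ...]


def dominates(p: Costs, q: Costs) -> bool:
    """p dominates q if p is no worse in every objective and strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_front(algorithms: Dict[str, Costs]) -> List[str]:
    """Names of algorithms not dominated by any other algorithm, sorted alphabetically."""
    return sorted(
        name
        for name, costs in algorithms.items()
        if not any(
            dominates(other_costs, costs)
            for other_name, other_costs in algorithms.items()
            if other_name != name
        )
    )


if __name__ == "__main__":
    # Hypothetical algorithms with assumed costs, e.g. (privacy leakage, utility loss);
    # illustrative numbers only.
    algorithms = {
        "A": (0.2, 0.5),
        "B": (0.4, 0.3),
        "C": (0.5, 0.6),  # dominated by both A and B
        "D": (0.1, 0.9),
    }
    print(pareto_front(algorithms))  # ['A', 'B', 'D']
```

The dominance test is deliberately strict in at least one coordinate, so two algorithms with identical costs do not eliminate each other.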

Theorems & Definitions (11)

  • Definition 4.1
  • Definition 4.2
  • Definition 4.3
  • Definition 4.4
  • Definition A.1
  • Definition A.2
  • Definition A.3
  • Definition A.4
  • Theorem A.5
  • Theorem A.6
  • ...and 1 more