Federated Learning on Virtual Heterogeneous Data with Local-global Distillation

Chun-Yin Huang, Ruinan Jin, Can Zhao, Daguang Xu, Xiaoxiao Li

TL;DR

FedLGD tackles data heterogeneity in federated learning by embedding dataset distillation into the FL loop using virtual data at both client and server. It combines Local Data Distillation via Iterative Distribution Matching in feature space with Global Data Distillation through Federated Gradient Matching to distill representative global information and harmonize domain shifts, complemented by a regularization term to align local and global representations. The approach is validated on DIGITS, CIFAR10C, and RETINA, showing improved accuracy over state-of-the-art heterogeneous FL methods and demonstrating robustness to varying IPCs, client numbers, and domain shifts; ablations illustrate the importance of the regularizer and the number of distillation steps. FedLGD also suggests privacy advantages by relying on synthesized data and gradient-based anchors, offering a practical path toward more efficient and privacy-conscious FL in heterogeneous environments.
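As a rough illustration of the gradient-matching idea named above, the sketch below scores how well a gradient computed on virtual data agrees with a reference gradient (for example, an aggregated client gradient). The cosine distance, the function name `gradient_matching_loss`, and the flat-list representation of gradients are assumptions made for illustration; they are not the objective defined in the paper.

```python
import math
from typing import Sequence


def gradient_matching_loss(ref_grad: Sequence[float],
                           virtual_grad: Sequence[float]) -> float:
    """Cosine distance between two flattened gradients (hypothetical helper).

    Returns 0.0 when the gradients point the same way, 1.0 when they are
    orthogonal, and 2.0 when they point in opposite directions.
    """
    dot = sum(a * b for a, b in zip(ref_grad, virtual_grad))
    norm_ref = math.sqrt(sum(a * a for a in ref_grad))
    norm_virtual = math.sqrt(sum(b * b for b in virtual_grad))
    if norm_ref == 0.0 or norm_virtual == 0.0:
        # Assumption: a zero gradient has no direction, so treat it as unaligned.
        return 1.0
    return 1.0 - dot / (norm_ref * norm_virtual)


# Toy gradients: same direction, orthogonal, and opposite.
aligned = gradient_matching_loss([3.0, 4.0], [6.0, 8.0])
orthogonal = gradient_matching_loss([1.0, 0.0], [0.0, 1.0])
opposite = gradient_matching_loss([1.0, 0.0], [-1.0, 0.0])
```

In a distillation loop, a loss of this kind would be minimized with respect to the virtual data, so that training on the virtual data pushes the model in roughly the same direction as the reference gradient.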

Abstract

While Federated Learning (FL) is gaining popularity for training machine learning models in a decentralized fashion, numerous challenges persist, such as asynchronization, computational expenses, data heterogeneity, and gradient and membership privacy attacks. Lately, dataset distillation has emerged as a promising solution for addressing the aforementioned challenges by generating a compact synthetic dataset that preserves a model's training efficacy. However, we discover that using distilled local datasets can amplify the heterogeneity issue in FL. To address this, we propose Federated Learning on Virtual Heterogeneous Data with Local-Global Dataset Distillation (FedLGD), where we seamlessly integrate dataset distillation algorithms into the FL pipeline and train FL using a smaller synthetic dataset (referred to as virtual data). Specifically, to harmonize the domain shifts, we propose iterative distribution matching to inpaint global information into local virtual data and use federated gradient matching to distill global virtual data that serve as anchor points to rectify heterogeneous local training, without compromising data privacy. We experiment on both benchmark and real-world datasets that contain heterogeneous data from different sources, and further scale up to an FL scenario that contains a large number of clients with heterogeneous and class-imbalanced data. Our method outperforms state-of-the-art heterogeneous FL algorithms under various settings. Our code is available at https://github.com/ubc-tea/FedLGD.
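To make the distribution-matching idea in the abstract more concrete, here is a minimal sketch that compares the mean feature embedding of a batch of real samples with that of a batch of virtual samples. The squared-distance form, the helper names, and the toy feature map `toy_embed` are assumptions for illustration only; the paper's iterative procedure and its feature extractor are not reproduced here.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def mean_embedding(samples: Sequence[Vector],
                   embed: Callable[[Vector], Vector]) -> Vector:
    """Average the embedded features of a batch, dimension by dimension."""
    feats = [embed(x) for x in samples]
    dim = len(feats[0])
    return [sum(f[d] for f in feats) / len(feats) for d in range(dim)]


def distribution_matching_loss(real: Sequence[Vector],
                               virtual: Sequence[Vector],
                               embed: Callable[[Vector], Vector]) -> float:
    """Squared distance between the mean embeddings of two batches."""
    mu_real = mean_embedding(real, embed)
    mu_virtual = mean_embedding(virtual, embed)
    return sum((a - b) ** 2 for a, b in zip(mu_real, mu_virtual))


def toy_embed(x: Vector) -> Vector:
    # Hypothetical stand-in for a neural feature extractor.
    return [x[0] + x[1], x[0] - x[1]]


real_batch = [[1.0, 2.0], [3.0, 4.0]]
matched_virtual = [[2.0, 3.0]]      # one virtual sample summarizing the batch
unmatched_virtual = [[0.0, 0.0]]

matched_loss = distribution_matching_loss(real_batch, matched_virtual, toy_embed)
unmatched_loss = distribution_matching_loss(real_batch, unmatched_virtual, toy_embed)
```

A single virtual sample can drive this loss to zero when its embedding equals the mean embedding of the real batch, which is the sense in which a small virtual set can stand in for a larger one under this simplified objective.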

Paper Structure

This paper contains 30 sections, 5 equations, 19 figures, 9 tables, 1 algorithm.

Figures (19)

  • Figure 1: Overview of the proposed method FedLGD. We split FL rounds into selected and unselected rounds. In the selected rounds, clients refine the local virtual data and update local models, while the server uses aggregated gradients to update the global virtual data and the global model. We term this procedure Local-Global Data Distillation. In the unselected rounds, we perform ordinary FL training with virtual data while adding a regularization loss to the local model update. The middle box shows the evolution of the global and local virtual data. Although the local virtual data do not change visibly, we found the local distillation steps essential for improving model performance, as shown in Fig. 2(c) and Fig. 2(d).
  • Figure 2: (a) Comparison between different regularization losses and their weightings ($\lambda$). $\mathcal{L}_{\rm Con}$ gives better and more stable performance across coefficient choices. (b) The solid curves describe the accuracy improvement over $|\tau|=0$, and the dashed curve indicates the computation cost. Model performance improves as $|\tau|$ increases, reflecting a trade-off between computation cost and model performance. Varying the number of data-updating steps for (c) DIGITS and (d) CIFAR10C: FedLGD yields consistent performance, and accuracy improves with an increasing number of local and global steps.
  • Figure 3: t-SNE plots of the feature space for FedAvg, FedLGD without regularization, and FedLGD.
  • Figure 4: FedLGD reduces the accumulated computation cost on the clients' side compared to FedAvg.
  • Figure 5: Membership inference attack (MIA) results on models trained with FedAvg (using the original dataset) and FedLGD (using the distilled virtual dataset). An ROC curve that coincides with the diagonal means membership cannot be inferred.
  • ...and 14 more figures
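
The diagonal-ROC reading in the Figure 5 caption can be checked numerically: an ROC curve on the diagonal corresponds to an area under the curve (AUC) of 0.5, meaning the attacker ranks members above non-members no more often than chance. The sketch below computes AUC from pairwise comparisons of attack scores. The function name `attack_auc` and the toy scores are hypothetical and are not taken from the paper's experiments.

```python
from typing import Sequence


def attack_auc(member_scores: Sequence[float],
               nonmember_scores: Sequence[float]) -> float:
    """AUC as the probability that a member outscores a non-member.

    Ties count as half a win, which matches the area under the ROC curve.
    """
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


# Attack scores that look identical for members and non-members:
# the attacker cannot tell them apart, so the AUC sits at chance level.
chance_auc = attack_auc([0.2, 0.8], [0.2, 0.8])

# Attack scores that separate the two groups perfectly.
perfect_auc = attack_auc([0.9, 0.8], [0.1, 0.2])
```

An AUC close to 0.5 is therefore the numerical counterpart of an ROC curve hugging the diagonal, while values approaching 1.0 indicate that membership leaks.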