
Synthetic Data Aided Federated Learning Using Foundation Models

Fatima Abacha, Sin G. Teo, Lucas C. Cordeiro, Mustafa A. Mustafa

TL;DR

Federated learning (FL) under Non-IID data suffers from slow convergence and reduced performance due to data heterogeneity. The authors introduce DPSDA-FL, a two-stage framework that generates differentially private synthetic data using foundation models: Stage 1 produces synthetic data $D_{csyn}$ locally and forms a global synthetic dataset $D_{Gsyn}$, and Stage 2 distributes $D_{Gsyn}$ to the clients to augment their local data. This augmentation reduces the disparity between clients' local data and stabilizes training, yielding recall improvements of up to 26% and accuracy gains of up to 9% on CIFAR-10 compared to standard FL baselines. The approach targets cross-silo FL with sensitive data, combining foundation models with differential privacy to enable privacy-preserving data augmentation and improved global model performance. Future work includes evaluation on more datasets and deeper privacy analyses.

Abstract

In heterogeneous scenarios where the data distribution amongst the Federated Learning (FL) participants is Non-Independent and Identically Distributed (Non-IID), FL suffers from the well-known problem of data heterogeneity. This significantly degrades the performance of FL, as the global model struggles to converge. To solve this problem, we propose Differentially Private Synthetic Data Aided Federated Learning Using Foundation Models (DPSDA-FL), a novel data augmentation strategy that helps homogenize the local data on the clients' side. DPSDA-FL improves the training of the local models by leveraging differentially private synthetic data generated from foundation models. We demonstrate the effectiveness of our approach by evaluating it on the benchmark image dataset CIFAR-10. Our experimental results show that DPSDA-FL can improve the class recall and classification accuracy of the global model by up to 26% and 9%, respectively, in FL with Non-IID issues.
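
To make the homogenizing effect of the augmentation concrete, the following is a minimal sketch, not the authors' implementation. It works on class labels only and assumes a hypothetical pathological split in which each client holds two CIFAR-10 classes. The global synthetic set is stood in for by a class-balanced list of labels; the differentially private generation with a foundation model is not modeled here. All function names and sample counts are illustrative assumptions.

```python
from collections import Counter

NUM_CLASSES = 10  # CIFAR-10 has 10 classes


def label_distribution(labels):
    """Empirical class distribution as a list of length NUM_CLASSES."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(c, 0) / total for c in range(NUM_CLASSES)]


def tv_distance_from_uniform(labels):
    """Total variation distance between the label distribution and uniform."""
    dist = label_distribution(labels)
    return 0.5 * sum(abs(p - 1.0 / NUM_CLASSES) for p in dist)


def make_non_iid_clients(num_clients=5, classes_per_client=2, samples_per_class=500):
    """Hypothetical Non-IID split: each client only sees a few classes."""
    clients = []
    for k in range(num_clients):
        own = [(k * classes_per_client + j) % NUM_CLASSES
               for j in range(classes_per_client)]
        clients.append([c for c in own for _ in range(samples_per_class)])
    return clients


def make_global_synthetic(samples_per_class=300):
    """Placeholder for the global synthetic set (assumed class-balanced).

    Only labels are produced; no images and no DP mechanism are simulated.
    """
    return [c for c in range(NUM_CLASSES) for _ in range(samples_per_class)]


def augment(local_labels, global_synthetic):
    """Stage 2 in this sketch: append the shared synthetic set to local data."""
    return local_labels + global_synthetic


clients = make_non_iid_clients()
d_gsyn = make_global_synthetic()

before = [tv_distance_from_uniform(c) for c in clients]
after = [tv_distance_from_uniform(augment(c, d_gsyn)) for c in clients]

for k, (b, a) in enumerate(zip(before, after)):
    print(f"client {k}: distance from uniform before={b:.2f}, after={a:.2f}")
```

Under these assumed sizes, every client's label distribution moves closer to uniform after augmentation, which is the sense in which shared synthetic data reduces the disparity between clients. How much real training improves depends on the quality of the synthetic images and the privacy budget, which this sketch does not capture.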

Paper Structure (18 sections, 2 figures, 3 tables, 1 algorithm)

This paper contains 18 sections, 2 figures, 3 tables, 1 algorithm.

Figures (2)

  • Figure 1: Differentially Private Synthetic Data Aided Federated Learning Using Foundation Models.
  • Figure 2: Confusion matrices highlighting the correct predictions and misclassifications made by the various approaches.