Synthetic Data Aided Federated Learning Using Foundation Models

Fatima Abacha; Sin G. Teo; Lucas C. Cordeiro; Mustafa A. Mustafa

TL;DR

Federated learning under Non-IID data suffers from slow convergence and reduced performance due to data heterogeneity. The authors introduce DPSDA-FL, a two-stage framework that generates differentially private synthetic data using foundation models: Stage 1 locally produces client synthetic data $D_{csyn}$ and forms a global synthetic dataset $D_{Gsyn}$, and Stage 2 distributes $D_{Gsyn}$ to clients to augment their local data. This augmentation reduces local-data disparity and stabilizes training, yielding class-recall improvements of up to 26% and accuracy gains of up to 9% on CIFAR-10 compared to standard FL baselines. The approach demonstrates practical gains for cross-silo FL with sensitive data, combining foundation models with differential privacy to enable privacy-preserving data augmentation and improved global model performance; future work includes a wider range of datasets and deeper privacy analyses.
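
The two-stage data flow summarized above can be illustrated with a toy sketch. This is not the authors' implementation: it tracks class labels only (no images), and `local_synthetic` is a hypothetical placeholder for the differentially private generation step, which a real system would replace with a foundation-model-based generator. The split sizes and the `label_skew` measure (total variation distance from a uniform label distribution) are likewise assumptions chosen for illustration.

```python
from collections import Counter

NUM_CLASSES = 10  # CIFAR-10 has ten classes


def make_non_iid_clients(num_clients=5, per_class=100, classes_per_client=2):
    """Label-skewed split: each client holds only a few classes (labels only)."""
    clients = []
    for c in range(num_clients):
        owned = [(c * classes_per_client + k) % NUM_CLASSES
                 for k in range(classes_per_client)]
        clients.append([label for label in owned for _ in range(per_class)])
    return clients


def local_synthetic(labels, per_class=20):
    """Hypothetical stand-in for Stage 1 local generation of D_csyn.

    A real system would run a differentially private generator here; this
    placeholder only emits labels for the classes the client already holds.
    """
    return [label for label in sorted(set(labels)) for _ in range(per_class)]


def label_skew(labels):
    """Total variation distance between the label histogram and uniform."""
    counts = Counter(labels)
    n = len(labels)
    return 0.5 * sum(abs(counts.get(k, 0) / n - 1 / NUM_CLASSES)
                     for k in range(NUM_CLASSES))


clients = make_non_iid_clients()

# Stage 1: each client produces D_csyn; here their union stands in for D_Gsyn.
d_gsyn = [label for labels in clients for label in local_synthetic(labels)]

# Stage 2: every client augments its local data with D_Gsyn.
augmented = [labels + d_gsyn for labels in clients]

skew_before = [label_skew(c) for c in clients]
skew_after = [label_skew(c) for c in augmented]
print("skew before:", skew_before)
print("skew after: ", skew_after)
```

In this toy setting every client's label distribution moves closer to uniform after augmentation, which is the kind of reduction in local-data disparity the summary describes. The sketch says nothing about privacy guarantees or model accuracy.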

Abstract

In heterogeneous scenarios where the data distribution among Federated Learning (FL) participants is non-independent and identically distributed (Non-IID), FL suffers from the well-known problem of data heterogeneity. This significantly degrades FL performance, as the global model tends to struggle to converge. To solve this problem, we propose Differentially Private Synthetic Data Aided Federated Learning Using Foundation Models (DPSDA-FL), a novel data augmentation strategy that helps homogenize the local data on the clients' side. DPSDA-FL improves the training of the local models by leveraging differentially private synthetic data generated from foundation models. We demonstrate the effectiveness of our approach by evaluating it on the benchmark image dataset CIFAR-10. Our experimental results show that DPSDA-FL can improve the class recall and classification accuracy of the global model by up to 26% and 9%, respectively, in FL with Non-IID issues.
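
For readers unfamiliar with the two reported metrics, the following minimal sketch shows how per-class recall and overall accuracy can be computed from a confusion matrix. It assumes the common convention that rows index the true class and columns the predicted class; the 3-class matrix is made up for illustration and is not a result from the paper.

```python
def per_class_recall(cm):
    """Recall of class i: correct predictions of i over all true samples of i."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]


def accuracy(cm):
    """Fraction of all samples that lie on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


# Toy confusion matrix: rows are true classes, columns are predicted classes.
cm = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 5, 5],
]

recalls = per_class_recall(cm)
acc = accuracy(cm)
print("per-class recall:", recalls)
print("accuracy:", acc)
```

Per-class recall exposes classes that a model rarely predicts correctly even when overall accuracy looks acceptable, which is why it is informative under label-skewed (Non-IID) training data.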

Paper Structure (18 sections, 2 figures, 3 tables, 1 algorithm)

Introduction
Background and Related Work
Data Heterogeneity
Generative Adversarial Networks
Foundation Models
Differentially Private Synthetic Data
DPSDA-FL: Differentially Private Synthetic Data Aided Federated Learning Using Foundation Models
Experiments and Evaluations
Experimental Setting
Dataset
Differentially Private Synthetic Dataset
Data Distribution
Neural Network Architecture
Cross-Silo Horizontal FL
Evaluation Metrics
...and 3 more sections

Figures (2)

Figure 1: Differentially Private Synthetic Data Aided Federated Learning Using Foundation Models.
Figure 2: Confusion matrices highlighting the correct predictions and misclassifications made by the various approaches.
