
Phoenix: A Federated Generative Diffusion Model

Fiona Victoria Stanley Jothiraj, Afra Mashhadi

TL;DR

Phoenix addresses privacy and data-ownership concerns in generative AI by training an unconditional Denoising Diffusion Probabilistic Model (DDPM) with federated learning (FL) across distributed data sources. It introduces two core strategies that mitigate the effects of non-IID data and improve sample fidelity and mode coverage under FL: a Data Sharing Strategy controlled by parameters $\beta$ and $\alpha$, and Personalization layers with Threshold Filtering. In CIFAR-10 experiments, both strategies improve over GAN baselines and over the default diffusion model trained with FL, particularly in non-IID settings, while preserving privacy and reducing communication between sites. The work shows that high-quality, diverse synthetic data can be generated in a privacy-preserving, distributed setting, and it points to future directions: faster training, real-world deployments, and stronger privacy guarantees.
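To make the Data Sharing Strategy concrete, the sketch below assumes the semantics used in the common data-sharing scheme for non-IID FL: a globally shared subset G, whose size is a fraction $\beta$ of the total client data, is drawn from a balanced pool, and each client receives a random fraction $\alpha$ of G to mix with its local data. These meanings of $\beta$ and $\alpha$, the function name `share_data`, and the `(sample_id, label)` stand-in for images are assumptions for illustration; the paper's exact definitions may differ.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Stand-in for an (image, label) pair; hypothetical, for illustration only.
Sample = Tuple[int, int]  # (sample_id, label)


def share_data(
    client_data: Dict[int, List[Sample]],
    pool: Sequence[Sample],
    beta: float,
    alpha: float,
    seed: int = 0,
) -> Dict[int, List[Sample]]:
    """Hypothetical data-sharing step (assumed semantics):
    |G| = beta * (total client samples), drawn from a balanced `pool`;
    each client then receives a random alpha-fraction of G on top of
    its own local data, which never leaves the client."""
    rng = random.Random(seed)
    total = sum(len(d) for d in client_data.values())
    g_size = min(len(pool), int(beta * total))
    shared = rng.sample(list(pool), g_size)      # globally shared subset G
    per_client = int(alpha * g_size)             # samples of G sent to each client
    augmented: Dict[int, List[Sample]] = {}
    for cid, local in client_data.items():
        extra = rng.sample(shared, per_client)   # independent draw per client
        augmented[cid] = list(local) + extra
    return augmented


# Two extreme non-IID clients, each holding a single class.
clients = {0: [(i, 0) for i in range(50)], 1: [(100 + i, 1) for i in range(50)]}
pool = [(1000 + i, i % 10) for i in range(100)]  # balanced over 10 classes
out = share_data(clients, pool, beta=0.2, alpha=0.5)
```

With 100 client samples in total, $\beta = 0.2$ gives a 20-sample shared set, and $\alpha = 0.5$ sends 10 of those to each client, so each client's otherwise single-class data gains some samples from other classes.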

Abstract

Generative AI has made impressive strides in enabling users to create diverse and realistic content such as images, videos, and audio. However, training generative models on large centralized datasets raises challenges for data privacy, security, and accessibility. Federated learning (FL) addresses these challenges by collaboratively training a shared deep learning model while the training data stays on individual edge devices. This paper proposes a novel method for training a Denoising Diffusion Probabilistic Model (DDPM) across multiple data sources using FL. Diffusion models, a recently emerged class of generative models, have shown promising results, producing higher-quality images than Generative Adversarial Networks (GANs). Our proposed method, Phoenix, is an unconditional diffusion model that uses dedicated strategies to improve the diversity of generated samples even when trained on statistically heterogeneous, or non-IID (non-independent and identically distributed), data. We show that our approach outperforms the default diffusion model in an FL setting. These results indicate that high-quality samples can be generated while maintaining data diversity, preserving privacy, and reducing communication between data sources, opening new possibilities for generative AI.
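As background on what each FL client optimizes, the DDPM forward (noising) process has a closed form (Ho et al., 2020): with a noise-variance schedule $\beta_t$ and $\bar{\alpha}_t = \prod_{s \le t}(1 - \beta_s)$, a noisy sample is $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1 - \bar{\alpha}_t}\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, and the denoiser is trained to predict $\epsilon$. The schedule $\beta_t$ here is unrelated to the data-sharing parameter $\beta$. The minimal NumPy sketch below uses the linear schedule from Ho et al. (1e-4 to 0.02 over 1000 steps); it is generic DDPM background, not Phoenix-specific code, and the function names are illustrative.

```python
import numpy as np


def linear_beta_schedule(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Linear noise-variance schedule beta_1..beta_T (Ho et al., 2020)."""
    return np.linspace(beta_start, beta_end, T, dtype=np.float64)


def q_sample(x0: np.ndarray, t: int, alpha_bar: np.ndarray, rng: np.random.Generator):
    """Draw x_t ~ q(x_t | x_0) in closed form and return (x_t, eps).
    `t` is a 0-based index into the schedule."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps


betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)  # bar{alpha}_t = prod_{s<=t} (1 - beta_s)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 32, 32))  # CIFAR-10-sized image scaled to [-1, 1]
x_t, eps = q_sample(x0, t=999, alpha_bar=alpha_bar, rng=rng)
# A denoiser eps_theta(x_t, t) would be trained with MSE against `eps`;
# in FL, each client runs this objective on its local data.
```

By the last step $\bar{\alpha}_T$ is close to zero, so $x_T$ is almost pure Gaussian noise, which is where sampling starts.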

Paper Structure

This paper contains 23 sections, 2 equations, 14 figures, 5 tables.

Figures (14)

  • Figure 1: Directed graphical model of the diffusion model (Ho et al., 2020)
  • Figure 2: Illustration of the Personalization layers & Threshold Filtering strategy in the FL setting (Step 1: personalization layers; Step 2: generate samples to monitor performance; Step 3: disconnect underperforming clients)
  • Figure 4: Distribution of generated sample classes, based on classification results from a LeNet model and sorted in descending order, for DCGAN (left) and the default diffusion model (right)
  • Figure 5: Distribution of generated sample classes, based on classification results from a LeNet model and sorted in descending order, for the Data Sharing strategy (left) and the Personalization layers & Threshold Filtering strategy (right)
  • Figure 6: Generated samples from Phoenix with default diffusion training, the Data Sharing strategy, and the Personalization strategy, respectively
  • ...and 9 more figures
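The three steps of the Personalization layers & Threshold Filtering strategy (Figure 2) can be sketched on the server side as follows. This is a minimal illustration under stated assumptions: shared layers are combined with FedAvg-style sample-weighted averaging, personalized layers never leave the client, and Step 2 (generating samples) is abstracted into a per-client quality score that is assumed to be lower-is-better, such as FID. The layer names, the score, the threshold value, and all function names are hypothetical, not taken from the paper.

```python
from typing import Dict, List, Set

import numpy as np

Params = Dict[str, np.ndarray]  # layer name -> weights (hypothetical layout)


def aggregate_shared(
    client_params: Dict[int, Params],
    num_samples: Dict[int, int],
    personal_keys: Set[str],
) -> Params:
    """Step 1: FedAvg over non-personalized layers only, weighted by
    each client's sample count. Layers in `personal_keys` stay local."""
    total = sum(num_samples[c] for c in client_params)
    first = next(iter(client_params.values()))
    shared: Params = {}
    for name in first:
        if name in personal_keys:
            continue  # personalization layer: never averaged
        shared[name] = sum(
            (num_samples[c] / total) * p[name] for c, p in client_params.items()
        )
    return shared


def apply_global(local: Params, shared: Params) -> Params:
    """Overwrite a client's shared layers with the global average,
    keeping its personalized layers untouched."""
    return {name: shared.get(name, w) for name, w in local.items()}


def filter_clients(scores: Dict[int, float], threshold: float) -> List[int]:
    """Steps 2-3: given a quality score of each client's generated
    samples (assumed lower-is-better, e.g. FID), keep only clients at or
    below `threshold`; the rest are disconnected from later rounds."""
    return sorted(c for c, s in scores.items() if s <= threshold)
```

For example, with a shared layer `"enc"` and a personalized layer `"head"`, a client holding three times as much data pulls the averaged `"enc"` weights three times as strongly, while each client's `"head"` is left as it was trained locally.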