Training Diffusion Models with Federated Learning

Matthijs de Goede, Bart Cox, Jérémie Decouchant

TL;DR

The paper addresses privacy, copyright, and data authority concerns in diffusion-model training by introducing FedDiffuse, a federated learning framework that adapts FedAvg to train a DDPM with a UNet backbone. It proposes three communication-efficient strategies (USplit, ULatDec, and UDec) that exploit the UNet structure to reduce the number of exchanged parameters while preserving image quality, achieving up to a $74\%$ reduction in communication on IID data. Experiments on Fashion-MNIST (and CelebA) show that FedDiffuse can approach the centralized FID score when the number of clients is small, though robustness to non-IID skew varies by method. Overall, the approach demonstrates that diffusion models can be trained in a privacy-preserving, decentralized manner, potentially broadening participation beyond major tech entities.

Abstract

The training of diffusion-based models for image generation is predominantly controlled by a select few Big Tech companies, raising concerns about privacy, copyright, and data authority due to their lack of transparency regarding training data. To address this issue, we propose a federated diffusion model scheme that enables the independent and collaborative training of diffusion models without exposing local data. Our approach adapts the Federated Averaging (FedAvg) algorithm to train a Denoising Diffusion Probabilistic Model (DDPM). Through a novel utilization of the underlying UNet backbone, we achieve a significant reduction of up to 74% in the number of parameters exchanged during training, compared to the naive FedAvg approach, whilst simultaneously maintaining image quality comparable to the centralized setting, as evaluated by the FID score.
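
The idea of averaging only part of the UNet can be illustrated with a minimal FedAvg sketch. This is not the authors' implementation: the parameter names (`encoder.w`, `bottleneck.w`, `decoder.w`), the helper functions `fedavg` and `count_exchanged`, and the toy values are hypothetical, and the decoder-only filter is only an assumption about what a UDec-style method exchanges, based on its name.

```python
from typing import Callable, Dict, List

# A model is represented as a mapping from parameter name to a flat list of values.
Params = Dict[str, List[float]]


def fedavg(
    client_params: List[Params],
    client_sizes: List[int],
    shared: Callable[[str], bool] = lambda name: True,
) -> Params:
    """Average the parameters selected by `shared`, weighted by local dataset size."""
    total = sum(client_sizes)
    averaged: Params = {}
    for name in client_params[0]:
        if not shared(name):
            continue  # assumption: unselected parts stay local and are never sent
        length = len(client_params[0][name])
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


def count_exchanged(params: Params) -> int:
    """Number of scalar parameters contained in one exchanged update."""
    return sum(len(values) for values in params.values())


# Two toy clients with hypothetical parameter names and dataset sizes.
clients = [
    {"encoder.w": [1.0, 2.0], "bottleneck.w": [0.0], "decoder.w": [4.0, 8.0]},
    {"encoder.w": [3.0, 6.0], "bottleneck.w": [2.0], "decoder.w": [0.0, 0.0]},
]
sizes = [100, 300]

# Naive FedAvg: every parameter is averaged and exchanged.
full = fedavg(clients, sizes)

# Decoder-only exchange (assumed here to resemble a UDec-style method).
decoder_only = fedavg(clients, sizes, shared=lambda name: name.startswith("decoder."))

print(count_exchanged(full), count_exchanged(decoder_only))
```

In this toy setup, restricting the exchange to the decoder shrinks each update from five scalars to two. The savings on a real UNet depend on how its parameters are distributed across encoder, bottleneck, and decoder.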

Paper Structure

This paper contains 6 sections, 7 equations, 7 figures, 2 tables, 3 algorithms.

Figures (7)

  • Figure 1: Graphical representation of the intuition behind the DDPM. The reverse denoising process uses Gaussian transition kernels with fixed covariances $\Sigma_{\theta}(x_t, t)$ and means $\mu_\theta(x_t,t)$ that are learned using a neural network predicting the noise $\epsilon_\theta(x_t,t)$ to subtract from samples $x_t$ at each timestep $t$.
  • Figure 2: UNet depiction showing the widths, heights, and counts for the feature maps resulting from the different operations in the encoder, bottleneck, and decoder. For each network part, the training methods that consider it for federated training are indicated within the brackets.
  • Figure 3: Mean FID scores with error bounds for different numbers of clients $K$ and local epochs $E$ with $R=15$ in the Full federated setting on IID data.
  • Figure 4: Cumulative number of communicated parameters $(\cdot 10^8)$ during training for the different training methods with $K=5$.
  • Figure 5: Fashion-MNIST samples generated with the baseline model (first row) and FedDiffuse models trained using the Full (second row), USplit (third row), ULatDec (fourth row), and UDec (fifth row) methods with $K = 5$, $R = 15$ and $E = 5$.
  • ...and 2 more figures
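
The reverse step described in the Figure 1 caption can be written out explicitly. The following is a sketch assuming the standard DDPM parameterization: the noise schedule $\beta_t$, the quantities $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$, and the fixed variance $\sigma_t^2$ do not appear in this excerpt and are introduced here as assumptions.

$$
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\, \mu_\theta(x_t, t),\, \Sigma_\theta(x_t, t)\big), \qquad \Sigma_\theta(x_t, t) = \sigma_t^2 I,
$$

$$
\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t) \right).
$$

Under this parameterization, the network only has to predict the noise $\epsilon_\theta(x_t, t)$; the mean of each denoising step follows from it, and the covariance is not learned.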