pFedGPA: Diffusion-based Generative Parameter Aggregation for Personalized Federated Learning

Jiahao Lai, Jiaqi Li, Jian Xu, Yanru Wu, Boshi Tang, Siqi Chen, Yongfeng Huang, Wenbo Ding, Yang Li

TL;DR

pFedGPA tackles non-IID data in federated learning by replacing linear parameter averaging with a server-side diffusion model that learns the joint distribution of client parameters. It introduces parameter inversion to generate personalized parameters for each client, and uses global diffusion knowledge as guidance to accelerate initialization. The paper reports better performance than baseline approaches across multiple datasets, supports the design with ablations, and discusses practical considerations such as training time and privacy. This diffusion-based generative aggregation offers a scalable path to more effective personalization in FL without increasing communication overhead.

Abstract

Federated Learning (FL) offers a decentralized approach to model training, where data remains local and only model parameters are shared between the clients and the central server. Traditional methods, such as Federated Averaging (FedAvg), linearly aggregate these parameters, which are usually trained on heterogeneous data distributions, potentially overlooking the complex, high-dimensional nature of the parameter space. This can result in degraded performance of the aggregated model. While personalized FL approaches can mitigate the heterogeneous data issue to some extent, the limitation of linear aggregation remains unresolved. To alleviate this issue, we investigate the generative approach of diffusion models and propose a novel generative parameter aggregation framework for personalized FL, pFedGPA. In this framework, we deploy a diffusion model on the server to integrate the diverse parameter distributions and propose a parameter inversion method to efficiently generate a set of personalized parameters for each client. This inversion method transforms the uploaded parameters into a latent code, which is then aggregated through denoising sampling to produce the final personalized parameters. By encoding the dependence of a client's model parameters on the specific data distribution using the high-capacity diffusion model, pFedGPA can effectively decouple the complexity of the overall distribution of all clients' model parameters from the complexity of each individual client's parameter distribution. Our experimental results consistently demonstrate the superior performance of the proposed method across multiple datasets, surpassing baseline approaches.
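
The concern about linear aggregation can be illustrated with a toy sketch. The code below is not taken from the paper: the two-client setup, the Gaussian-mixture "parameter density", and the names `fedavg` and `mixture_density` are all assumptions chosen for illustration. It applies FedAvg-style weighted averaging to two clients whose parameters sit at different modes, then compares the density at the averaged point with the density at a client's own mode.

```python
import math

def fedavg(client_params, client_sizes):
    """Weighted linear average of client parameter vectors (FedAvg-style)."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(n * p[i] for p, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]

def mixture_density(theta, modes, sigma=0.5):
    """Toy stand-in for a multimodal parameter distribution (assumption):
    an equal-weight mixture of isotropic Gaussians centred on each mode."""
    d = len(theta)
    norm = (2.0 * math.pi * sigma ** 2) ** (-d / 2.0)
    total = 0.0
    for mode in modes:
        sq_dist = sum((t - m) ** 2 for t, m in zip(theta, mode))
        total += norm * math.exp(-sq_dist / (2.0 * sigma ** 2))
    return total / len(modes)

# Hypothetical clients: each has converged to a different mode.
client_params = [[-2.0, 1.0], [2.0, -1.0]]
client_sizes = [100, 100]

averaged = fedavg(client_params, client_sizes)
density_at_average = mixture_density(averaged, client_params)
density_at_mode = mixture_density(client_params[0], client_params)

print("averaged parameters:", averaged)
print("density at average :", density_at_average)
print("density at a mode  :", density_at_mode)
```

In this toy setting the averaged point falls between the two modes, where the density is several orders of magnitude lower than at either client's own parameters. This is the kind of situation the abstract describes when it says linear aggregation may overlook the structure of the parameter space.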

Paper Structure

This paper contains 32 sections, 14 equations, 4 figures, 2 tables, 1 algorithm.

Figures (4)

  • Figure 1: Parameter collapse can occur when linearly averaging the parameters from different clients. Bright colors indicate high-probability regions of the parameter space, where the parameters located at the peaks of the model distribution are well-optimized for specific tasks.
  • Figure 2: Illustration of the training process for a new client $k$ with and without global guidance. The green arrows represent training starting from the initial parameters $\theta^{k}_{0}$ solely on local data $D_{k}$, gradually converging to the final optimized parameters $\theta^{k}_{*}$ within $S$ iterations. The purple arrows indicate training with global guidance $G$ alternated with local data training, which accelerates initialization and converges within $I \ll S$ iterations.
  • Figure 3: Illustration of the Parameter Inversion. Starting with the initial parameter $\theta_{0}$, it is diffused through several steps to reach $\theta_{1}, \dots, \theta_{T}$. During this process, the noise introduced between consecutive time steps and the final state $\theta_{T}$ are recorded as the latent code for $\theta_{0}$. In the denoising sampling phase of the diffusion model, these elements are gradually encoded to produce a new parameter $\tilde{\theta}_{0}$. Notice that it is challenging to obtain $\tilde{\theta}_{0}$ directly from $\theta_{0}$ using a linear aggregator.
  • Figure 4: Comparison of test accuracies across 20 clients on Fashion-MNIST and CIFAR-10 using the pFedGPA method, before and after the fine-tuning (FT) operation, with and without parameter inversion. Dashed lines indicate results before FT, and solid lines indicate results after FT.
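
The parameter inversion described in the Figure 3 caption can be sketched in a DDPM style. This is one possible reading under stated assumptions, not the authors' implementation: the linear noise schedule, the pairing of the noise recorded at forward step $t$ with reverse step $t$, and the names `make_schedule`, `invert`, `toy_eps_model`, and `denoise_with_latent` are all hypothetical. An analytic noise predictor for a Gaussian toy distribution stands in for the trained server-side diffusion model.

```python
import math
import random

def make_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumption; the paper's schedule may differ)."""
    betas = [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars

def invert(theta0, alphas, rng):
    """Forward-diffuse theta0 step by step, recording the noise added at each
    step and the final state theta_T. Together they form the latent code."""
    theta, noises = list(theta0), []
    for a in alphas:
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        theta = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(theta, eps)]
        noises.append(eps)
    return noises, theta

def toy_eps_model(theta_t, t, alpha_bars, mean=0.0, std=1.0):
    """Placeholder for the trained diffusion model: the exact noise predictor
    when every coordinate of theta_0 is drawn from N(mean, std^2)."""
    ab = alpha_bars[t]
    var_t = ab * std ** 2 + (1.0 - ab)
    return [math.sqrt(1.0 - ab) * (x - math.sqrt(ab) * mean) / var_t for x in theta_t]

def denoise_with_latent(noises, theta_T, betas, alphas, alpha_bars, eps_model):
    """DDPM ancestral sampling from theta_T, using the recorded noise in place
    of freshly sampled noise at every reverse step."""
    theta = list(theta_T)
    for t in reversed(range(len(betas))):
        eps_hat = eps_model(theta, t, alpha_bars)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(x - coef * e) / math.sqrt(alphas[t]) for x, e in zip(theta, eps_hat)]
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise on the last step
        theta = [m + sigma * z for m, z in zip(mean, noises[t])]
    return theta

rng = random.Random(0)
betas, alphas, alpha_bars = make_schedule(T=50)
theta0 = [0.3, -1.2, 0.8]  # hypothetical uploaded client parameters
noises, theta_T = invert(theta0, alphas, rng)
theta_tilde = denoise_with_latent(noises, theta_T, betas, alphas, alpha_bars, toy_eps_model)
print("theta_0      :", theta0)
print("theta_tilde_0:", theta_tilde)
```

Because the reverse pass consumes the stored noise rather than sampling new noise, the output is a deterministic function of the latent code and the denoiser. The result $\tilde{\theta}_{0}$ therefore depends both on the client's own $\theta_{0}$ (through the latent code) and on what the denoiser has learned, which in this sketch is only a toy Gaussian.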