Decoupling General and Personalized Knowledge in Federated Learning via Additive and Low-Rank Decomposition

Xinghao Wu, Xuefeng Liu, Jianwei Niu, Haolin Wang, Shaojie Tang, Guogang Zhu, Hao Su

TL;DR

This paper addresses data heterogeneity in Personalized Federated Learning by proposing FedDecomp, an additive parameter decomposition that separates general knowledge into shared full-rank components and client-specific knowledge into low-rank personalized components. The method employs a novel alternating training strategy that first optimizes the low-rank personalized part and then the shared part, enhancing robustness to non-IID distributions. Empirical evaluations across CIFAR-10/100, Tiny ImageNet, larger datasets, and NLP tasks show consistent gains over eight SOTA baselines, with additional analysis on privacy and partial client participation. The approach demonstrates improved generalization, efficient communication, and stronger privacy, highlighting the value of explicit knowledge decoupling in federated settings.

Abstract

To address data heterogeneity, the key strategy of Personalized Federated Learning (PFL) is to decouple general knowledge (shared among clients) from client-specific knowledge, as the latter can harm collaboration if not removed. Existing PFL methods primarily adopt a parameter partitioning approach, where each parameter of a model is designated as one of two types: parameters shared with other clients to extract general knowledge, and parameters retained locally to learn client-specific knowledge. However, because these two types of parameters are put together like a jigsaw puzzle into a single model during training, each parameter may simultaneously absorb both general and client-specific knowledge, making it difficult to separate the two types of knowledge effectively. In this paper, we introduce FedDecomp, a simple but effective PFL paradigm that employs parameter additive decomposition to address this issue. Instead of assigning each parameter of a model as either a shared or a personalized one, FedDecomp decomposes each parameter into the sum of two parameters: a shared one and a personalized one, thus achieving a more thorough decoupling of shared and personalized knowledge than the parameter partitioning method. In addition, as we find that retaining the local knowledge of specific clients requires much lower model capacity than retaining general knowledge across all clients, we constrain the matrix containing personalized parameters to be low rank during training. Moreover, a new alternating training strategy is proposed to further improve performance. Experimental results across multiple datasets and varying degrees of data heterogeneity demonstrate that FedDecomp outperforms state-of-the-art methods by up to 4.9%. The code is available at https://github.com/XinghaoWu/FedDecomp.
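
The decomposition and alternating schedule described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a single linear layer trained with squared error, and every name in it (`init_client`, `effective_weight`, `local_round`, `aggregate`, `e_lora`, `e_shared`) is hypothetical. Each client's weight is the sum of a full-rank shared matrix and a low-rank personalized product `B @ A`; in each round the client first updates the low-rank factors with the shared part frozen, then updates the shared part, and only the shared part is averaged by the server.

```python
import numpy as np

def init_client(d_out, d_in, rank, rng):
    # Hypothetical personalized state: low-rank factors B (d_out x r), A (r x d_in).
    # B starts at zero, so the personalized term B @ A is initially zero.
    return {"B": np.zeros((d_out, rank)),
            "A": rng.normal(scale=0.1, size=(rank, d_in))}

def effective_weight(W_shared, client):
    # Additive decomposition: full-rank shared part + low-rank personalized part.
    return W_shared + client["B"] @ client["A"]

def mse(W, X, Y):
    # Loss 0.5 * ||X W^T - Y||_F^2 / n for a linear layer (an assumption of this sketch).
    return 0.5 * np.sum((X @ W.T - Y) ** 2) / X.shape[0]

def grad_wrt_weight(W, X, Y):
    # Gradient of mse with respect to the effective weight W.
    return (X @ W.T - Y).T @ X / X.shape[0]

def local_round(W_shared, client, X, Y, lr=0.05, e_lora=2, e_shared=3):
    W_s = W_shared.copy()  # leave the server's copy untouched
    # Phase 1: update the personalized low-rank factors; shared part frozen.
    for _ in range(e_lora):
        G = grad_wrt_weight(effective_weight(W_s, client), X, Y)
        grad_B = G @ client["A"].T   # chain rule through W = W_s + B @ A
        grad_A = client["B"].T @ G
        client["B"] -= lr * grad_B
        client["A"] -= lr * grad_A
    # Phase 2: update the shared part; personalized factors frozen.
    for _ in range(e_shared):
        G = grad_wrt_weight(effective_weight(W_s, client), X, Y)
        W_s -= lr * G
    return W_s  # only the shared part is returned for aggregation

def aggregate(shared_weights):
    # Server side: plain average of the clients' shared matrices.
    return np.mean(shared_weights, axis=0)
```

In this sketch the factors `B` and `A` never leave the client, so the personalized term stays local and has rank at most `rank`, while the server only ever sees the shared matrices passed to `aggregate`.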

Paper Structure (22 sections, 7 equations, 9 figures, 10 tables, 1 algorithm)

Figures (9)

  • Figure 1: A toy example to illustrate the partition based method.
  • Figure 2: A toy example to illustrate our method. The depth of blue/orange in the shared/personalized parameters indicates the amount of knowledge from the corresponding parameters in the original parameter matrix.
  • Figure 3: Overview of one client in FedDecomp in one communication round.
  • Figure 4: A toy example to illustrate the alternating training in FedDecomp.
  • Figure 5: Effect of $E_{\text{lora}}$ in Dirichlet non-IID scenario with $\alpha = 0.1$.
  • ...and 4 more figures