Selective Knowledge Sharing for Personalized Federated Learning Under Capacity Heterogeneity

Zheng Wang, Zheng Wang, Zhaopeng Peng, Zihui Wang, Cheng Wang

TL;DR

The paper tackles the challenge of personalization in federated learning under simultaneous data heterogeneity and device capacity heterogeneity. It proposes Pa3dFL, which decouples model parameters into a shared general part and client-specific personal parts, uses a hyper-network to generate personalized parameters from client embeddings, and employs self-attention for implicit aggregation of personal knowledge. The approach yields strong accuracy gains across CIFAR10/100 and competitive results on FashionMNIST under heterogeneous capacities, while also improving efficiency in computation and communication. This framework offers a practical path to scalable, capacity-aware personalized FL with robust knowledge sharing and minimal privacy trade-offs.

Abstract

Federated Learning (FL) stands to gain significant advantages from collaboratively training capacity-heterogeneous models, enabling the utilization of private data and computing power from low-capacity devices. However, the focus on personalizing capacity-heterogeneous models based on client-specific data has been limited, resulting in suboptimal local model utility, particularly for low-capacity clients. The heterogeneity in both data and device capacity poses two key challenges for model personalization: 1) accurately retaining necessary knowledge embedded within reduced submodels for each client, and 2) effectively sharing knowledge through aggregating size-varying parameters. To this end, we introduce Pa3dFL, a novel framework designed to enhance local model performance by decoupling and selectively sharing knowledge among capacity-heterogeneous models. First, we decompose each layer of the model into general and personal parameters. Then, we maintain uniform sizes for the general parameters across clients and aggregate them through direct averaging. Subsequently, we employ a hyper-network to generate size-varying personal parameters for clients using learnable embeddings. Finally, we facilitate the implicit aggregation of personal parameters by aggregating client embeddings through a self-attention module. We conducted extensive experiments on three datasets to evaluate the effectiveness of Pa3dFL. Our findings indicate that Pa3dFL consistently outperforms baseline methods across various heterogeneity settings. Moreover, Pa3dFL demonstrates competitive communication and computation efficiency compared to baseline approaches, highlighting its practicality and adaptability in adverse system conditions.
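The four steps in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the linear hyper-network, the row-truncation rule used to shrink personal parameters, the single-head attention without a value projection, and all names (`average_general`, `attend_embeddings`, `generate_personal`, `W_hyper`, `capacity`) are assumptions introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def average_general(general_params):
    """Directly average general parameters, which have the same shape on every client."""
    return np.mean(np.stack(general_params), axis=0)


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend_embeddings(E, Wq, Wk):
    """Single-head self-attention over client embeddings (assumed form).

    Each output row is a convex combination of all client embeddings, so
    personal knowledge is mixed implicitly through the embeddings.
    """
    scores = (E @ Wq) @ (E @ Wk).T / np.sqrt(Wq.shape[1])
    A = softmax(scores, axis=1)
    return A @ E, A


def generate_personal(embedding, W_hyper, max_rows, cols, capacity):
    """Hypothetical linear hyper-network.

    Maps an embedding to a full-size personal matrix, then keeps only the
    leading rows in proportion to the client's capacity in (0, 1].
    """
    full = (W_hyper @ embedding).reshape(max_rows, cols)
    rows = max(1, int(round(capacity * max_rows)))
    return full[:rows]


# Toy setup: 3 clients with different capacities.
n_clients, d_emb = 3, 4
max_rows, cols = 8, 2
capacities = [0.25, 0.5, 1.0]

# Steps 1-2: general parameters share one shape and are averaged directly.
general = [rng.normal(size=(5, 5)) for _ in range(n_clients)]
shared = average_general(general)

# Step 4: aggregate client embeddings with self-attention.
E = rng.normal(size=(n_clients, d_emb))
Wq = rng.normal(size=(d_emb, d_emb))
Wk = rng.normal(size=(d_emb, d_emb))
mixed, A = attend_embeddings(E, Wq, Wk)

# Step 3: generate size-varying personal parameters from the embeddings.
W_hyper = rng.normal(size=(max_rows * cols, d_emb))
personal = [
    generate_personal(mixed[i], W_hyper, max_rows, cols, c)
    for i, c in enumerate(capacities)
]
shapes = [p.shape for p in personal]
print(shapes)
```

In this sketch the general parameters never change size, so plain averaging is well defined, while only the generated personal parameters shrink for low-capacity clients. How the paper actually combines the two parts within a layer is not specified in the abstract and is left out here.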

Paper Structure

This paper contains 54 sections, 2 theorems, 28 equations, 8 figures, 9 tables, 3 algorithms.

Key Result

Theorem 1

Suppose Assumptions assmp:smoothness and assmp:stoc-grad-var hold, the step sizes $\eta$ and $\gamma$ in Pa$^3$dFL are chosen as $\frac{2}{L_{\mathbf{u}}}$ and $\frac{L_{\mathbf{u}}}{L_{\boldsymbol{\varphi}}}$ respectively, $\eta$ decays with the number of rounds $t$, and all model parameters are initialized at t where $F^{*}$ is the lower bound of $F$.

Figures (8)

  • Figure 1: $Client_i$ has a smaller pruned submodel than $Client_j$ because of its relatively limited capacity (e.g., mobile phone versus computer). Assuming that knowledge embedded in the model is intricately interwoven across channels within each layer, we focus on the impact of model pruning on knowledge retention at each layer for clients. (a) Direct channel pruning potentially loses necessary general/local knowledge from all channels and preserves irrelevant knowledge from other clients. (b) Personalized channel pruning loses general knowledge from unseen channels and can preserve all necessary local knowledge, possibly along with irrelevant knowledge from overlapping channels. (c) Pruning after decomposition (ours) fully maintains general knowledge through complete sharing of general parameters, while local knowledge varies with the pruned personal parameters.
  • Figure 2: Model testing accuracy vs. client capacity on three benchmarks
  • Figure 3: Test accuracy vs. communication rounds under the Hetero. setting
  • Figure 4: Model reduction ratios vs. $R_2$, the lowest capacity $p_{min}$, the dimension, and the type of operators
  • Figure 5: Visualization of the data partition for CIFAR10 (a), CIFAR100 (b), and FashionMNIST (c). Each bar represents a client's local dataset, and each label is assigned one color. The length of each bar reflects the size of the local data.
  • ...and 3 more figures

Theorems & Definitions (3)

  • Theorem 1: Convergence of Pa$^3$dFL
  • Theorem 1: Convergence of Pa$^3$dFL
  • proof