Selective Knowledge Sharing for Personalized Federated Learning Under Capacity Heterogeneity

Zheng Wang; Zheng Wang; Zhaopeng Peng; Zihui Wang; Cheng Wang

TL;DR

The paper tackles the challenge of personalization in federated learning under simultaneous data heterogeneity and device capacity heterogeneity. It proposes Pa3dFL, which decouples model parameters into a shared general part and client-specific personal parts, uses a hyper-network to generate personalized parameters from client embeddings, and employs self-attention for implicit aggregation of personal knowledge. The approach yields strong accuracy gains across CIFAR10/100 and competitive results on FashionMNIST under heterogeneous capacities, while also improving efficiency in computation and communication. This framework offers a practical path to scalable, capacity-aware personalized FL with robust knowledge sharing and minimal privacy trade-offs.

Abstract

Federated Learning (FL) stands to gain significant advantages from collaboratively training capacity-heterogeneous models, enabling the utilization of private data and computing power from low-capacity devices. However, personalizing capacity-heterogeneous models to client-specific data has received limited attention, resulting in suboptimal local model utility, particularly for low-capacity clients. The heterogeneity in both data and device capacity poses two key challenges for model personalization: 1) accurately retaining necessary knowledge embedded within reduced submodels for each client, and 2) effectively sharing knowledge through aggregating size-varying parameters. To this end, we introduce Pa3dFL, a novel framework designed to enhance local model performance by decoupling and selectively sharing knowledge among capacity-heterogeneous models. First, we decompose each layer of the model into general and personal parameters. Then, we maintain uniform sizes for the general parameters across clients and aggregate them through direct averaging. Subsequently, we employ a hyper-network to generate size-varying personal parameters for clients using learnable embeddings. Finally, we facilitate the implicit aggregation of personal parameters by aggregating client embeddings through a self-attention module. We conducted extensive experiments on three datasets to evaluate the effectiveness of Pa3dFL. Our findings indicate that Pa3dFL consistently outperforms baseline methods across various heterogeneity settings. Moreover, Pa3dFL demonstrates competitive communication and computation efficiency compared to baseline approaches, highlighting its practicality and adaptability in adverse system conditions.
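The two server-side aggregation steps described in the abstract can be illustrated with a minimal, hypothetical sketch in plain Python. General parameters, which have the same size on every client, are averaged element-wise, and client embeddings are mixed with scaled dot-product self-attention. The function names (`average_general`, `self_attention_aggregate`) are illustrative, and the attention here uses identity query/key/value projections instead of the learnable projections a real attention module would have; this is a sketch under those assumptions, not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def average_general(general_params: List[Vector]) -> Vector:
    """Direct (unweighted) averaging of general parameters.

    Plain averaging is possible because every client holds a
    general-parameter vector of the same length, whatever its capacity.
    """
    size = len(general_params[0])
    if any(len(g) != size for g in general_params):
        raise ValueError("general parameters must have identical sizes")
    n = len(general_params)
    return [sum(g[k] for g in general_params) / n for k in range(size)]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_aggregate(embeddings: List[Vector]) -> List[Vector]:
    """Mix client embeddings with scaled dot-product self-attention.

    Simplification (assumption): queries, keys and values are the raw
    embeddings; a learnable attention module would project them first.
    """
    d = len(embeddings[0])
    scale = 1.0 / math.sqrt(d)
    mixed = []
    for query in embeddings:
        # Similarity of this client's embedding to every client's embedding.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in embeddings]
        weights = softmax(scores)
        # Each output is a convex combination of all client embeddings.
        mixed.append([sum(w * value[c] for w, value in zip(weights, embeddings))
                      for c in range(d)])
    return mixed


if __name__ == "__main__":
    # Three clients whose general parts share one shape.
    print(average_general([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    for row in self_attention_aggregate(emb):
        print([round(x, 3) for x in row])
```

Because each attention output is a convex combination of the input embeddings, clients with similar embeddings (the first two in the demo) are pulled toward each other. Under these assumptions, personal parameters regenerated by the hyper-network from the mixed embeddings share knowledge without ever averaging size-varying tensors directly.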

Paper Structure (54 sections, 2 theorems, 28 equations, 8 figures, 9 tables, 3 algorithms)

Introduction
Problem Formulation
Methodology
Channel-aware Layer Decomposition
Model Reduction.
Aggregation Mechanism
Parameter Generation via HN.
Implicit Aggregation.
Implementation
Analysis
Complexity.
Convergence.
Privacy.
Related Work
Personalized FL.
...and 39 more sections

Key Result

Theorem 1

Suppose the smoothness and stochastic-gradient-variance assumptions hold, and the step sizes $\eta$ and $\gamma$ in Pa$^3$dFL are chosen as $\frac{2}{L_{\mathbf u}}$ and $\frac{L_{\mathbf u}}{L_{\boldsymbol{\varphi}}}$, respectively, where $\eta$ decays with the number of rounds $t$ and all model parameters are initialized at $t$ … , where $F^{*}$ is the lower bound of $F$.

Figures (8)

Figure 1: Client$_i$ receives a smaller pruned submodel than Client$_j$ because of its more limited capacity (e.g., a mobile phone versus a computer). Assuming that the knowledge embedded in the model is interwoven across the channels of each layer, we focus on how model pruning affects each client's knowledge retention at each layer. (a) Direct channel pruning can lose necessary general/local knowledge from all channels and can preserve irrelevant knowledge from other clients. (b) Personalized channel pruning loses general knowledge from unseen channels; it can preserve all necessary local knowledge, possibly along with irrelevant knowledge from overlapping channels. (c) Pruning after decomposition (ours) fully maintains general knowledge by sharing the complete general parameters, while local knowledge varies with the pruned personal parameters.
Figure 2: Model test accuracy vs. client capacity on three benchmarks
Figure 3: Test accuracy vs. communication rounds under the Hetero. setting
Figure 4: Model reduction ratios vs. $R_2$, the lowest capacity $p_{\min}$, and the dimension and type of operators
Figure 5: Visualization of the data partitions for CIFAR10 (a), CIFAR100 (b) and FashionMNIST (c). Each bar represents a client's local dataset, each label is assigned one color, and the length of each bar reflects the size of the local data.
...and 3 more figures
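The idea behind Figure 1(c), pruning after decomposition, can be sketched as follows. The sketch assumes a hypothetical linear hyper-network (`hypernetwork`) that maps a client embedding to one personal parameter per full-width channel, and assumes capacity-based width reduction keeps the first ceil(p·C) of the C channels. The helper `build_client_layer` and the one-scalar-per-channel personal part are illustrative simplifications, not the paper's exact decomposition: the general part is copied whole to every client, and only the generated personal part shrinks with capacity p.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]
Vector = List[float]


def hypernetwork(hn_weights: Matrix, embedding: Vector) -> Vector:
    """Hypothetical linear hyper-network: one output per full-width channel."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in hn_weights]


def build_client_layer(general: Vector, hn_weights: Matrix,
                       embedding: Vector, capacity: float) -> Dict[str, Vector]:
    """Prune after decomposition (sketch).

    The general part is kept complete; only the generated personal part is
    trimmed. Assumption: ordered width reduction keeps the first
    ceil(capacity * C) channels, where C = len(hn_weights).
    """
    if not 0.0 < capacity <= 1.0:
        raise ValueError("capacity must be in (0, 1]")
    full_personal = hypernetwork(hn_weights, embedding)
    kept = max(1, math.ceil(capacity * len(full_personal)))
    return {"general": list(general), "personal": full_personal[:kept]}


if __name__ == "__main__":
    # C = 4 channels, embedding dimension d = 2 (toy values).
    W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
    general = [0.1, 0.2]
    for p in (0.25, 0.5, 1.0):
        layer = build_client_layer(general, W, [2.0, 3.0], p)
        print(p, layer["general"], layer["personal"])
```

With C = 4 channels, a client with p = 0.25 keeps one personal value and a client with p = 1.0 keeps all four, yet both receive identical general parameters, so those can still be averaged directly.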

Theorems & Definitions (3)

Theorem 1: Convergence of Pa$^3$dFL
Theorem 1: Convergence of Pa$^3$dFL
Proof
