Selective Knowledge Sharing for Personalized Federated Learning Under Capacity Heterogeneity
Zheng Wang, Zheng Wang, Zhaopeng Peng, Zihui Wang, Cheng Wang
TL;DR
The paper tackles personalization in federated learning when clients differ in both data distribution and device capacity. It proposes Pa3dFL, which decouples model parameters into a shared general part and client-specific personal parts, uses a hyper-network to generate personalized parameters from client embeddings, and applies self-attention to those embeddings to aggregate personal knowledge implicitly. The approach yields strong accuracy gains on CIFAR10 and CIFAR100 and competitive results on FashionMNIST under heterogeneous capacities, while also improving computation and communication efficiency. The framework offers a practical path to scalable, capacity-aware personalized FL with robust knowledge sharing and minimal privacy trade-offs.
Abstract
Federated Learning (FL) stands to gain significant advantages from collaboratively training capacity-heterogeneous models, enabling the utilization of private data and computing power from low-capacity devices. However, the focus on personalizing capacity-heterogeneous models based on client-specific data has been limited, resulting in suboptimal local model utility, particularly for low-capacity clients. The heterogeneity in both data and device capacity poses two key challenges for model personalization: 1) accurately retaining necessary knowledge embedded within reduced submodels for each client, and 2) effectively sharing knowledge through aggregating size-varying parameters. To this end, we introduce Pa3dFL, a novel framework designed to enhance local model performance by decoupling and selectively sharing knowledge among capacity-heterogeneous models. First, we decompose each layer of the model into general and personal parameters. Then, we maintain uniform sizes for the general parameters across clients and aggregate them through direct averaging. Subsequently, we employ a hyper-network to generate size-varying personal parameters for clients using learnable embeddings. Finally, we facilitate the implicit aggregation of personal parameters by aggregating client embeddings through a self-attention module. We conducted extensive experiments on three datasets to evaluate the effectiveness of Pa3dFL. Our findings indicate that Pa3dFL consistently outperforms baseline methods across various heterogeneity settings. Moreover, Pa3dFL demonstrates competitive communication and computation efficiency compared to baseline approaches, highlighting its practicality and adaptability in adverse system conditions.
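
The four steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`average_general`, `attention_aggregate`, `generate_personal`), the linear hyper-network, the projection-free self-attention, and the rule that a client keeps the leading rows of the generated block in proportion to its capacity are all assumptions made only to show how same-sized general parameters and size-varying personal parameters could coexist.

```python
import numpy as np

def average_general(general_params):
    """Directly average same-shaped general parameters from all clients."""
    return np.mean(np.stack(general_params, axis=0), axis=0)

def attention_aggregate(embeddings):
    """Mix client embeddings with plain scaled dot-product self-attention.

    Assumption: queries, keys, and values are the embeddings themselves
    (no learned projections), purely for illustration.
    """
    d = embeddings.shape[1]
    scores = embeddings @ embeddings.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1
    return weights @ embeddings

def generate_personal(embedding, hyper_weight, max_shape, capacity):
    """Hypothetical linear hyper-network: embedding -> personal parameters.

    The full-size output is reshaped to max_shape, and the leading rows
    are kept in proportion to capacity, giving a size-varying block.
    """
    full = (hyper_weight @ embedding).reshape(max_shape)
    rows = max(1, int(round(capacity * max_shape[0])))
    return full[:rows]

# Toy setup (all values hypothetical): three clients with different capacities.
rng = np.random.default_rng(0)
num_clients, emb_dim, max_shape = 3, 4, (8, 5)
capacities = [1.0, 0.5, 0.25]

general = [rng.normal(size=(6, 5)) for _ in range(num_clients)]
embeddings = rng.normal(size=(num_clients, emb_dim))
hyper_weight = rng.normal(size=(max_shape[0] * max_shape[1], emb_dim))

shared_general = average_general(general)   # one shared block for all clients
mixed = attention_aggregate(embeddings)     # implicit aggregation of personal knowledge
personal = [
    generate_personal(e, hyper_weight, max_shape, c)
    for e, c in zip(mixed, capacities)
]
shapes = [p.shape for p in personal]        # smaller blocks for lower capacities
```

In this sketch the general block has the same shape for every client, so it can be averaged directly, while the personal blocks shrink with capacity and are never averaged themselves; they only interact through the attention step over the embeddings.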
