FedCAPrivacy: Privacy-Preserving Heterogeneous Federated Learning with Anonymous Adaptive Clustering
Yunan Wei, Shengnan Zhao, Chuan Zhao, Zhe Liu, Zhenxiang Chen, Minghao Zhao
TL;DR
FedCAPrivacy tackles privacy-preserving clustering in heterogeneous federated learning (FL) by introducing anonymous adaptive clustering, which combines oblivious shuffle-based anonymization with an iteration-based decay of the clustering frequency. The framework uses a three-party architecture (BS, CSP, devices) and Paillier homomorphic encryption (HE) to perform secure cluster-wise aggregation, preventing the server from inferring participant identities and similarities. It comprises a setup phase for key generation, local training with encrypted updates, and secure cluster aggregation based on Gaussian similarity and spectral embedding. Experiments on MNIST, Fashion-MNIST, and CIFAR-10/100 show substantial training efficiency gains (approximately $7\times$) while maintaining high privacy across non-IID and heterogeneous-resource settings.
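For readers unfamiliar with the clustering step, a common textbook form of Gaussian similarity and spectral embedding is given here. The symbols $x_i$ (a feature vector describing participant $i$), $\sigma$ (kernel bandwidth), $W$, $D$, and $k$ are assumptions introduced for illustration, and the exact formulation used by FedCAPrivacy may differ.

```latex
W_{ij} = \exp\!\left(-\frac{\lVert x_i - x_j \rVert_2^2}{2\sigma^2}\right), \qquad
D_{ii} = \sum_{j} W_{ij}, \qquad
L_{\mathrm{sym}} = I - D^{-1/2} W D^{-1/2}
```

Under this formulation, each participant is embedded using the eigenvectors of $L_{\mathrm{sym}}$ associated with its $k$ smallest eigenvalues, and clusters are obtained by grouping the embedded points (for example with k-means).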
Abstract
Federated learning (FL) is a distributed machine learning paradigm that enables multiple clients to train a model collaboratively without exposing their local data. Among FL schemes, clustering is an effective way to address heterogeneity (i.e., differences in data distribution and computational ability that affect training performance and effectiveness) by grouping participants with similar computational resources or data distributions into clusters. However, intra-cluster data exchange poses privacy risks, while cluster selection and adaptation introduce challenges that may affect overall performance. To address these challenges, this paper introduces anonymous adaptive clustering, a novel approach that simultaneously enhances privacy protection and boosts training efficiency. Specifically, an oblivious shuffle-based anonymization method is designed to safeguard user identities and prevent the aggregation server from inferring similarities through clustering. Additionally, to improve performance, we introduce an iteration-based adaptive frequency decay strategy, which leverages variability in clustering probabilities to optimize training dynamics. Building on these techniques, we present FedCAPrivacy; experiments show that it achieves an approximately $7\times$ improvement in training efficiency while maintaining high privacy.
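The secure cluster-wise aggregation relies on the additive homomorphism of Paillier encryption: multiplying ciphertexts yields an encryption of the sum of the plaintexts, so an aggregator can combine encrypted updates without seeing any individual one. The following is a minimal toy sketch of that property only, not the paper's protocol. The function names (`keygen`, `encrypt`, `decrypt`, `aggregate`), the tiny primes, and the assumption that updates are already quantized to small non-negative integers are all hypothetical choices made for illustration; which party holds the private key in FedCAPrivacy is not specified here.

```python
import math
import secrets

def keygen(p=1009, q=1013):
    """Toy Paillier key pair from two small primes (insecure, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    # With generator g = n + 1, L(g^lam mod n^2) = lam mod n, so mu = lam^-1 mod n.
    mu = pow(lam, -1, n)
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt an integer 0 <= m < n as (1 + m*n) * r^n mod n^2."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) % n2 * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    """Recover m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n

def aggregate(n, ciphertexts):
    """Homomorphic sum: the product of ciphertexts encrypts the sum of plaintexts."""
    n2 = n * n
    acc = 1  # 1 is a valid encryption of 0 (with r = 1)
    for c in ciphertexts:
        acc = acc * c % n2
    return acc

# Three hypothetical clients in one cluster, each holding a quantized 3-dim update.
n, priv = keygen()
updates = [[3, 1, 4], [1, 5, 9], [2, 6, 5]]
encrypted = [[encrypt(n, v) for v in u] for u in updates]

# The aggregator multiplies ciphertexts coordinate-wise; it never sees plaintexts.
encrypted_sum = [aggregate(n, column) for column in zip(*encrypted)]

# Only the private-key holder can open the aggregated result.
print([decrypt(n, priv, c) for c in encrypted_sum])  # [6, 12, 18]
```

A real deployment would use primes of cryptographic size, a fixed-point encoding that also handles negative values, and a vetted library rather than hand-written arithmetic.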
