A survey on Clustered Federated Learning: Taxonomy, Analysis and Applications
Michael Ben Ali, Omar El-Rifai, Imen Megdiche, André Peninou, Olivier Teste
TL;DR
This paper surveys Clustered Federated Learning (CFL) and clarifies its terminology by distinguishing Core CFL (algorithms that explicitly tackle non-IID data by training $K$ cluster-specific models) from Clustered X FL, which clusters clients for architectural or efficiency gains while retaining a single global model. It introduces a principled taxonomy that divides CFL into Server-side, Client-side, and Metadata-based approaches, and analyzes the trade-offs among privacy, computation, and communication. The review traces the evolution of domain-agnostic CFL methods (e.g., MTCFL, IFCA) and of metadata-based approaches, assesses evaluation practices against a non-IID taxonomy, and highlights the dominance of metadata-based CFL in real-world applications despite its privacy concerns. It also differentiates Clustered X FL variants (Decentralized, Hierarchical, Split, Resource-Aware) from Core CFL, and outlines lessons and future directions for bridging privacy and practical efficiency, including modular, ablation-enabled designs and richer heterogeneity benchmarks. The work provides a cohesive framework to guide rigorous, domain-appropriate CFL research and practice, with implications for IoT, mobility, energy, and healthcare deployments.
Abstract
As Federated Learning (FL) expands, the challenge of non-independent and identically distributed (non-IID) data becomes critical. Clustered Federated Learning (CFL) addresses this by training multiple specialized models, each representing a group of clients with similar data distributions. However, the term "CFL" has increasingly been applied to operational strategies unrelated to data heterogeneity, creating significant ambiguity. This survey provides a systematic review of the CFL literature and introduces a principled taxonomy that classifies algorithms into Server-side, Client-side, and Metadata-based approaches. Our analysis reveals a distinct dichotomy: while theoretical research prioritizes privacy-preserving Server/Client-side methods, real-world applications in IoT, Mobility, and Energy overwhelmingly favor Metadata-based efficiency. Furthermore, we explicitly distinguish "Core CFL" (grouping clients for non-IID data) from "Clustered X FL" (operational variants for system heterogeneity). Finally, we outline lessons learned and future directions to bridge the gap between theoretical privacy and practical efficiency.
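To make the Core CFL setting concrete (training $K$ cluster-specific models for clients with different data distributions), here is a minimal sketch of one round in the style of IFCA, one of the methods named in the TL;DR. In this round each client evaluates all $K$ cluster models on its local data, joins the cluster whose model has the lowest loss, trains locally from that model, and the server averages updates within each cluster. This is an illustrative assumption-laden toy rather than code from the surveyed papers: the linear-regression clients, squared loss, sample-weighted averaging, and the names `local_loss`, `local_grad`, and `ifca_round` are all hypothetical.

```python
import numpy as np


def local_loss(w, X, y):
    """Mean squared error of a linear model w on one client's data."""
    residual = X @ w - y
    return float(np.mean(residual ** 2))


def local_grad(w, X, y):
    """Gradient of the mean squared error with respect to w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)


def ifca_round(cluster_models, clients, lr=0.1, local_steps=5):
    """One IFCA-style round (hypothetical sketch): assign, train locally, average per cluster."""
    # 1. Cluster identity estimation: each client picks the cluster model
    #    with the lowest loss on its own local data.
    assignments = [
        int(np.argmin([local_loss(w, X, y) for w in cluster_models]))
        for X, y in clients
    ]
    new_models = []
    for k, w_k in enumerate(cluster_models):
        updates, sizes = [], []
        for (X, y), a in zip(clients, assignments):
            if a != k:
                continue
            # 2. Local gradient descent starting from the chosen cluster model.
            w = w_k.copy()
            for _ in range(local_steps):
                w -= lr * local_grad(w, X, y)
            updates.append(w)
            sizes.append(len(y))
        # 3. Server averages the updates inside each cluster (weighted by
        #    sample count); a cluster with no members keeps its old model.
        if updates:
            new_models.append(np.average(updates, axis=0, weights=sizes))
        else:
            new_models.append(w_k.copy())
    return new_models, assignments


# Toy non-IID setup (hypothetical): two latent client groups whose data
# come from different true linear models, two clients per group.
rng = np.random.default_rng(0)
true_weights = [np.array([2.0, -1.0])] * 2 + [np.array([-2.0, 1.0])] * 2
clients = []
for w_true in true_weights:
    X = rng.normal(size=(200, 2))
    clients.append((X, X @ w_true))

# K = 2 cluster models with distinct initializations to break symmetry.
models = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]
for _ in range(20):
    models, assignments = ifca_round(models, clients)

print(assignments)
print(np.round(models, 3))
```

The distinct initializations are a deliberate choice in this sketch: if all $K$ models started identical, every client would select the same cluster and the method would collapse to a single global model, which is exactly the behavior Core CFL is meant to avoid.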
