One-Shot Clustering for Federated Learning
Maciej Krzysztof Zuziak, Roberto Pellungrini, Salvatore Rinzivillo
TL;DR
OCFL tackles the challenge of when to cluster clients in clustered federated learning by introducing a clustering-agnostic, one-shot scheme that activates clustering early in training based on a Clustering Temperature derived from gradient cosine similarities. The method defines a divergence matrix $\boldsymbol{\Gamma}$ and computes $T(\boldsymbol{\Gamma}) = \frac{\lVert\boldsymbol{\Gamma}\rVert_p}{\lambda}$ with $\lambda = (n(n-1)2^p)^{1/p}$ to signal convergence and trigger clustering. Through formal data-generating processes and experiments on MNIST, FMNIST, and CIFAR10, OCFL paired with density-based clustering (e.g., HDBSCAN, Mean-Shift) achieves high clustering quality (RAND $\approx$ 0.95–0.98) and strong personalization while preserving generalization, outperforming several baselines. The work demonstrates practical benefits for cross-silo FL by enabling automatic, early CFL with minimal hyperparameter tuning, and it outlines future directions for privacy considerations and dynamic client environments.
Abstract
Federated Learning (FL) is a widespread, well-adopted paradigm of decentralized learning that allows training one model from multiple sources without the need to directly transfer data between participating clients. Since its inception in 2015, it has been divided into numerous sub-fields that deal with application-specific issues, be it data heterogeneity or resource allocation. One such sub-field, Clustered Federated Learning (CFL), deals with the problem of clustering the population of clients into separate cohorts to deliver personalized models. Although a few remarkable works have been published in this domain, the problem is still largely unexplored, as its basic assumptions and settings differ slightly from those of standard FL. In this work, we present One-Shot Clustered Federated Learning (OCFL), a clustering-agnostic algorithm that can automatically detect the earliest suitable moment for clustering. Our algorithm is based on the computation of cosine similarity between the clients' gradients and on a temperature measure that detects when the federated model starts to converge. We empirically evaluate our methodology by testing various one-shot clustering algorithms on over thirty different tasks across three benchmark datasets. Our experiments show that our approach performs well when used to carry out CFL in an automated manner, without the need to adjust hyperparameters.
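
As a rough illustration of the temperature measure, the sketch below computes $T(\boldsymbol{\Gamma}) = \lVert\boldsymbol{\Gamma}\rVert_p / \lambda$ with $\lambda = (n(n-1)2^p)^{1/p}$ from a stack of flattened client gradients. Several details are assumptions rather than the authors' implementation: the entries of $\boldsymbol{\Gamma}$ are taken to be cosine distances $1 - \cos(g_i, g_j)$ with a zero diagonal, the norm is taken entry-wise, and the function names `divergence_matrix` and `clustering_temperature` are hypothetical. Under these assumptions each off-diagonal entry lies in $[0, 2]$, so $\lambda$ is the largest attainable norm and $T$ falls in $[0, 1]$.

```python
import numpy as np


def divergence_matrix(grads):
    """Pairwise cosine distance between flattened client gradients.

    Assumption: Gamma_ij = 1 - cos(g_i, g_j). The source only states that
    Gamma is derived from gradient cosine similarities.
    """
    g = np.asarray(grads, dtype=float)
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    unit = g / np.clip(norms, 1e-12, None)  # guard against zero gradients
    gamma = 1.0 - unit @ unit.T
    np.fill_diagonal(gamma, 0.0)            # a client does not diverge from itself
    return np.clip(gamma, 0.0, 2.0)         # remove floating-point overshoot


def clustering_temperature(gamma, p=2):
    """T(Gamma) = ||Gamma||_p / lambda, lambda = (n (n - 1) 2^p)^(1/p).

    Assumption: the p-norm is entry-wise, so it is computed by hand here
    instead of using an induced matrix norm.
    """
    n = gamma.shape[0]
    lam = (n * (n - 1) * 2.0 ** p) ** (1.0 / p)
    entrywise_norm = (np.abs(gamma) ** p).sum() ** (1.0 / p)
    return entrywise_norm / lam


# Toy example: four clients whose gradients point in two opposite directions.
toy_grads = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [-0.9, -0.1]])
temperature = clustering_temperature(divergence_matrix(toy_grads), p=2)
```

The sketch stops at the temperature value. The rule that maps it to a clustering trigger, and the subsequent one-shot clustering of $\boldsymbol{\Gamma}$ (e.g., with HDBSCAN or Mean-Shift), are not shown.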
