Covariances for Free: Exploiting Mean Distributions for Training-free Federated Learning
Dipam Goswami, Simone Magistri, Kai Wang, Bartłomiej Twardowski, Andrew D. Bagdanov, Joost van de Weijer
TL;DR
This paper tackles training-free federated learning with pre-trained feature extractors by deriving a provably unbiased estimator of global class covariances that uses only the per-class means sent by clients. It then initializes the global classifier from within-class covariances alone, omitting between-class scatter to improve conditioning and applying shrinkage for stability. Empirically, FedCOF achieves higher accuracy than FedNCM at the same communication cost, and is competitive with or superior to Fed3R, which shares second-order statistics, at a much lower communication cost across multiple benchmarks. It also serves as an effective initialization for subsequent federated fine-tuning or linear probing. The approach offers a practical, communication-efficient alternative to sharing second-order statistics, with strong robustness to non-iid data, making it appealing for large-scale deployment with pre-trained backbones.
Abstract
Using pre-trained models has been found to reduce the effect of data heterogeneity and speed up federated learning algorithms. Recent works have explored training-free methods using first- and second-order statistics to aggregate local client data distributions at the server and achieve high performance without any training. In this work, we propose a training-free method based on an unbiased estimator of class covariance matrices which only uses first-order statistics in the form of class means communicated by clients to the server. We show how these estimated class covariances can be used to initialize the global classifier, thus exploiting the covariances without actually sharing them. We also show that using only within-class covariances results in a better classifier initialization. Our approach improves performance in the range of 4-26% with exactly the same communication cost when compared to methods sharing only class means, and achieves performance competitive with or superior to methods sharing second-order statistics with dramatically less communication overhead. The proposed method is much more communication-efficient than federated prompt-tuning methods and still outperforms them. Finally, using our method to initialize classifiers and then performing federated fine-tuning or linear probing yields further performance gains. Code is available at https://github.com/dipamgoswami/FedCOF.
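To make the idea of estimating a class covariance from means alone concrete, the following is a minimal sketch and not the authors' released implementation. It assumes that, for a given class, each client's features are i.i.d. draws from one shared distribution, so that a client mean computed from n_k samples has covariance Sigma / n_k. Under that assumption, the count-weighted scatter of client means around the global mean, divided by K - 1, is an unbiased estimate of Sigma. The function name, argument names, and the optional `shrinkage` parameter are hypothetical and chosen for illustration only.

```python
import numpy as np

def estimate_class_covariance(client_means, client_counts, shrinkage=0.0):
    """Estimate one class's covariance from per-client class means.

    Assumes (illustrative assumption) that all clients sample this class
    i.i.d. from the same distribution. Names here are hypothetical.
    """
    means = np.asarray(client_means, dtype=float)    # shape (K, d)
    counts = np.asarray(client_counts, dtype=float)  # shape (K,)
    num_clients, dim = means.shape
    if num_clients < 2:
        raise ValueError("at least two clients are needed for this estimate")

    # Global class mean: count-weighted average of the client means.
    global_mean = (counts[:, None] * means).sum(axis=0) / counts.sum()

    # Count-weighted scatter of client means around the global mean.
    centered = means - global_mean
    scatter = (counts[:, None] * centered).T @ centered

    # Dividing by K - 1 makes the estimate unbiased under the i.i.d. assumption.
    cov = scatter / (num_clients - 1)

    # Optional shrinkage toward the identity for numerical stability.
    return global_mean, cov + shrinkage * np.eye(dim)
```

As a sanity check on the formula, when every client holds exactly one sample the client means are the samples themselves, and the estimate reduces to the ordinary sample covariance (`np.cov` with its default normalization).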
