
Covariances for Free: Exploiting Mean Distributions for Training-free Federated Learning

Dipam Goswami, Simone Magistri, Kai Wang, Bartłomiej Twardowski, Andrew D. Bagdanov, Joost van de Weijer

TL;DR

This paper tackles training-free federated learning with pre-trained feature extractors. It derives a provably unbiased estimator of global class covariances that uses only the per-class means sent by clients. It then initializes the global classifier using only within-class covariances, omitting the between-class scatter to improve conditioning, and applies shrinkage for stability. Empirically, FedCOF achieves higher accuracy than FedNCM at the same communication cost. It is also competitive with or superior to Fed3R across multiple benchmarks while communicating far less, and it provides an effective initialization for later federated fine-tuning or linear probing. The approach offers a practical, communication-efficient alternative to sharing second-order statistics, with strong robustness to non-iid data and dense federations, making it appealing for large-scale deployment with pre-trained backbones.
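The classifier initialization summarized above can be illustrated with a small sketch. It assumes a Fed3R-style ridge-regression closed form, $W = (G + \lambda I)^{-1} B$. Here column $c$ of $B$ is $N_c \hat{\mu}_c$, and $G$ is built only from the within-class covariances as $G = \sum_c (N_c - 1)(\hat{\Sigma}_c + \gamma I)$, with the between-class scatter left out. The function names `init_classifier` and `predict` are hypothetical. The $N_c - 1$ weighting and the `gamma` (shrinkage) and `lam` (ridge) parameters are also illustrative assumptions, not the paper's exact formulation.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve A X = B with Gauss-Jordan elimination and partial pivoting."""
    n = len(a)
    # Work on an augmented copy [A | B] so the inputs are left untouched.
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def init_classifier(
    class_covs: Sequence[Matrix],
    class_means: Sequence[Sequence[float]],
    class_counts: Sequence[int],
    gamma: float = 0.1,
    lam: float = 1.0,
) -> Matrix:
    """Hypothetical ridge-style initialization from per-class statistics.

    Returns W as a d x C matrix whose column c scores class c.
    """
    d = len(class_means[0])
    num_classes = len(class_means)
    # G: within-class scatter only (no between-class term), with shrinkage gamma * I.
    g = [[0.0] * d for _ in range(d)]
    for cov, n_c in zip(class_covs, class_counts):
        for i in range(d):
            for j in range(d):
                shrunk = cov[i][j] + (gamma if i == j else 0.0)
                g[i][j] += (n_c - 1) * shrunk
    # Ridge term keeps G invertible even when the estimated covariances are low-rank.
    for i in range(d):
        g[i][i] += lam
    # B: column c is the sum of class-c features, N_c * mu_c.
    b = [[class_counts[c] * class_means[c][i] for c in range(num_classes)] for i in range(d)]
    return solve(g, b)


def predict(w: Matrix, x: Sequence[float]) -> int:
    """Return the class whose column of W gives the highest score w[:, c] . x."""
    scores = [sum(w[i][c] * x[i] for i in range(len(x))) for c in range(len(w[0]))]
    return max(range(len(scores)), key=scores.__getitem__)
```

As a worked case, take identity covariances, means $[1, 0]$ and $[0, 1]$, $N_c = 11$, and `gamma = lam = 0`. Then $G = 20I$ and the sketch returns $W = 0.55\,I$, so a feature such as $[0.9, 0.1]$ scores highest for class 0.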

Abstract

Using pre-trained models has been found to reduce the effect of data heterogeneity and speed up federated learning algorithms. Recent works have explored training-free methods using first- and second-order statistics to aggregate local client data distributions at the server and achieve high performance without any training. In this work, we propose a training-free method based on an unbiased estimator of class covariance matrices which only uses first-order statistics in the form of class means communicated by clients to the server. We show how these estimated class covariances can be used to initialize the global classifier, thus exploiting the covariances without actually sharing them. We also show that using only within-class covariances results in a better classifier initialization. Our approach improves performance by 4-26% with exactly the same communication cost when compared to methods sharing only class means, and achieves performance competitive with or superior to methods sharing second-order statistics with dramatically less communication overhead. The proposed method is much more communication-efficient than federated prompt-tuning methods and still outperforms them. Finally, using our method to initialize classifiers and then performing federated fine-tuning or linear probing again yields better performance. Code is available at https://github.com/dipamgoswami/FedCOF.

Paper Structure

This paper contains 25 sections, 4 theorems, 35 equations, 9 figures, 16 tables, 1 algorithm.

Key Result

Proposition 1

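Let $K$ be the number of clients, where client $k$ holds $n_{k,c}$ features of class $c$ with class mean $\hat{\mu}_{k,c}$, and let $C$ be the total number of classes. Let $N_c=\sum_{k=1}^K n_{k,c}$ be the total number of features of class $c$, and let $\hat{\mu}_c = \frac{1}{N_c}\sum_{j=1}^{N_c} F^j$ be the unbiased estimator of the population mean $\mu_c$. Then the estimator $\hat{\Sigma}_c = \frac{1}{K-1}\sum_{k=1}^{K} n_{k,c}\,(\hat{\mu}_{k,c}-\hat{\mu}_c)(\hat{\mu}_{k,c}-\hat{\mu}_c)^\top$ is an unbiased estimator of the population covariance $\Sigma_c$, for all $c \in \{1, \ldots, C\}$.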
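Proposition 1 can be checked with a short sketch. The function `estimate_class_stats` is a hypothetical helper, not code from the FedCOF repository. Given the class-$c$ means $\hat{\mu}_{k,c}$ and counts $n_{k,c}$ from the clients that hold class $c$, it forms the count-weighted global mean $\hat{\mu}_c$ and the estimator $\hat{\Sigma}_c$. It assumes that each client's class-$c$ features are i.i.d. draws from the same class distribution, which is the setting under which unbiasedness holds. It also assumes that $K$ counts only the clients that actually hold class $c$.

```python
from typing import List, Sequence, Tuple


def estimate_class_stats(
    client_means: Sequence[Sequence[float]],
    client_counts: Sequence[int],
) -> Tuple[List[float], List[List[float]]]:
    """Global mean and covariance of one class from per-client class means only.

    client_means[k] is the class mean computed on client k (length d) and
    client_counts[k] is n_{k,c}. Needs at least two clients holding the class.
    """
    k = len(client_means)
    if k < 2:
        raise ValueError("need at least two clients with this class")
    d = len(client_means[0])
    n_total = sum(client_counts)  # N_c
    # The count-weighted average of client means equals the pooled mean of all N_c features.
    mu = [sum(n * m[i] for m, n in zip(client_means, client_counts)) / n_total for i in range(d)]
    # Sigma_hat = 1/(K-1) * sum_k n_{k,c} (mu_k - mu)(mu_k - mu)^T
    sigma = [[0.0] * d for _ in range(d)]
    for m, n in zip(client_means, client_counts):
        diff = [m[i] - mu[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                sigma[i][j] += n * diff[i] * diff[j]
    sigma = [[v / (k - 1) for v in row] for row in sigma]
    return mu, sigma
```

When every client holds a single feature ($n_{k,c} = 1$, so $K = N_c$), $\hat{\Sigma}_c$ reduces to the ordinary sample covariance with denominator $N_c - 1$. For example, two clients with means $[0, 0]$ and $[2, 2]$ give $\hat{\mu}_c = [1, 1]$ and a $\hat{\Sigma}_c$ whose entries all equal 2.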

Figures (9)

  • Figure 1: FedNCM legate2023guiding shares only class means $\hat{\mu}_{k,c}$ and has minimal communication. Fed3R fani2024accelerating requires each client to send the sum of class features $B_k$ and the feature matrix $G_k$, which increases the communication cost by $d^2K$. We propose FedCOF, which shares only class means and estimates a global class covariance $\hat{\Sigma}_c$ to initialize the classifier weights. Note that only a small subset of all classes is present in each client. For simplicity, we show the upper bound of the communication cost, where $C'$ denotes the maximum number of classes present in a single client.
  • Figure 2: Federated Learning with COvariances for Free (FedCOF). Each client $k$ communicates only its class means $\hat{\mu}_{k,c}$ and counts $n_{k,c}$. On the server side, (A) we use a provably unbiased estimator $\hat{\Sigma}_c$ (solid lines) of the population covariance $\Sigma_c$ (dashed lines) based on the received class means (see Section 4.2). (B) We initialize the linear classifier using the estimated second-order statistics and remove the between-class scatter matrix, as discussed in Section 4.3.
  • Figure 3: Performance comparison when initializing with different methods and then fine-tuning with FedAdam reddi2020adaptive and FedAvg mcmahan2017communication. We also compare with FedAdam and FedAvg using a pre-trained backbone and random classifier initialization. The training-free initialization stages are shown as dotted lines, and stars represent the start of the fine-tuning stages. Accuracies are averaged over 3 seeds.
  • Figure 4: Performance of different classifier initialization methods when linear probing with FedAvg mcmahan2017communication. FedAvg-LP (in blue) uses random classifier initialization and a pre-trained backbone. The training-free initialization stage is shown as dotted lines, and stars represent the start of linear probing.
  • Figure 5: Ablation experiments: (left) Change in performance with varying number of clients and data heterogeneity. (center) Sharing multiple class means per client improves FedCOF performance. (right) Impact of shrinkage with varying number of clients and sampled means per client.
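The communication costs compared in Figure 1 can be tallied with simple arithmetic. The sketch follows the caption's upper bound. FedNCM and FedCOF upload at most $C'$ class means of dimension $d$ per client. Fed3R uploads the class-feature sums $B_k$, which are the same size, plus the $d \times d$ matrix $G_k$, adding $d^2K$ values. The helper name `upload_cost` is hypothetical, and the per-class counts $n_{k,c}$ are ignored. The example dimension $d = 768$ is an assumption chosen for illustration.

```python
from typing import Dict


def upload_cost(num_clients: int, max_classes_per_client: int, dim: int) -> Dict[str, int]:
    """Upper-bound number of floats uploaded in one round (counts n_{k,c} ignored)."""
    means = num_clients * max_classes_per_client * dim  # K * C' * d
    return {
        "FedNCM": means,
        "FedCOF": means,  # same payload as FedNCM: only class means
        "Fed3R": means + num_clients * dim * dim,  # plus one d x d matrix G_k per client
    }


if __name__ == "__main__":
    for method, cost in upload_cost(100, 10, 768).items():
        print(f"{method}: {cost:,} floats")
```

With $K = 100$ clients, $C' = 10$ and $d = 768$, FedCOF sends 768,000 values while Fed3R sends 59,750,400, roughly 78 times more.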

Theorems & Definitions (7)

  • Proposition 1
  • Proposition 2
  • proof
  • Proposition 1
  • proof
  • Proposition 2
  • proof