Watch Your Head: Assembling Projection Heads to Save the Reliability of Federated Models

Jinqian Chen, Jihua Zhu, Qinghai Zheng, Zhongyu Li, Zhiqiang Tian

TL;DR

This work addresses reliability and calibration in federated learning under non-IID data, showing that generic and personalized FL suffer miscalibration and poor uncertainty on OOD samples. It identifies projection-head bias as a primary culprit and introduces Assembled Projection Heads (APH), which ensembles multiple locally fine-tuned projection heads initialized from a head prior and averaged at inference. APH is designed to be lightweight and compatible with state-of-the-art FL methods, incurring less than 30% extra computation for large models while substantially improving calibration (lower F-ECE) and uncertainty estimation on in-domain and OOD data. Empirical results on CIFAR-10/100 and Tiny-ImageNet with OOD benchmarks demonstrate robust gains in reliability, validating APH as a practical tool for trustworthy, privacy-preserving federated learning.

Abstract

Federated learning encounters substantial challenges with heterogeneous data, leading to performance degradation and convergence issues. While considerable progress has been achieved in mitigating such an impact, the reliability aspect of federated models has been largely disregarded. In this study, we conduct extensive experiments to investigate the reliability of both generic and personalized federated models. Our exploration uncovers a significant finding: federated models exhibit unreliability when faced with heterogeneous data, demonstrating poor calibration on in-distribution test data and low uncertainty levels on out-of-distribution data. This unreliability is primarily attributed to the presence of biased projection heads, which introduce miscalibration into the federated models. Inspired by this observation, we propose the "Assembled Projection Heads" (APH) method for enhancing the reliability of federated models. By treating the existing projection head parameters as priors, APH randomly samples multiple initialized parameters of projection heads from the prior and further performs targeted fine-tuning on locally available data under varying learning rates. Such a head ensemble introduces parameter diversity into the deterministic model, eliminating the bias and producing reliable predictions via head averaging. We evaluate the effectiveness of the proposed APH method across three prominent federated benchmarks. Experimental results validate the efficacy of APH in model calibration and uncertainty estimation. Notably, APH can be seamlessly integrated into various federated approaches but only requires less than 30% additional computation cost for 100× inferences within large models.
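
The APH procedure described in the abstract (sample several head initializations from a prior around the existing head, fine-tune each on local data with a different learning rate, then average) might be sketched roughly as follows. This is an illustrative toy and not the authors' implementation. It assumes a linear projection head on frozen backbone features, a Gaussian prior with a hypothetical standard deviation `sigma`, full-batch gradient descent, and averaging of softmax outputs; the abstract only says "head averaging", so that last choice is also an assumption. All function and parameter names are made up for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def finetune_head(W, b, feats, labels, lr, steps=20):
    """Full-batch gradient descent on cross-entropy for one linear head."""
    n, num_classes = feats.shape[0], W.shape[1]
    onehot = np.eye(num_classes)[labels]
    for _ in range(steps):
        grad_logits = (softmax(feats @ W + b) - onehot) / n
        W = W - lr * feats.T @ grad_logits
        b = b - lr * grad_logits.sum(axis=0)
    return W, b

def assemble_heads(W0, b0, feats, labels, num_heads=5, sigma=0.1,
                   lr_range=(0.01, 0.1), seed=0):
    """Sample heads around (W0, b0) and fine-tune each with its own learning rate."""
    rng = np.random.default_rng(seed)
    lrs = np.linspace(lr_range[0], lr_range[1], num_heads)
    heads = []
    for lr in lrs:
        # Assumption: the prior is an isotropic Gaussian centred on the existing head.
        W = W0 + sigma * rng.standard_normal(W0.shape)
        b = b0 + sigma * rng.standard_normal(b0.shape)
        heads.append(finetune_head(W, b, feats, labels, lr))
    return heads

def predict(heads, feats):
    """Average the class probabilities produced by all heads."""
    probs = [softmax(feats @ W + b) for W, b in heads]
    return np.mean(probs, axis=0)

# Toy stand-in for local client data: 64 feature vectors of size 8, 3 classes.
data_rng = np.random.default_rng(1)
feats = data_rng.standard_normal((64, 8))
labels = data_rng.integers(0, 3, size=64)
W0, b0 = np.zeros((8, 3)), np.zeros(3)  # stand-in for the existing federated head

heads = assemble_heads(W0, b0, feats, labels)
avg_probs = predict(heads, feats)
```

Only the small head is replicated and fine-tuned here while the backbone features are computed once, which is in line with the abstract's claim that the extra computation stays modest for large models.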

Paper Structure (16 sections, 4 equations, 7 figures, 4 tables)


Figures (7)

  • Figure 1: Generic Federated Models are Not Reliable. Compared with centrally trained models, generic federated models tend to be more overconfident on misclassified samples and exhibit lower uncertainty (i.e., lower predictive entropy) on out-of-distribution samples (see Section 3), demonstrating a serious reliability issue.
  • Figure 2: Reliability of Generic Federated Models. (a) F-ECE of different generic federated models compared with the centralized training model on in-domain test data. The F-ECE of generic federated models is significantly higher than that of the centralized training model, indicating severe overconfidence. (b) Histograms of predictive distribution entropy on the OOD dataset. The predictive entropy of generic federated models is dramatically lower than that of the centralized training model, showing lower uncertainty on OOD samples.
  • Figure 3: Impact Factors on the Reliability of Generic Federated Model. We investigate the impact of data quantity imbalance, local epoch number, Non-IID severity, and participation ratio on the reliability of the federated model. As can be seen in (b) and (c), data quantity imbalance and the number of local epochs have negligible impact on model reliability. (d) and (e) demonstrate that Non-IID severity significantly harms the federated model's reliability and that a low participation ratio magnifies this effect. (f) further illustrates that the participation ratio does not affect reliability on IID data.
  • Figure 4: Reliability of Personalized Federated Models. (a) F-ECE of personalized federated models compared with the centralized training model on the in-domain test dataset. (b) Histogram of predictive distribution entropy on OOD test data. Compared with the centralized training model, personalized models are better calibrated, while still exhibiting lower uncertainty when faced with OOD samples.
  • Figure 5: Influence of Head Fine-tuning. (a) Bar diagram of F-ECE before/after head fine-tuning. (b) Histogram of predictive entropy on OOD samples. The model achieves a lower ECE than the centralized model (green dashed line) after only one round of head fine-tuning, while its uncertainty on OOD samples remains unreliable.
  • ...and 2 more figures
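
The captions above rely on two quantities: a calibration error and the predictive entropy on OOD samples. As a rough aid, the sketch below computes the standard binned expected calibration error (ECE) and the Shannon entropy of a predictive distribution. It is not the paper's F-ECE, whose exact definition is not given in this section; the equal-width binning, the bin count, and all names are assumptions chosen for illustration.

```python
import numpy as np

def expected_calibration_error(probs, labels, num_bins=10):
    """Standard ECE: bin-weighted gap between accuracy and mean confidence."""
    conf = probs.max(axis=1)                    # confidence of the predicted class
    pred = probs.argmax(axis=1)
    correct = (pred == labels).astype(float)
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.mean() * gap          # weight by fraction of samples in bin
    return ece

def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of each row of class probabilities."""
    return -(probs * np.log(probs + eps)).sum(axis=1)

# Toy predictions: one confident and correct, one confident and wrong.
toy_probs = np.array([[0.9, 0.1], [0.8, 0.2]])
toy_labels = np.array([0, 1])
toy_ece = expected_calibration_error(toy_probs, toy_labels)
toy_entropy = predictive_entropy(toy_probs)
```

A reliable model would be expected to show a small ECE on in-domain data and high entropy on OOD inputs; a uniform prediction over C classes attains the maximum entropy of log C.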

Theorems & Definitions (1)

  • Definition 1