Unlocking the Potential of Model Calibration in Federated Learning

Yun-Wei Chu, Dong-Jun Han, Seyyedali Hosseinalipour, Christopher Brinton

TL;DR

This work addresses the overlooked issue of probability calibration in federated learning (FL), where models can become miscalibrated under data heterogeneity and client privacy constraints. It introduces Non-Uniform Calibration for Federated Learning (NUCFL), a framework that adds a train-time calibration loss to local FL training and dynamically sets client-specific penalties based on the similarity between local and global models, measured with, for example, cosine similarity or Centered Kernel Alignment (CKA). By tying calibration penalties to how closely each client aligns with the global model, NUCFL improves calibration (lower expected and static calibration error, ECE and SCE) without sacrificing accuracy, and it is compatible with a range of FL algorithms and train-time calibration losses such as DCA and MDCA. Extensive experiments show robust gains across datasets (MNIST, FEMNIST, CIFAR-10/100) and FL strategies, and NUCFL remains effective under different levels of data heterogeneity and different client participation scenarios, highlighting its practical value for trustworthy FL systems.
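The TL;DR names cosine similarity and CKA as ways to measure how close a client's local model is to the global model. A minimal plain-Python sketch of both follows, assuming cosine similarity is taken over flattened parameter vectors and linear CKA over activation matrices computed on the same batch of inputs. The function `calibration_penalty` and its `base_lambda` parameter are hypothetical: they map a similarity score to a per-client weight with a clipped linear rule chosen only for illustration. This summary does not give NUCFL's exact mapping, so its direction and shape are assumptions.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # rows = samples, columns = features


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two flattened parameter vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def _center_columns(X: Matrix) -> Matrix:
    """Subtract each column's mean (feature-wise centering)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]


def _frob_sq_cross(X: Matrix, Y: Matrix) -> float:
    """Squared Frobenius norm of X^T Y, for X (n x p) and Y (n x q)."""
    total = 0.0
    for a in range(len(X[0])):
        for b in range(len(Y[0])):
            s = sum(X[i][a] * Y[i][b] for i in range(len(X)))
            total += s * s
    return total


def linear_cka(X: Matrix, Y: Matrix) -> float:
    """Linear CKA: ||Xc^T Yc||_F^2 / (||Xc^T Xc||_F * ||Yc^T Yc||_F)."""
    Xc, Yc = _center_columns(X), _center_columns(Y)
    numerator = _frob_sq_cross(Xc, Yc)
    denom = math.sqrt(_frob_sq_cross(Xc, Xc) * _frob_sq_cross(Yc, Yc))
    return numerator / denom if denom > 0 else 0.0


def calibration_penalty(similarity: float, base_lambda: float = 1.0) -> float:
    # HYPOTHETICAL mapping (not from the paper): clip the similarity to
    # [0, 1] and scale it linearly into a client-specific weight.
    return base_lambda * max(0.0, min(1.0, similarity))
```

Linear CKA is invariant to isotropic scaling and orthogonal transforms of either representation, which is one reason it is a common choice for comparing networks. Cosine similarity over concatenated raw parameters is cheaper, but it is dominated by the layers with the largest parameter norms.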

Abstract

Over the past several years, various federated learning (FL) methodologies have been developed to improve model accuracy, a primary performance metric in machine learning. However, to use FL in practical decision-making scenarios, accuracy alone is not enough: the trained model must also attach reliable confidence to each of its predictions, an aspect that has been largely overlooked in existing FL research. Motivated by this gap, we propose Non-Uniform Calibration for Federated Learning (NUCFL), a generic framework that integrates FL with the concept of model calibration. The inherent data heterogeneity in FL environments makes model calibration particularly difficult, as calibration must remain reliable across diverse data distributions and client conditions. NUCFL addresses this challenge by dynamically adjusting the model calibration objectives based on statistical relationships between each client's local model and the global model. In particular, NUCFL assesses the similarity between each local model and the global model, and uses it to control the penalty term for the calibration loss during client-side local training. By doing so, NUCFL aligns calibration with the needs of the global model in heterogeneous FL settings without sacrificing accuracy. Extensive experiments show that NUCFL is flexible and effective across various FL algorithms, improving accuracy as well as model calibration.
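The abstract says NUCFL controls the penalty on a calibration loss during client-side training. Assuming the local objective takes the common form of cross-entropy plus a weighted train-time calibration term, the sketch below computes that loss value for one batch, using MDCA (multi-class difference of confidence and accuracy) as the calibration term. `local_objective` and `client_lambda` are hypothetical names, and the sketch only evaluates the loss; a real FL client would write it in an autodiff framework so the gradient of the combined loss drives the local updates.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sample's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs_batch: List[List[float]], labels: List[int]) -> float:
    """Mean negative log-likelihood of the true labels."""
    return -sum(math.log(max(p[y], 1e-12)) for p, y in zip(probs_batch, labels)) / len(labels)


def mdca_loss(probs_batch: List[List[float]], labels: List[int], num_classes: int) -> float:
    """MDCA: mean over classes of |average predicted prob - empirical class frequency|."""
    n = len(labels)
    total = 0.0
    for k in range(num_classes):
        avg_conf = sum(p[k] for p in probs_batch) / n
        freq = sum(1 for y in labels if y == k) / n
        total += abs(avg_conf - freq)
    return total / num_classes


def local_objective(logits_batch: List[List[float]], labels: List[int],
                    num_classes: int, client_lambda: float) -> float:
    # HYPOTHETICAL client loss: cross-entropy + client-specific weight * MDCA.
    # In NUCFL the weight would come from local/global model similarity.
    probs = [softmax(z) for z in logits_batch]
    return cross_entropy(probs, labels) + client_lambda * mdca_loss(probs, labels, num_classes)
```

With `client_lambda = 0` this reduces to plain cross-entropy, so a client whose weight is driven to zero trains exactly as under the underlying FL algorithm.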

Paper Structure (23 sections, 8 equations, 7 figures, 27 tables, 2 algorithms)

Figures (7)

  • Figure 1: Reliability diagrams and calibration errors for centralized training and non-IID FL (using FedAvg) trained with various calibration methods on the CIFAR-100 dataset. Our method yields well-calibrated FL, as evidenced by a notably smaller calibration error and a smaller gap (red region) between confidence and accuracy.
  • Figure 2: Overview of the proposed NUCFL framework.
  • Figure 3: Reliability diagrams for non-IID FedAvg with different calibration methods using the CIFAR-100 dataset. The lower ECE and smaller gap (red region) show the effectiveness of our method.
  • Figure 4: Reliability diagrams for non-IID FedAvg using the CIFAR-10 dataset.
  • Figure 5: Comparison of confidence calibration across different FL settings using the CIFAR-100 dataset.
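Figures 1 and 3 summarize calibration with reliability diagrams and ECE, and the TL;DR also reports SCE. The sketch below computes the per-bin statistics behind a reliability diagram, ECE over top-1 confidence, and a class-wise SCE, all with equal-width bins. The 10-bin default and the exact SCE variant are assumptions, since this excerpt does not state the paper's evaluation settings.

```python
from typing import List, Sequence, Tuple


def _bin_index(p: float, num_bins: int) -> int:
    # Equal-width bins over [0, 1]; p == 1.0 falls into the last bin.
    return min(int(p * num_bins), num_bins - 1)


def reliability_bins(confidences: Sequence[float], correct: Sequence[bool],
                     num_bins: int = 10) -> List[Tuple[int, float, float]]:
    """Per-bin (count, mean confidence, accuracy): the data behind a reliability diagram."""
    counts = [0] * num_bins
    conf_sums = [0.0] * num_bins
    hit_sums = [0.0] * num_bins
    for c, ok in zip(confidences, correct):
        b = _bin_index(c, num_bins)
        counts[b] += 1
        conf_sums[b] += c
        hit_sums[b] += 1.0 if ok else 0.0
    return [(n, s / n if n else 0.0, h / n if n else 0.0)
            for n, s, h in zip(counts, conf_sums, hit_sums)]


def ece(probs_batch: List[List[float]], labels: List[int], num_bins: int = 10) -> float:
    """Expected calibration error on top-1 confidence, weighted by bin size."""
    conf = [max(p) for p in probs_batch]
    pred = [p.index(max(p)) for p in probs_batch]
    correct = [yp == y for yp, y in zip(pred, labels)]
    n = len(labels)
    return sum(cnt / n * abs(acc - c)
               for cnt, c, acc in reliability_bins(conf, correct, num_bins))


def sce(probs_batch: List[List[float]], labels: List[int],
        num_classes: int, num_bins: int = 10) -> float:
    """Static (class-wise) calibration error: per-class binned error, averaged over classes."""
    n = len(labels)
    total = 0.0
    for k in range(num_classes):
        conf_k = [p[k] for p in probs_batch]
        hits_k = [y == k for y in labels]
        total += sum(cnt / n * abs(acc - c)
                     for cnt, c, acc in reliability_bins(conf_k, hits_k, num_bins))
    return total / num_classes
```

In a reliability diagram, the shaded gap in each bin typically corresponds to the |accuracy - confidence| term that ECE weights by the fraction of samples in that bin.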