FedCal: Achieving Local and Global Calibration in Federated Learning via Aggregated Parameterized Scaler
Hongyi Peng, Han Yu, Xiaoli Tang, Xiaoxiao Li
TL;DR
This paper addresses calibration reliability in Federated Learning (FL) under non-IID data. It proposes FedCal, which learns client-specific post-hoc scalers for local calibration and aggregates them into a global scaler, improving global calibration without requiring global validation data. FedCal uses an MLP-based scaler with order-preserving properties, so calibration does not change the predicted class, and applies weight matching so that the client scalers can be aggregated via linear mode connectivity and synchronized periodically alongside FedAvg. Across four datasets and varying non-IID levels, the approach reduces global calibration error by up to roughly 63% relative to unsafeguarded baselines and by about 48% relative to non-ensemble calibration baselines, while maintaining or improving accuracy. The work shows that coordinating local calibration with an aggregatable global calibrator can significantly improve reliability in FL, with practical implications for high-stakes deployments and potential extensions with privacy-preserving analytics.
Abstract
Federated learning (FL) enables collaborative machine learning across distributed data owners, but data heterogeneity poses a challenge for model calibration. While prior work has focused on improving accuracy for non-IID data, calibration remains under-explored. This study reveals that existing FL aggregation approaches lead to sub-optimal calibration, and theoretical analysis shows that, even when the variance in clients' label distributions is constrained, the global calibration error is still asymptotically lower-bounded. To address this, we propose a novel Federated Calibration (FedCal) approach that emphasizes both local and global calibration. It leverages client-specific scalers for local calibration to effectively correct output misalignment without sacrificing prediction accuracy. These scalers are then aggregated via weight averaging to generate a global scaler, minimizing the global calibration error. Extensive experiments demonstrate that FedCal significantly outperforms the best-performing baseline, reducing global calibration error by 47.66% on average.
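
To make the two quantities in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes calibration error is measured with the standard binned expected calibration error (ECE), which the abstract does not specify, and it assumes a plain sample-size-weighted average of scaler parameters in the style of FedAvg. The function names `expected_calibration_error` and `average_scalers`, the dictionary layout of the parameters, and the bin count are hypothetical choices for illustration. The weight-matching step mentioned in the TL;DR is omitted.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: bin-frequency-weighted mean of |accuracy - confidence|.

    Assumption: this standard metric stands in for the paper's
    "calibration error", which the abstract does not define.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # in_bin.mean() is the bin's sample share
    return float(ece)


def average_scalers(client_params, client_sizes):
    """Sample-size-weighted average of per-client scaler parameters.

    Assumption: each client's scaler is a dict of NumPy arrays with
    identical keys and shapes (hypothetical layout). No weight matching
    (neuron permutation alignment) is performed before averaging.
    """
    weights = np.asarray(client_sizes, dtype=float)
    weights = weights / weights.sum()
    keys = client_params[0].keys()
    return {
        k: sum(w * p[k] for w, p in zip(weights, client_params))
        for k in keys
    }


# Usage: four predictions at 85% confidence, three of them correct.
ece = expected_calibration_error([0.85, 0.85, 0.85, 0.85], [1, 1, 1, 0])

# Usage: two clients holding 1 and 3 samples respectively.
global_scaler = average_scalers(
    [{"w": np.array([1.0, 3.0])}, {"w": np.array([3.0, 5.0])}],
    client_sizes=[1, 3],
)
```

In the first usage example the single occupied bin has accuracy 0.75 and mean confidence 0.85, so the ECE is 0.10. In the second, the client weights are 0.25 and 0.75, so the averaged parameter is `[2.5, 4.5]`.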
