Source-Free Domain-Invariant Performance Prediction
Ekaterina Khramtsova, Mahsa Baktashmotlagh, Guido Zuccon, Xi Wang, Mathieu Salzmann
TL;DR
This work predicts a model's target-domain accuracy without access to source data, using a source-free framework that calibrates uncertainty with a probabilistic generative model trained on target statistics. Predictions are calibrated without supervision, in a way that connects to temperature scaling, and their correctness is assessed through the gradient norms of cross-entropy losses. Empirical results on single- and multi-domain benchmarks show strong improvements over state-of-the-art source-free methods and competitive performance against source-based baselines, especially when source data is limited. This enables domain-invariant performance estimation in privacy- and data-constrained settings, with practical value for deployment under distribution shift.
Abstract
Accurately estimating model performance poses a significant challenge, particularly when the source and target domains follow different data distributions. Most existing performance prediction methods rely heavily on the source data in their estimation process, which limits their applicability in the more realistic setting where only the trained model is accessible. The few methods that do not require source data exhibit considerably inferior performance. In this work, we propose a source-free approach centred on uncertainty-based estimation, using a generative model for calibration in the absence of source data. We establish connections between our approach to unsupervised calibration and temperature scaling. We then employ a gradient-based strategy to evaluate the correctness of the calibrated predictions. Our experiments on benchmark object recognition datasets reveal that existing source-based methods fall short when few source samples are available. Furthermore, our approach significantly outperforms the current state-of-the-art source-free and source-based methods, affirming its effectiveness in domain-invariant performance estimation.
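To make the two ingredients named in the abstract concrete, the following is a minimal sketch of temperature-scaled predictions followed by a gradient-norm correctness check. It is not the authors' implementation: the generative-model calibration is replaced by a hand-supplied `temperature`, the gradient is taken with respect to the logits rather than model parameters, pseudo-labels are the argmax predictions, and `threshold` and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature scaling: divide logits by T before the softmax.
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def grad_norm_wrt_logits(probs):
    # For cross-entropy with a one-hot label y, the gradient with respect to
    # the logits is (p - y). Assumption: the label is the model's own argmax
    # prediction (a pseudo-label), since no target labels are available.
    pseudo = probs.argmax(axis=1)
    onehot = np.eye(probs.shape[1])[pseudo]
    return np.linalg.norm(probs - onehot, axis=1)

def estimate_accuracy(logits, temperature, threshold):
    # Hypothetical decision rule: a sample counts as "predicted correct" when
    # its gradient norm falls below the threshold; the accuracy estimate is
    # the fraction of such samples.
    probs = softmax(logits, temperature)
    norms = grad_norm_wrt_logits(probs)
    return float((norms < threshold).mean())

# Toy logits: one confident prediction and one near-uniform prediction.
logits = np.array([[10.0, 0.0, 0.0],
                   [0.1, 0.0, 0.0]])
estimate = estimate_accuracy(logits, temperature=1.0, threshold=0.5)
```

In this toy setup a larger temperature flattens the softmax output, which raises the gradient norm of every sample and so lowers the estimate; this is one way to see why the calibration step matters before the gradient-based check is applied.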
