Back to the Basics on Predicting Transfer Performance
Levy Chaves, Eduardo Valle, Alceu Bissoto, Sandra Avila
TL;DR
This paper benchmarks a wide set of transferability scorers under a rigorous experimental-design framework and introduces Back to Bayes, a three-level Bayesian hierarchical regression that fuses multiple scorers. The authors show that aggregated analysis with bootstrapped benchmarking yields more reliable estimates than per-dataset evaluation. They also show that, although the ImageNet baseline remains a strong predictor, combining diverse scorers with it generally improves transferability predictions, especially on challenging medical datasets. Key contributions are a robust benchmark design built on aggregated tau metrics and a principled method for calibrating scorers that can be reused on new target datasets. The work highlights the value of information fusion in transferability estimation and points to future work that exploits posterior uncertainty and covers broader transfer scenarios for practical deployment.
Abstract
In the evolving landscape of deep learning, selecting the best pre-trained model from a growing number of choices is a challenge. Transferability scorers aim to ease this selection, but their recent proliferation, ironically, creates the challenge of assessing the scorers themselves. In this work, we propose both robust benchmark guidelines for transferability scorers and a well-founded technique for combining multiple scorers, which we show consistently improves their results. We extensively evaluate 13 scorers from the literature across 11 datasets, comprising generalist, fine-grained, and medical imaging datasets. We show that few scorers match the predictive performance of the simple raw metric of models on ImageNet, and that all predictors suffer on medical datasets. Our results highlight the potential of combining different information sources for reliably predicting transferability across varied domains.
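To make the evaluation idea concrete, the sketch below shows one way a scorer could be judged by rank correlation with a bootstrap over models. It assumes that "tau" means Kendall's tau (the text above does not spell this out), uses the plain tau-a variant, and runs on made-up accuracy values. The function names `kendall_tau` and `bootstrap_tau` are hypothetical, and this is not the authors' benchmark code or their Back to Bayes method.

```python
import random
from itertools import combinations


def kendall_tau(scores, accuracies):
    """Kendall's tau-a: (concordant - discordant) / total pairs; ties count as zero."""
    if len(scores) != len(accuracies) or len(scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(scores)), 2):
        product = (scores[i] - scores[j]) * (accuracies[i] - accuracies[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(scores) * (len(scores) - 1) // 2
    return (concordant - discordant) / n_pairs


def bootstrap_tau(scores, accuracies, n_resamples=1000, seed=0):
    """Mean tau and a 95% percentile interval over models resampled with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    taus = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        taus.append(kendall_tau([scores[i] for i in idx],
                                [accuracies[i] for i in idx]))
    taus.sort()
    lower = taus[int(0.025 * (n_resamples - 1))]
    upper = taus[int(0.975 * (n_resamples - 1))]
    return sum(taus) / n_resamples, (lower, upper)


# Hypothetical data: six pre-trained models, illustrative numbers only.
imagenet_acc = [0.71, 0.76, 0.78, 0.80, 0.82, 0.85]  # scorer: ImageNet accuracy
target_acc = [0.60, 0.66, 0.64, 0.70, 0.73, 0.72]    # fine-tuned target accuracy

tau = kendall_tau(imagenet_acc, target_acc)
mean_tau, (lower, upper) = bootstrap_tau(imagenet_acc, target_acc)
print(f"tau = {tau:.3f}")  # 13 concordant, 2 discordant of 15 pairs -> 0.733
print(f"bootstrap mean = {mean_tau:.3f}, 95% interval = [{lower:.3f}, {upper:.3f}]")
```

A single tau computed on a handful of models can vary considerably, so reporting the bootstrap interval alongside the point estimate gives a better sense of how far a scorer's ranking can be trusted. Resampling with replacement introduces tied pairs, which tau-a counts as neither concordant nor discordant; a tie-corrected variant such as tau-b would be a reasonable alternative.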
