KITE: A Kernel-based Improved Transferability Estimation Method
Yunhui Guo
TL;DR
Kite addresses the problem of selecting the most transfer-friendly pre-trained representation for a target dataset. It introduces a kernel-based estimator that jointly measures how separable the pre-trained features are and how dissimilar they are to random features. The method defines Target Alignment (TA) and Random Alignment (RA) via Centered Kernel Alignment (CKA) and combines them as $\mathrm{Kite} = \mathrm{CKA}(K_s, K_Y)/\mathrm{CKA}(K_s, K_{random})$, which yields training-free transferability estimates. On a large-scale benchmark with 32 pre-trained models, Kite substantially outperforms state-of-the-art baselines in Pearson correlation and rank-based metrics, while remaining fast and robust to probe-set size, kernel choice, and random initialization. The results also show that the nature of the target task (coarse-grained vs. fine-grained) influences the relative informativeness of TA and RA, and that Kite leverages both to provide reliable model selection for transfer learning. Overall, the study introduces a principled, kernel-based framework that improves transferability estimation and offers practical guidance for model selection across diverse scenarios.
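For readers unfamiliar with CKA, the ratio above can be unpacked using the standard definition of centered kernel alignment. The symbols $n$ (number of probe samples), $H$ (centering matrix), and the subscript $c$ (centered kernel) are notation introduced here for illustration and are not taken from the text above; $\langle \cdot, \cdot \rangle_F$ and $\|\cdot\|_F$ denote the Frobenius inner product and norm.

$$H = I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top, \qquad K_c = H K H$$

$$\mathrm{CKA}(K, L) = \frac{\langle K_c, L_c \rangle_F}{\|K_c\|_F \, \|L_c\|_F}$$

$$\mathrm{TA} = \mathrm{CKA}(K_s, K_Y), \qquad \mathrm{RA} = \mathrm{CKA}(K_s, K_{random}), \qquad \mathrm{Kite} = \frac{\mathrm{TA}}{\mathrm{RA}}$$

Under this reading, a high score requires features that align well with the label kernel (large TA) while aligning poorly with random features (small RA).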
Abstract
Transferability estimation has emerged as an important problem in transfer learning. A transferability estimation method takes as input a set of pre-trained models and decides which one can deliver the best transfer learning performance. Existing methods tackle this problem by analyzing the output of the pre-trained model or by comparing the pre-trained model with a probe model trained on the target dataset. However, neither approach is sufficient to provide reliable and efficient transferability estimates. In this paper, we present a novel perspective and introduce Kite, a Kernel-based Improved Transferability Estimation method. Kite is based on the key observation that the separability of the pre-trained features and the similarity of the pre-trained features to random features are two important factors for estimating transferability. Inspired by kernel methods, Kite adopts centered kernel alignment as an effective way to assess feature separability and feature similarity. Kite is easy to interpret, fast to compute, and robust to the target dataset size. We evaluate the performance of Kite on a recently introduced large-scale model selection benchmark. The benchmark contains 8 source datasets, 6 target datasets, and 4 architectures, for a total of 32 pre-trained models. Extensive results show that Kite outperforms existing methods by a large margin for transferability estimation.
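The two factors described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's implementation: the function names (`center`, `cka`, `linear_kernel`, `label_kernel`, `kite_score`) are hypothetical, the linear feature kernel and the same-class label kernel are assumptions made here, and `random_feats` is assumed to hold features of the same probe samples produced by a randomly initialized network.

```python
import numpy as np

def center(K):
    """Double-center a kernel matrix: H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def cka(K, L):
    """Centered kernel alignment between two n x n kernel matrices."""
    Kc, Lc = center(K), center(L)
    # Frobenius inner product divided by the product of Frobenius norms.
    # Undefined (division by zero) if either centered kernel is all zeros.
    return np.sum(Kc * Lc) / (np.linalg.norm(Kc) * np.linalg.norm(Lc))

def linear_kernel(X):
    """Assumed feature kernel: plain inner products between samples."""
    return X @ X.T

def label_kernel(y):
    """Assumed label kernel: 1 if two samples share a class, else 0."""
    y = np.asarray(y)
    return (y[:, None] == y[None, :]).astype(float)

def kite_score(feats, random_feats, labels):
    """Return (score, TA, RA) for one pre-trained model on a probe set."""
    K_s = linear_kernel(feats)
    ta = cka(K_s, label_kernel(labels))         # feature separability
    ra = cka(K_s, linear_kernel(random_feats))  # similarity to random features
    return ta / ra, ta, ra
```

To choose among several pre-trained models under these assumptions, one would compute `kite_score` for each model on the same probe set and prefer the model with the highest score.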
