
KITE: A Kernel-based Improved Transferability Estimation Method

Yunhui Guo

TL;DR

Kite addresses the challenge of selecting the most transfer-friendly pre-trained representation for a target dataset by introducing a kernel-based estimator that jointly measures feature separability and dissimilarity to random features. The method defines Target Alignment (TA) and Random Alignment (RA) via Centered Kernel Alignment (CKA) and combines them as Kite = $CKA(K_s, K_Y)/CKA(K_s, K_{random})$, enabling accurate, training-free transferability estimates. Evaluations on a large-scale benchmark with 32 pre-trained models show that Kite substantially outperforms state-of-the-art baselines in Pearson correlation and rank-based metrics, while remaining fast and robust to probe-set size, kernel choice, and random initialization. The results reveal that target task characteristics (coarse- vs. fine-grained) influence the relative informativeness of TA and RA, and that Kite effectively leverages both to provide reliable model selection for transfer learning. Overall, the study introduces a principled, kernel-based framework that improves transferability estimation and offers practical guidance for model selection in diverse scenarios.
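Written out, the two alignment terms and their ratio look roughly as follows. This is a sketch that assumes linear kernels on an $n$-sample probe set; the symbols $Z \in \mathbb{R}^{n \times d}$ (pre-trained features), $R$ (random features) and $Y \in \{0,1\}^{n \times C}$ (one-hot labels) are introduced here for illustration and are not necessarily the paper's notation.

```latex
\begin{aligned}
K_s &= Z Z^\top, \qquad K_Y = Y Y^\top, \qquad K_{random} = R R^\top,\\
\mathrm{TA} &= \mathrm{CKA}(K_s, K_Y), \qquad \mathrm{RA} = \mathrm{CKA}(K_s, K_{random}),\\
\mathrm{Kite} &= \frac{\mathrm{TA}}{\mathrm{RA}} = \frac{\mathrm{CKA}(K_s, K_Y)}{\mathrm{CKA}(K_s, K_{random})}.
\end{aligned}
```

A larger TA (more label-separable features) and a smaller RA (features less similar to random ones) both raise the score, so a higher Kite value indicates better expected transferability.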

Abstract

Transferability estimation has emerged as an important problem in transfer learning. A transferability estimation method takes a set of pre-trained models as input and decides which pre-trained model can deliver the best transfer learning performance. Existing methods tackle this problem by analyzing the output of the pre-trained model or by comparing the pre-trained model with a probe model trained on the target dataset. However, neither approach is sufficient to provide reliable and efficient transferability estimates. In this paper, we present a novel perspective and introduce Kite, a Kernel-based Improved Transferability Estimation method. Kite is based on the key observation that the separability of the pre-trained features and the similarity of the pre-trained features to random features are two important factors for estimating transferability. Inspired by kernel methods, Kite adopts centered kernel alignment as an effective way to assess feature separability and feature similarity. Kite is easy to interpret, fast to compute, and robust to the target dataset size. We evaluate the performance of Kite on a recently introduced large-scale model selection benchmark. The benchmark contains 8 source datasets, 6 target datasets and 4 architectures, with a total of 32 pre-trained models. Extensive results show that Kite outperforms existing methods by a large margin for transferability estimation.
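The scoring procedure described above can be sketched in a few lines of numpy. This is a minimal, non-authoritative sketch: it assumes linear kernels, one-hot labels for the target kernel, and takes the random features as an input array. Kite = TA / RA, where Target Alignment (TA) is the CKA between the pre-trained feature kernel and the label kernel, and Random Alignment (RA) is the CKA between the pre-trained feature kernel and the random feature kernel. The function names `center_kernel`, `cka` and `kite_score` and the toy data are hypothetical, and the way random features are produced here (a random `tanh` layer) is only a stand-in.

```python
import numpy as np


def center_kernel(K: np.ndarray) -> np.ndarray:
    """Double-center a kernel matrix: K_c = H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def cka(K1: np.ndarray, K2: np.ndarray) -> float:
    """Centered kernel alignment: Frobenius cosine similarity of the centered kernels."""
    K1c, K2c = center_kernel(K1), center_kernel(K2)
    return float(np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c)))


def kite_score(features: np.ndarray, labels: np.ndarray, random_features: np.ndarray) -> float:
    """Kite = TA / RA, using linear kernels (an assumption of this sketch)."""
    K_s = features @ features.T                 # pre-trained feature kernel
    one_hot = np.eye(labels.max() + 1)[labels]  # n x C one-hot label matrix
    K_y = one_hot @ one_hot.T                   # target (label) kernel
    K_r = random_features @ random_features.T   # random feature kernel
    ta = cka(K_s, K_y)  # Target Alignment: agreement with the label structure
    ra = cka(K_s, K_r)  # Random Alignment: similarity to random features
    return ta / ra


# Toy usage with synthetic data (all hypothetical).
rng = np.random.default_rng(0)
n, d = 120, 16
labels = rng.integers(0, 3, size=n)
inputs = rng.normal(size=(n, d))
# Stand-in for features produced by a randomly initialized layer.
random_features = np.tanh(inputs @ rng.normal(size=(d, d)))
# Stand-in for pre-trained features: random features plus a class-dependent shift.
features = random_features + 3.0 * np.eye(d)[labels]
print(f"Kite score: {kite_score(features, labels, random_features):.3f}")
```

Because RA sits in the denominator, features that look like those of an untrained model are penalized even when their label alignment is moderate; no training on the target dataset is needed, only kernel computations on the probe set.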

Paper Structure

This paper contains 15 sections, 8 equations, 11 figures, and 6 tables.

Figures (11)

  • Figure 1: Given a target dataset, the proposed Kite aims to select the best model for transfer learning from a library of pre-trained models.
  • Figure 2: Kite considers the separability of pre-trained features and the dissimilarity of the pre-trained features to random features for transferability estimations. The pre-trained model is first used to generate features for the target dataset. Then, we compute the pre-trained feature kernel matrix, the random feature kernel matrix and the target kernel matrix. CKA is used to compare the (dis)similarity of the pre-trained feature kernel matrix to the random feature kernel matrix and the target kernel matrix.
  • Figure 3: TA captures the separability of the features. We validate the effectiveness of TA on multiple synthetic datasets, each generated by sampling from a mixture of two Gaussian distributions with different means. The TA score correlates well with feature separability.
  • Figure 4: TA and RA uncover different patterns in the feature space. TA can detect feature separability while RA can expose sample-wise similarity.
  • Figure 5: Kite is better than the linear combination alternative. The red horizontal line denotes the result of Kite.
  • ...and 6 more figures
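The synthetic check in Figure 3 can be approximated with a small experiment: draw two Gaussian classes, move their means apart, and track TA (Target Alignment, the CKA between the feature kernel and the label kernel). This is a hedged sketch, not the paper's setup: the dimensionality, sample size, separations, linear kernel and the helper name `linear_cka` are all assumptions made for illustration. The same noise sample is reused at every separation so that only the distance between the means changes.

```python
import numpy as np


def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two row-aligned matrices, computed from their n x n kernels."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    Kx = H @ (X @ X.T) @ H               # centered feature kernel
    Ky = H @ (Y @ Y.T) @ H               # centered label kernel
    return float(np.sum(Kx * Ky) / (np.linalg.norm(Kx) * np.linalg.norm(Ky)))


rng = np.random.default_rng(0)
n_per_class, dim = 100, 2
labels = np.repeat([0, 1], n_per_class)
one_hot = np.eye(2)[labels]
noise = rng.normal(size=(2 * n_per_class, dim))  # unit-variance noise, reused for every separation
signs = np.where(labels == 1, 0.5, -0.5)          # class means at -sep/2 and +sep/2
direction = np.eye(dim)[0]                        # separate the classes along the first axis

ta_scores = []
for sep in [0.0, 1.0, 2.0, 4.0]:
    X = noise + sep * signs[:, None] * direction
    ta_scores.append(linear_cka(X, one_hot))
    print(f"distance between means = {sep:.1f}: TA = {ta_scores[-1]:.3f}")
```

With overlapping classes (distance 0) TA stays close to zero, and it grows as the two Gaussians move apart, which is the qualitative trend the figure caption describes.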

Theorems & Definitions (4)

  • Definition 3.1: Alignment cristianini2001kernel
  • Definition 3.2: Centered Kernel Matrix
  • Definition 3.3: Centered Kernel Alignment (CKA) cortes2012algorithms
  • Definition 4.1: Kite
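For reference, the standard forms of Definitions 3.1 to 3.3 from the cited works are sketched below for kernel matrices $K_1, K_2 \in \mathbb{R}^{n \times n}$; the paper's own notation may differ.

```latex
\begin{aligned}
\langle K_1, K_2 \rangle_F &= \sum_{i,j=1}^{n} (K_1)_{ij} (K_2)_{ij} = \operatorname{tr}(K_1^\top K_2),\\
A(K_1, K_2) &= \frac{\langle K_1, K_2 \rangle_F}{\sqrt{\langle K_1, K_1 \rangle_F \, \langle K_2, K_2 \rangle_F}},\\
K^c &= H K H, \qquad H = I_n - \tfrac{1}{n} \mathbf{1}\mathbf{1}^\top,\\
\mathrm{CKA}(K_1, K_2) &= A(K_1^c, K_2^c).
\end{aligned}
```

Since centered positive semidefinite kernels have a nonnegative Frobenius inner product, the CKA of two PSD kernels lies in $[0, 1]$, with value 1 when one centered kernel is a positive multiple of the other.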