Pre-Trained Model Recommendation for Downstream Fine-tuning

Jiameng Bai, Sai Wu, Jie Song, Junbo Zhao, Gang Chen

TL;DR

Fennec is a pragmatic framework that draws on a diverse, large-scale model repository and the connections between tasks and models to map all models and historical tasks into a transfer-related subspace, where the distance between a model vector and a task vector represents the magnitude of transferability.

Abstract

As a fundamental problem in transfer learning, model selection aims to rank off-the-shelf pre-trained models and select the most suitable one for the new target task. Existing model selection techniques are often constrained in their scope and tend to overlook the nuanced relationships between models and tasks. In this paper, we present a pragmatic framework \textbf{Fennec}, delving into a diverse, large-scale model repository while meticulously considering the intricate connections between tasks and models. The key insight is to map all models and historical tasks into a transfer-related subspace, where the distance between model vectors and task vectors represents the magnitude of transferability. A large vision model, as a proxy, infers a new task's representation in the transfer space, thereby circumventing the computational burden of extensive forward passes. We also investigate the impact of the inherent inductive bias of models on transfer results and propose a novel method called \textbf{archi2vec} to encode the intricate structures of models. The transfer score is computed through straightforward vector arithmetic with a time complexity of $\mathcal{O}(1)$. Finally, we make a substantial contribution to the field by releasing a comprehensive benchmark. We validate the effectiveness of our framework through rigorous testing on two benchmarks. The benchmark and the code will be publicly available in the near future.
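
To make the ranking step concrete, here is a minimal sketch of scoring models by their distance to a task vector in a shared space. It is an illustration under stated assumptions, not the paper's implementation: the function name `rank_models`, the toy 2-D embeddings, and the choice of negative Euclidean distance as the score are all hypothetical, since the abstract does not specify the exact arithmetic.

```python
import math


def rank_models(model_vectors, task_vector):
    """Rank models by closeness to the task vector.

    Assumption: a smaller Euclidean distance means higher transferability,
    so the score is the negative distance. The actual Fennec score may differ.
    """
    scored = [
        (name, -math.dist(vec, task_vector))
        for name, vec in model_vectors.items()
    ]
    # Highest score (smallest distance) first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings; in the framework these would come from the
# learned transfer space, and the task vector from the proxy model.
model_vectors = {
    "model_a": [0.0, 0.0],
    "model_b": [3.0, 4.0],
    "model_c": [1.0, 1.0],
}
task_vector = [1.0, 2.0]

ranking = rank_models(model_vectors, task_vector)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Each score here is a single fixed-dimension vector operation, so its cost does not depend on the size of the target dataset, which is one plausible reading of the constant-time claim in the abstract.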

Paper Structure

This paper contains 15 sections, 14 equations, 13 figures, and 7 tables.

Figures (13)

  • Figure 1: Overview of the Fennec framework. We simultaneously consider the impact of forward features (Transfer phase) and intrinsic properties (Meta phase) on transferability estimation, while ranking models with high efficiency (Merge phase).
  • Figure 2: Actual transfer results of diverse model architectures on Cifar10 and Nabird. The structure of a model exhibits a discernible connection with its transfer result.
  • Figure 3: The directed acyclic attributed graphs constructed for ResNet18, ResNet10 and AlexNet. BatchNorm, activation, and other minor operations are omitted for simplicity (they are present in the actual graph construction).
  • Figure 4: Comparison of performance and computation time across probe set sizes. As the probe dataset grows, the mean Pearson correlation (PC) of most methods also increases. Panels (b) and (d) show the correspondingly slower computation.
  • Figure 5: Panels (a) and (c) show the influence of the matrix factorization dimension on the mean PC. Panels (b) and (d) show the impact of using different proxy models.
  • ...and 8 more figures