Assessing Pre-Trained Models for Transfer Learning Through Distribution of Spectral Components

Tengxue Zhang, Yang Shu, Xinyang Chen, Yifei Long, Chenjuan Guo, Bin Yang

TL;DR

This work introduces DISCO, a singular-value-based framework that assesses pre-trained models for transfer learning by analyzing the distribution of spectral components in their extracted features. By grouping singular values into spectral components and weighting them with task-specific transferability scores, DISCO estimates how well each candidate will perform after fine-tuning and ranks models without fine-tuning each one. It provides classification- and regression-specific metrics, supports hard-example sampling to keep computational cost manageable, and achieves state-of-the-art correlation with ground-truth fine-tuning results on image classification and object detection, covering both supervised and self-supervised pre-trained models. The result is a flexible, task-adaptive model-selection method that scales to diverse downstream tasks and model hubs, with practical time savings over brute-force fine-tuning.

Abstract

Pre-trained model assessment for transfer learning aims to identify the optimal candidate for a downstream task from a model hub without time-consuming fine-tuning. Existing state-of-the-art methods mainly analyze either the intrinsic characteristics of the entire feature set extracted by each pre-trained model or how well those features fit the target labels. This paper proposes a novel perspective on pre-trained model assessment through the Distribution of Spectral Components (DISCO). By applying singular value decomposition to features extracted from pre-trained models, we examine different spectral components and observe that they possess distinct transferability and contribute differently to fine-tuning performance. Inspired by this observation, we propose an assessment method based on the distribution of spectral components, measured by the proportions of their corresponding singular values. Pre-trained models whose features concentrate on more transferable components are regarded as better choices for transfer learning. We further leverage the labels of downstream data to better estimate the transferability of each spectral component and derive the final assessment criterion. The proposed method is flexible and applies to both classification and regression tasks. Comprehensive experiments on three benchmarks and two tasks, image classification and object detection, demonstrate that our method achieves state-of-the-art performance in selecting suitable pre-trained models from the model hub for transfer learning.
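The scoring idea described above can be illustrated with a minimal NumPy sketch. It rests on several assumptions that this summary does not state: features are mean-centered before the SVD, the singular directions are split into `num_groups` contiguous groups ordered from largest to smallest singular value, $S_{\text{ratio}}^g$ is each group's share of the total sum of singular values, and the task-specific scores are read off the subscripts in Figure 3 as nearest-centroid-classifier accuracy ($S_{\text{ncc}}^g$) for classification and linear-regression $R^2$ ($S_{\text{lr}}^g$) for regression, combined as $\sum_g S_{\text{ratio}}^g \cdot S^g$. The function names (`disco_like_score`, `ncc_accuracy`, `lr_score`) are hypothetical, and the paper's exact definitions may differ.

```python
import numpy as np


def ncc_accuracy(z, labels):
    """Accuracy of a nearest-centroid classifier, evaluated on the same samples (assumed S_ncc)."""
    classes = np.unique(labels)
    centroids = np.stack([z[labels == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every sample to every class centroid.
    dists = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == labels).mean())


def lr_score(z, targets):
    """R^2 of an ordinary least-squares fit with intercept (assumed S_lr)."""
    design = np.hstack([z, np.ones((len(z), 1))])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    resid = targets - design @ coef
    ss_res = (resid ** 2).sum()
    ss_tot = ((targets - targets.mean(axis=0)) ** 2).sum()
    return float(1.0 - ss_res / ss_tot)


def disco_like_score(features, targets, num_groups=4, task="classification"):
    """Hypothetical DISCO-style score: singular-value share of each group times its task score."""
    x = features - features.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(x, full_matrices=False)  # s is sorted in descending order
    groups = np.array_split(np.arange(len(s)), num_groups)
    total = s.sum()
    score = 0.0
    for idx in groups:
        ratio = s[idx].sum() / total  # assumed S_ratio^g
        # Projection of the centered features onto this group's right singular vectors.
        z = u[:, idx] * s[idx]
        if task == "classification":
            group_score = ncc_accuracy(z, targets)
        else:
            group_score = lr_score(z, targets)
        score += ratio * group_score
    return score
```

With this weighting, features whose label information lies in the dominant singular directions score higher than features where it is buried in weak directions beneath high-variance noise, which mirrors the abstract's claim that models concentrating on more transferable components are better choices.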

Paper Structure

This paper contains 43 sections, 18 equations, 5 figures, 10 tables.

Figures (5)

  • Figure 1: The overall framework for assessing pre-trained models for transfer learning. Given a model pool, a designed metric quickly predicts and ranks model performance on the target dataset. The predicted rankings are expected to strongly correlate with the ground-truth fine-tuning results.
  • Figure 2: The relative changes of Frobenius norm $C_F$ and the proportion of singular values $S_{\text{ratio}}$ in different spectral components of extracted features before and after fine-tuning.
  • Figure 3: Overview of DISCO's framework (better viewed in color). $S_\text{ratio}^g$ represents the singular value ratio of the $g$-th spectral component, while $S_{\text{ncc}}^g$ and $S_{\text{lr}}^g$ are task-specific scores for classification and regression tasks, respectively. The overall transferability of the entire feature is calculated through the perspective of the distribution of spectral components.
  • Figure 4: (a) Ablation study of the framework; (b) ablation study on the object detection benchmark; (c) average $\tau_\omega$ for different numbers of groups on three benchmarks; (d) comparison of methods by average running time (seconds) and $\tau_\omega$ on 11 datasets.
  • Figure 5: The relative changes of Frobenius norm $C_F$ and the proportion of singular values $S_{\text{ratio}}$ in different spectral components of extracted features before and after fine-tuning.
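Figure 4 reports $\tau_\omega$ as the measure of ranking quality. Assuming it denotes a weighted Kendall's tau between the scores an assessment metric assigns to candidate models and their ground-truth fine-tuning results, with disagreements among the top-ranked models penalized more heavily, one such variant can be sketched in plain Python. The hyperbolic pair weight $1/(r_i+1) + 1/(r_j+1)$, computed from ground-truth ranks, and the function name `weighted_kendall_tau` are illustrative assumptions; SciPy's `scipy.stats.weightedtau` uses a related but not identical weighting, so its values will differ.

```python
from itertools import combinations


def weighted_kendall_tau(predicted, ground_truth):
    """Hypothetical weighted Kendall's tau with hyperbolic weights from ground-truth ranks."""
    n = len(predicted)
    # Rank 0 is the model with the best ground-truth fine-tuning result.
    order = sorted(range(n), key=lambda i: ground_truth[i], reverse=True)
    rank = {item: r for r, item in enumerate(order)}
    num = 0.0
    den = 0.0
    for i, j in combinations(range(n), 2):
        # Pairs involving highly ranked models receive larger weights.
        w = 1.0 / (rank[i] + 1) + 1.0 / (rank[j] + 1)
        sx = (predicted[i] > predicted[j]) - (predicted[i] < predicted[j])
        sy = (ground_truth[i] > ground_truth[j]) - (ground_truth[i] < ground_truth[j])
        # Concordant pairs add +w, discordant pairs add -w; ties add 0 but still count in den.
        num += w * sx * sy
        den += w
    return num / den
```

A perfect ranking gives 1.0 and a fully reversed one gives -1.0; swapping the two best models lowers the score more than swapping the two worst, which matches the goal of picking the top candidate from a model hub.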