Transferability-Guided Cross-Domain Cross-Task Transfer Learning

Yang Tan; Enming Zhang; Yang Li; Shao-Lun Huang; Xiao-Ping Zhang

TL;DR

This work tackles the problem of estimating transferability in cross-domain cross-task learning without relying on costly auxiliary tasks. It introduces two auxiliary-free metrics, F-OTCE and JC-OTCE, built on Optimal Transport to quantify transferability via a coupling between source and target data and the resulting negative conditional entropy; JC-OTCE further incorporates label-distance information into the ground cost. The authors demonstrate that these metrics outperform existing auxiliary-free approaches in correlation with ground-truth transfer accuracy while reducing computation time from minutes to seconds, with JC-OTCE approaching the accuracy of the auxiliary-based OTCE. They further show practical utility by using F-OTCE as an objective to guide OTCE-based finetuning and by integrating it into a domain-generalization framework (replacing distillation with F-OTCE), yielding improved few-shot performance across domains and tasks. The proposed framework offers scalable transferability estimation and transferable representation learning, enabling more efficient source-model selection, multi-source fusion, and cross-domain adaptation.

Abstract

We propose two novel transferability metrics, F-OTCE (Fast Optimal Transport based Conditional Entropy) and JC-OTCE (Joint Correspondence OTCE), to evaluate how much the source model (task) can benefit the learning of the target task and to learn more transferable representations for cross-domain cross-task transfer learning. Unlike the existing metric, which requires evaluating the empirical transferability on auxiliary tasks, our metrics are auxiliary-free and can therefore be computed much more efficiently. Specifically, F-OTCE estimates transferability by first solving an Optimal Transport (OT) problem between the source and target distributions, and then using the optimal coupling to compute the Negative Conditional Entropy between source and target labels. It can also serve as a loss function to maximize the transferability of the source model before finetuning on the target task. Meanwhile, JC-OTCE improves the robustness of F-OTCE by including label distances in the OT problem, though it may incur additional computation cost. Extensive experiments demonstrate that F-OTCE and JC-OTCE outperform state-of-the-art auxiliary-free metrics by 18.85% and 28.88%, respectively, in correlation coefficient with the ground-truth transfer accuracy. By eliminating the training cost of auxiliary tasks, the two metrics reduce the total computation time of the previous method from 43 minutes to 9.32s and 10.78s, respectively, for a pair of tasks. When used as a loss function, F-OTCE shows consistent improvements in the transfer accuracy of the source model in few-shot classification experiments, with up to 4.41% accuracy gain.
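The two-step F-OTCE computation described above (solve an OT problem, then compute the negative conditional entropy of target labels given source labels under the coupling) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the entropic Sinkhorn solver, the squared Euclidean cost on features, the uniform sample weights, the cost rescaling, and all function names are assumptions made here for the example.

```python
import numpy as np


def sinkhorn_coupling(cost, reg=0.1, n_iters=200):
    """Approximate OT coupling with uniform marginals (assumed solver: Sinkhorn)."""
    m, n = cost.shape
    a = np.full(m, 1.0 / m)  # uniform mass on source samples
    b = np.full(n, 1.0 / n)  # uniform mass on target samples
    scale = cost.max() if cost.max() > 0 else 1.0
    kernel = np.exp(-cost / (reg * scale))  # cost rescaled for numerical stability
    u = np.ones(m)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    return u[:, None] * kernel * v[None, :]


def negative_conditional_entropy(coupling, ys, yt):
    """Return -H(Y_t | Y_s) under the label joint induced by the coupling."""
    ys = np.asarray(ys)
    yt = np.asarray(yt)
    joint = np.zeros((ys.max() + 1, yt.max() + 1))
    # joint[i, j] accumulates the mass moved from source class i to target class j
    np.add.at(joint, (ys[:, None], yt[None, :]), coupling)
    p_src = np.broadcast_to(joint.sum(axis=1, keepdims=True), joint.shape)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / p_src[mask])))


def f_otce_sketch(feat_s, ys, feat_t, yt, reg=0.1):
    """Hypothetical helper: OT on features, then negative conditional entropy."""
    diff = feat_s[:, None, :] - feat_t[None, :, :]
    cost = np.sum(diff ** 2, axis=-1)  # assumed ground cost: squared Euclidean
    coupling = sinkhorn_coupling(cost, reg=reg)
    return negative_conditional_entropy(coupling, ys, yt)
```

The score is at most zero. Values near zero mean that, under the coupling, a source label almost determines the target label, which this sketch reads as higher transferability. Labels are assumed to be non-negative integers, and the features are assumed to come from the source model's feature extractor.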

Paper Structure (20 sections, 16 equations, 12 figures, 8 tables, 1 algorithm)

Introduction
Transferability Formulation
Preliminary Analysis of OTCE
Auxiliary-free Transferability Metrics
F-OTCE Metric
JC-OTCE Metric
Transferability-guided Transfer Learning
OTCE-based Model Finetuning
OTCE-based Domain Generalization
Few-shot Classification Task Definition
Experiments
Evaluation on Transferability Estimation
Efficiency Analysis
Effect of Parameter $\gamma$
Application in Source Model Selection
...and 5 more sections

Figures (12)

Figure 1: Illustration of three different transfer learning settings, i.e., transductive domain adaptation [pan2009survey], cross-task transfer [bao2019information], and the cross-domain cross-task transfer we investigate.
Figure 2: Illustration of the auxiliary-based OTCE metric (top), and our proposed F-OTCE (middle) and JC-OTCE (bottom) metrics, which do not require auxiliary tasks with known transfer accuracy to learn the weighting coefficients. For OTCE (top), $W_D$ and $W_T$ represent the domain difference and task difference between two tasks, respectively. To estimate the coefficients $\lambda_1, \lambda_2, b$ of the linear model, we need to sample at least three auxiliary tasks from the target dataset and calculate $W_D^i$, $W_T^i$ and the transfer accuracy $\mathrm{TransferAcc}^i$ between the source task and each auxiliary task as training data.
Figure 3: Statistics of the learned weighting coefficients $\lambda_1, \lambda_2$ and the bias term $b$ of OTCE under diverse transfer configurations.
Figure 4: A toy example showing that the F-OTCE metric fails to distinguish the more transferable source model, while JC-OTCE predicts correctly by involving the label distance in computing the correspondences.
Figure 5: The pipeline of our OTCE-based finetune algorithm.
...and 7 more figures
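
The Figure 2 caption describes how the auxiliary-based OTCE learns $\lambda_1, \lambda_2, b$ from at least three auxiliary tasks. A minimal sketch of that calibration step is given below, assuming an ordinary least-squares fit of a linear model in $W_D$ and $W_T$; the fitting procedure and the function names are assumptions for illustration, and the input values are left to the caller.

```python
import numpy as np


def fit_otce_coefficients(w_d, w_t, transfer_acc):
    """Fit TransferAcc ~ lam1 * W_D + lam2 * W_T + b by least squares (assumed)."""
    w_d = np.asarray(w_d, dtype=float)
    w_t = np.asarray(w_t, dtype=float)
    acc = np.asarray(transfer_acc, dtype=float)
    if len(w_d) < 3:
        # three unknowns (lam1, lam2, b) need at least three auxiliary tasks
        raise ValueError("need at least three auxiliary tasks")
    design = np.column_stack([w_d, w_t, np.ones_like(w_d)])
    coef, *_ = np.linalg.lstsq(design, acc, rcond=None)
    lam1, lam2, b = coef
    return float(lam1), float(lam2), float(b)


def otce_score(w_d, w_t, lam1, lam2, b):
    """Evaluate the fitted linear model for a new source-target pair."""
    return lam1 * w_d + lam2 * w_t + b
```

Each auxiliary task contributes one row to the design matrix, and obtaining its transfer accuracy requires actually transferring the source model to it. That training cost is what the auxiliary-free F-OTCE and JC-OTCE metrics avoid.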

Theorems & Definitions (1)

Definition 1
