Transferability-Guided Cross-Domain Cross-Task Transfer Learning
Yang Tan, Enming Zhang, Yang Li, Shao-Lun Huang, Xiao-Ping Zhang
TL;DR
This work tackles the problem of estimating transferability in cross-domain cross-task transfer learning without relying on costly auxiliary tasks. It introduces two auxiliary-free metrics built on Optimal Transport (OT). F-OTCE computes an optimal coupling between source and target data and quantifies transferability as the negative conditional entropy of target labels given source labels under that coupling. JC-OTCE additionally incorporates label distances into the ground cost. The authors show that both metrics correlate better with ground-truth transfer accuracy than existing auxiliary-free approaches and cut computation time from about 43 minutes to roughly 10 seconds per task pair, with JC-OTCE approaching the accuracy of the auxiliary-based OTCE. They further demonstrate practical utility in two ways. First, they use F-OTCE as a training objective that maximizes the source model's transferability before finetuning. Second, they integrate it into a domain-generalization framework (replacing distillation with F-OTCE). Both uses improve few-shot performance across domains and tasks. The framework thus supports scalable transferability estimation and transferable representation learning, enabling more efficient source-model selection, multi-source fusion, and cross-domain adaptation.
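The coupling-then-entropy pipeline behind F-OTCE can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes that `xs` and `xt` are features already extracted by the source model from source and target samples. It also assumes uniform sample weights and a max-normalized squared-Euclidean ground cost. The OT problem is solved as entropic OT with plain Sinkhorn iterations, and the values of `reg` and `n_iter` are illustrative. The function names `sinkhorn` and `f_otce` are hypothetical.

```python
import numpy as np


def sinkhorn(cost, reg=0.1, n_iter=500):
    """Entropic OT between two uniform empirical distributions (Sinkhorn-Knopp)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform weights on source samples
    b = np.full(m, 1.0 / m)  # uniform weights on target samples
    K = np.exp(-cost / reg)  # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]  # coupling pi, shape (n, m)


def f_otce(xs, ys, xt, yt, reg=0.1):
    """Negative conditional entropy -H(Y_t | Y_s) under an OT coupling (sketch)."""
    # Squared Euclidean ground cost between source and target features,
    # scaled to [0, 1] so that exp(-cost / reg) does not underflow.
    cost = ((xs[:, None, :] - xt[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / cost.max()
    pi = sinkhorn(cost, reg)

    # One-hot label matrices: S is (n, Ks), T is (m, Kt).
    S = (ys[:, None] == np.unique(ys)[None, :]).astype(float)
    T = (yt[:, None] == np.unique(yt)[None, :]).astype(float)

    # Joint label distribution P(y_s, y_t): coupling mass summed over label pairs.
    joint = S.T @ pi @ T
    p_s = joint.sum(axis=1, keepdims=True)  # marginal P(y_s)

    # -H(Y_t | Y_s) = sum P(y_s, y_t) * log(P(y_s, y_t) / P(y_s)); always <= 0.
    mask = joint > 0
    return float((joint[mask] * np.log((joint / p_s)[mask])).sum())


# Usage: two well-separated clusters; target features are noisy copies of the source.
rng = np.random.default_rng(0)
xs = np.concatenate([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
ys = np.repeat([0, 1], 20)
xt = xs + rng.normal(0, 0.05, xs.shape)
print(f_otce(xs, ys, xt, ys))                   # close to 0: target labels follow source labels
print(f_otce(xs, ys, xt, rng.permutation(ys)))  # clearly negative: labels unrelated to features
```

In this sketch, a score near 0 means that, under the coupling, the target label is almost fully determined by the source label. That corresponds to high estimated transferability. More negative scores indicate weaker label correspondence.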
Abstract
We propose two novel transferability metrics, F-OTCE (Fast Optimal Transport based Conditional Entropy) and JC-OTCE (Joint Correspondence OTCE), to evaluate how much the source model (task) can benefit the learning of the target task and to learn more transferable representations for cross-domain cross-task transfer learning. Unlike the existing metric, which requires evaluating empirical transferability on auxiliary tasks, our metrics are auxiliary-free and can therefore be computed much more efficiently. Specifically, F-OTCE estimates transferability by first solving an Optimal Transport (OT) problem between the source and target distributions. It then uses the optimal coupling to compute the Negative Conditional Entropy between source and target labels. It can also serve as a loss function to maximize the transferability of the source model before finetuning on the target task. Meanwhile, JC-OTCE makes the transferability estimates of F-OTCE more robust by including label distances in the OT problem, though it may incur additional computation cost. Extensive experiments demonstrate that F-OTCE and JC-OTCE outperform state-of-the-art auxiliary-free metrics by 18.85% and 28.88%, respectively, in correlation coefficient with the ground-truth transfer accuracy. By eliminating the training cost of auxiliary tasks, the two metrics reduce the total computation time of the previous method from 43 minutes to 9.32 s and 10.78 s, respectively, for a pair of tasks. When used as a loss function, F-OTCE consistently improves the transfer accuracy of the source model in few-shot classification experiments, with accuracy gains of up to 4.41%.
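One way label distances could enter the OT problem, in the spirit of JC-OTCE, is sketched below. The specific label distance is an assumption that is not stated above. Here it is taken as an entropic OT cost between the class-conditional feature sets of a source class and a target class. That distance is then added to the per-sample feature distance, weighted by a hypothetical `label_weight`. All names (`jc_otce`, `ot_cost`, `sq_dists`, `label_weight`) are illustrative, and `xs`/`xt` are again assumed to be source-model features. In this sketch, the extra OT problem solved for every class pair shows why adding label distances costs more computation.

```python
import numpy as np


def sinkhorn(cost, reg=0.1, n_iter=500):
    """Entropic OT between two uniform empirical distributions (Sinkhorn-Knopp)."""
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-cost / reg)  # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]  # coupling pi, shape (n, m)


def sq_dists(x, z):
    """Pairwise squared Euclidean distances, shape (len(x), len(z))."""
    return ((x[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)


def ot_cost(x, z, reg=0.1):
    """Hypothetical label-distance helper: entropic OT cost <pi, C> between two point sets."""
    c = sq_dists(x, z)
    scale = c.max() if c.max() > 0 else 1.0  # normalize only to keep Sinkhorn stable
    pi = sinkhorn(c / scale, reg)
    return float((pi * c).sum())


def jc_otce(xs, ys, xt, yt, reg=0.1, label_weight=1.0):
    """OTCE-style score with label distances added to the ground cost (sketch)."""
    cs, ct = np.unique(ys), np.unique(yt)
    S = (ys[:, None] == cs[None, :]).astype(float)  # (n, Ks) one-hot source labels
    T = (yt[:, None] == ct[None, :]).astype(float)  # (m, Kt) one-hot target labels

    # Distance between every source class and every target class, shape (Ks, Kt).
    label_dist = np.array(
        [[ot_cost(xs[ys == a], xt[yt == b], reg) for b in ct] for a in cs]
    )

    # Joint ground cost: feature distance plus weighted distance between the two samples' labels.
    # S @ label_dist @ T.T places label_dist[class(i), class(j)] at entry (i, j).
    cost = sq_dists(xs, xt) + label_weight * (S @ label_dist @ T.T)
    pi = sinkhorn(cost / cost.max(), reg)

    # Negative conditional entropy -H(Y_t | Y_s) computed from the new coupling.
    joint = S.T @ pi @ T
    p_s = joint.sum(axis=1, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log((joint / p_s)[mask])).sum())
```

With `label_weight=0` the cost reduces to the plain feature cost, so the score falls back to an F-OTCE-style estimate. Larger weights let class-level structure shape the coupling more strongly.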
