A High-Dimensional Statistical Method for Optimizing Transfer Quantities in Multi-Source Transfer Learning

Qingyue Zhang, Haohao Fu, Guanbo Huang, Yaoyuan Liang, Chang Chu, Tianren Peng, Yanru Wu, Qi Li, Yang Li, Shao-Lun Huang

TL;DR

This work tackles how many samples to transfer from each source task in multi-source transfer learning by introducing a KL-divergence-based generalization error and analyzing it with high-dimensional statistics. It derives explicit optimal transfer-quantity formulas for single- and multi-source settings and presents OTQMS, an architecture-agnostic, data-efficient training algorithm with a dynamic sampling strategy guided by the Fisher information. Theoretical results are complemented by extensive experiments on DomainNet, Office-Home, and Digits, where OTQMS achieves higher accuracy and substantial data-time savings compared with baselines. The approach broadens the applicability of multi-source transfer by enabling shot-general, domain-aware transfer quantity optimization with practical, scalable training. Overall, OTQMS demonstrates that carefully selecting transfer quantities rather than exhaustively using all source data yields meaningful gains in both performance and efficiency in real-world, few-shot transfer learning scenarios.

Abstract

Multi-source transfer learning provides an effective solution to data scarcity in real-world supervised learning scenarios by leveraging multiple source tasks. In this field, existing works typically use all available samples from sources in training, which constrains their training efficiency and may lead to suboptimal results. To address this, we propose a theoretical framework that answers the question: what is the optimal quantity of source samples needed from each source task to jointly train the target model? Specifically, we introduce a generalization error measure based on K-L divergence, and minimize it based on high-dimensional statistical analysis to determine the optimal transfer quantity for each source task. Additionally, we develop an architecture-agnostic and data-efficient algorithm, OTQMS, to implement our theoretical results for target model training in multi-source transfer learning. Experimental studies on diverse architectures and two real-world benchmark datasets show that our proposed algorithm significantly outperforms state-of-the-art approaches in both accuracy and data efficiency. The code and supplementary materials are available at https://github.com/zqy0126/OTQMS.

Paper Structure

This paper contains 37 sections, 9 theorems, 76 equations, 5 figures, 13 tables, 1 algorithm.

Key Result

Lemma 1

(Asymptotic Normality of the MLE) [wasserman2013all] When the MLE of $\theta_0$ is computed from target task samples only, then under appropriate regularity conditions the estimator is asymptotically normal, with covariance given by the inverse of the Fisher information matrix. Here “${-1}$” denotes the matrix inverse and $J({\underline{\theta}})$ is the Fisher information matrix.
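
For reference, the textbook form of this result can be written as follows. The notation is an assumption made here for illustration and may differ from the paper's: $N_0$ is the number of target samples, $\hat{\underline{\theta}}$ is the MLE, and $P_{X;\underline{\theta}}$ is the parametric model.

$$
\sqrt{N_0}\,\bigl(\hat{\underline{\theta}} - \underline{\theta}_0\bigr) \xrightarrow{d} \mathcal{N}\bigl(0,\; J(\underline{\theta}_0)^{-1}\bigr),
\qquad
J(\underline{\theta}) = \mathbb{E}_{X \sim P_{X;\underline{\theta}}}\Bigl[\nabla_{\underline{\theta}} \log P_{X;\underline{\theta}}(X)\,\nabla_{\underline{\theta}} \log P_{X;\underline{\theta}}(X)^{\top}\Bigr]
$$

Read this way, the estimation error shrinks at rate $1/\sqrt{N_0}$, and directions in which the Fisher information is large are estimated more precisely.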

Figures (5)

  • Figure 1: More source samples do not always mean better performance. Incorporating all source samples can hurt performance, as shown by comparing two strategies on the equally divided 5-task CIFAR10 dataset: training on target task samples plus all source samples (blue) and training on target task samples only (red). Theoretically, although incorporating more source samples reduces model variance by expanding the training data, the discrepancy between the source and target tasks introduces additional bias.
  • Figure 2: Curves of the measure in \ref{thm:one_source_KL} under different regimes determined by the value of $N_{0} \cdot t$ (blue). The vertical axis shows the value of the proposed measure \ref{thm:one_source_KL}, and the horizontal axis shows the variable $n_1$.
  • Figure 3: Performance comparison with increasing target shots up to 100 per class on DomainNet dataset (I, P, Q and R domains). OTQMS (blue) outperforms other methods.
  • Figure 4: Data efficiency comparison of average sample usage and training time on the DomainNet dataset. The left vertical axis shows sample usage: green bars for AllSources $\cup$ Target, blue bars for OTQMS, red bars for MADA(ViT-S), and azure bars for MADA(Res50). The right (orange) vertical axis and the lines show training time.
  • Figure 5: Visualization of domain-specific transfer quantity under 10-shot setting. (a) Domain selection during training epochs (from left to right), where the blue upper part represents the selection of target domain Clipart on Office-Home, and the orange lower part represents the selection of target domain Sketch on DomainNet. Darker colors indicate stronger tendencies throughout the training process. (b) Source domain preferences of different target domains on DomainNet. Each row corresponds to a target domain while each column represents a source domain.
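
The bias-variance trade-off described in the Figure 1 caption can be illustrated with a toy calculation. This is a minimal sketch, not the paper's measure or the OTQMS algorithm: it assumes a one-dimensional Gaussian mean-estimation problem with unit variance, a target mean of 0, and a source mean offset by `shift`, and it pools `n_target` target samples with `n_source` source samples into one sample mean. The function names and parameters are hypothetical.

```python
def pooled_mean_mse(n_target: int, n_source: int, shift: float) -> float:
    """Exact mean squared error of the pooled sample mean in the toy model.

    Assumptions (illustrative only): target samples ~ N(0, 1),
    source samples ~ N(shift, 1), all samples weighted equally.
    """
    total = n_target + n_source
    variance = 1.0 / total              # shrinks as more samples are pooled
    bias = n_source * shift / total     # grows as more source samples are pooled
    return variance + bias ** 2


def best_source_count(n_target: int, max_source: int, shift: float) -> int:
    """Brute-force the source sample count that minimizes the toy MSE."""
    return min(
        range(max_source + 1),
        key=lambda n: pooled_mean_mse(n_target, n, shift),
    )


if __name__ == "__main__":
    n_target, max_source = 10, 1000
    for shift in (0.1, 0.5):
        n_best = best_source_count(n_target, max_source, shift)
        print(
            f"shift={shift}: best n_source={n_best}, "
            f"target-only MSE={pooled_mean_mse(n_target, 0, shift):.4f}, "
            f"all-sources MSE={pooled_mean_mse(n_target, max_source, shift):.4f}, "
            f"best MSE={pooled_mean_mse(n_target, n_best, shift):.4f}"
        )
```

In this toy model, a small shift makes every additional source sample helpful, while a larger shift makes a small, finite number of source samples optimal and makes using all of them worse than using the target samples alone. This matches the qualitative message of Figure 1.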

Theorems & Definitions (15)

  • Lemma 1
  • Definition 2
  • Lemma 3
  • Theorem 4
  • Proposition 5
  • Proposition 6
  • Theorem 7
  • Lemma 8
  • proof
  • Definition 9
  • ...and 5 more