
TransFusion: Covariate-Shift Robust Transfer Learning for High-Dimensional Regression

Zelin He, Ying Sun, Jingyuan Liu, Runze Li

TL;DR

TransFusion tackles covariate-shift-robust transfer learning for high-dimensional regression with a sparse target model and diverse source tasks. It introduces a two-step procedure built on a fused regularizer (co-training followed by local debiasing) that leverages source data while remaining robust to covariate shifts. It also introduces a distributed variant (D-TransFusion) that needs only one round of communication. The authors establish nonasymptotic error bounds and conditions for minimax optimality, showing when information from the source tasks improves the target estimation rate despite the shifts. D-TransFusion nearly matches centralized performance when the source tasks carry sufficient information. Experiments on simulated data and MNIST-C demonstrate robustness to covariate shift, the benefit of task diversity, and substantial communication savings in the distributed setting.
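The two-step procedure can be illustrated with a minimal NumPy sketch. It assumes a fused penalty of the form $\lambda_0(\|\beta_0\|_1 + \sum_{k=1}^K a_k\|\beta_k - \beta_0\|_1)$ on the target coefficients $\beta_0$ and source coefficients $\beta_k$. It then fits a lasso correction on the target data alone. This form is an illustrative assumption, not a transcription of the paper's estimator. The helper names (`transfusion_sketch`, `weighted_lasso`), the ISTA solver, and the simulated data are all hypothetical choices made for this example.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal map of t * ||.||_1, applied elementwise (t may be a vector)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def weighted_lasso(A, y, weights, lam, n_iter=3000):
    """ISTA for (1/(2n)) * ||y - A @ theta||^2 + lam * sum_j weights[j] * |theta_j|."""
    n, d = A.shape
    step = n / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    theta = np.zeros(d)
    for _ in range(n_iter):
        grad = A.T @ (A @ theta - y) / n
        theta = soft_threshold(theta - step * grad, step * lam * weights)
    return theta


def transfusion_sketch(Xs, ys, lam0, a, lam1):
    """Hypothetical two-step estimator. Xs[0], ys[0]: target task; Xs[1:], ys[1:]: K sources.

    Step 1 (co-training): with beta_k = beta_0 + delta_k, minimize
      (1/(2N)) sum_k ||y_k - X_k beta_k||^2 + lam0 * (||beta_0||_1 + sum_k a[k-1] ||delta_k||_1),
    an assumed fused penalty that becomes a weighted lasso on stacked parameters.
    Step 2 (local debias): lasso correction to beta_0 fitted on the target data only.
    """
    K, p = len(Xs) - 1, Xs[0].shape[1]
    blocks = []
    for k, X in enumerate(Xs):
        # Every task uses the beta_0 columns; source k also uses its own delta_k columns.
        cols = [X] + [X if j == k else np.zeros_like(X) for j in range(1, K + 1)]
        blocks.append(np.hstack(cols))
    A, y = np.vstack(blocks), np.concatenate(ys)
    weights = np.concatenate([np.ones(p)] + [np.full(p, a_k) for a_k in a])
    beta0_co = weighted_lasso(A, y, weights, lam0)[:p]
    # Local debias: sparse correction fitted to the target residuals.
    delta0 = weighted_lasso(Xs[0], ys[0] - Xs[0] @ beta0_co, np.ones(p), lam1)
    return beta0_co + delta0


# Usage on simulated data (all sizes and shifts are made up for illustration).
rng = np.random.default_rng(0)
p, n_T, n_S, K = 20, 50, 200, 3
beta = np.zeros(p)
beta[:3] = 1.0
Xs, ys = [], []
for k in range(K + 1):
    n = n_T if k == 0 else n_S
    b = beta.copy()
    if k > 0:
        b[3 + k] += 0.2  # small task-specific model shift for source k
    X = rng.standard_normal((n, p))
    Xs.append(X)
    ys.append(X @ b + 0.5 * rng.standard_normal(n))
N = n_T + K * n_S  # assumed total sample size
est = transfusion_sketch(
    Xs, ys,
    lam0=0.5 * np.sqrt(np.log(p) / N),  # sqrt(log p / N) scaling; constant 0.5 assumed
    a=[8 * np.sqrt(n_S / N)] * K,
    lam1=0.5 * np.sqrt(np.log(p) / n_T),
)
print("l2 error:", np.linalg.norm(est - beta))
```

The reparametrization $\beta_k = \beta_0 + \delta_k$ makes the fused penalty separable, so a plain proximal-gradient loop suffices. The weights $a_k$ only scale the threshold applied to each $\delta_k$. The tuning values follow the scalings $\lambda_0 \propto \sqrt{\log p / N}$ and $a_k = 8\sqrt{n_S/N}$ from the key result, with the unspecified constant set to 0.5 by assumption.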

Abstract

The main challenge that sets transfer learning apart from traditional supervised learning is the distribution shift. It appears both as the shift between the source and target models and as the shift between the marginal covariate distributions. In this work, we tackle model shifts in the presence of covariate shifts in the high-dimensional regression setting. Specifically, we propose a two-step method with a novel fused regularizer that effectively leverages samples from source tasks to improve learning performance on a target task with limited samples. A nonasymptotic bound is provided for the estimation error of the target model, showing the robustness of the proposed method to covariate shifts. We further establish conditions under which the estimator is minimax-optimal. Additionally, we extend the method to a distributed setting that allows for a pretraining-finetuning strategy and requires just one round of communication while retaining the estimation rate of the centralized version. Numerical tests validate our theory, highlighting the method's robustness to covariate shifts.
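The following generic one-shot pretrain/fine-tune sketch illustrates the one-round communication pattern. Each source machine sends a single $p$-dimensional estimate. The target machine averages the received vectors and then fits a sparse correction on its own data. This is an assumed scheme for illustration only. D-TransFusion's actual local estimator and aggregation rule may differ; for instance, the sources might send debiased rather than plain lasso estimates. All function names and simulation settings are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal map of t * ||.||_1, applied elementwise."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso(X, y, lam, n_iter=3000):
    """ISTA for (1/(2n)) * ||y - X @ b||^2 + lam * ||b||_1."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        b = soft_threshold(b - step * X.T @ (X @ b - y) / n, step * lam)
    return b


def source_node(X, y, lam):
    """Runs on a source machine; its single message is one p-vector (hypothetical)."""
    return lasso(X, y, lam)


def target_node(messages, sizes, X0, y0, lam1):
    """Pretrain: size-weighted average of the received vectors.
    Fine-tune: sparse correction fitted on the target data only (hypothetical)."""
    w = np.average(np.stack(messages), axis=0, weights=sizes)
    return w + lasso(X0, y0 - X0 @ w, lam1)


# Usage on simulated data (sizes and shifts are illustrative assumptions).
rng = np.random.default_rng(1)
p, n_T, n_S, K = 20, 50, 200, 3
beta = np.zeros(p)
beta[:3] = 1.0


def simulate(n, b):
    X = rng.standard_normal((n, p))
    return X, X @ b + 0.5 * rng.standard_normal(n)


messages = []
for k in range(1, K + 1):
    b_k = beta.copy()
    b_k[3 + k] += 0.2  # small model shift for source k
    messages.append(source_node(*simulate(n_S, b_k), lam=0.5 * np.sqrt(np.log(p) / n_S)))
X0, y0 = simulate(n_T, beta)
est = target_node(messages, [n_S] * K, X0, y0, lam1=0.5 * np.sqrt(np.log(p) / n_T))
# One round: K * p floats travel, instead of the K * n_S * (p + 1) raw data values.
floats_sent = sum(m.size for m in messages)
print(floats_sent, np.linalg.norm(est - beta))
```

With these illustrative sizes, 60 numbers cross the network instead of 12,600 raw data values. The savings grow with $n_S$ because the message size depends only on $p$.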

Paper Structure (27 sections, 15 theorems, 123 equations, 6 figures)

Key Result

Theorem 1

Under Assumptions A1 and A2, if $n_{S} \gg s \log p$, choose $\lambda_{0}=c_{0}\sqrt{\log p / N}$ for some universal constant $c_{0}$ and $a_{k}=8 \sqrt{n_{S}/N}$. Then the estimation-error bounds stated in the theorem hold with probability at least $1-c_{1}\exp(-c_2 n_{T}) -c_{3}\exp \left(-c_{4} \log p \right)$. These bounds involve $v_{n} := \sqrt{K^2 \log p / n_{S}}\,\bar{h}$ and $\bar{h}:=\frac{n_{S}}{N}\sum^{K}_{k=1} h_k$.
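The tuning choices and rate terms in Theorem 1 are closed-form expressions, so they can be evaluated directly for a given problem size. The helper `theorem1_quantities` below is hypothetical. The theorem leaves the universal constants $c_0, \dots, c_4$ unspecified, so the defaults of 1.0 are placeholders. The total sample size is assumed to be $N = n_T + K n_S$ unless passed explicitly. Since $\exp(-c_4 \log p) = p^{-c_4}$, the failure probability decays polynomially in $p$ and exponentially in $n_T$.

```python
import math


def theorem1_quantities(p, s, n_T, n_S, K, h, c0=1.0, c1=1.0, c2=1.0, c3=1.0, c4=1.0, N=None):
    """Evaluate the tuning choices and rate terms named in Theorem 1 (hypothetical helper).

    h: list of the K source-task shifts h_k. N defaults to n_T + K * n_S, an
    assumption about the total sample size. c0..c4 are placeholder constants.
    """
    if len(h) != K:
        raise ValueError("need one shift h_k per source task")
    if N is None:
        N = n_T + K * n_S
    log_p = math.log(p)
    h_bar = (n_S / N) * sum(h)
    return {
        "N": N,
        "lambda0": c0 * math.sqrt(log_p / N),
        "a_k": 8 * math.sqrt(n_S / N),
        "h_bar": h_bar,
        "v_n": math.sqrt(K**2 * log_p / n_S) * h_bar,
        # exp(-c4 * log p) equals p ** (-c4)
        "prob_lower_bound": 1 - c1 * math.exp(-c2 * n_T) - c3 * math.exp(-c4 * log_p),
        # The theorem requires n_S >> s log p; a large ratio is only an informal check.
        "n_S_over_s_log_p": n_S / (s * log_p),
    }


q = theorem1_quantities(p=100, s=5, n_T=50, n_S=200, K=5, h=[0.1] * 5)
for name, value in q.items():
    print(f"{name}: {value:.4g}")
```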

Figures (6)

  • Figure 1: Comparison of estimation errors under (i) diverse and (ii) non-diverse source task settings with (a) homogeneous design and (b) heterogeneous design.
  • Figure 2: Comparison of the estimation errors of D-TransFusion and TransFusion methods under (i) diverse and (ii) non-diverse source task settings with (b) heterogeneous design.
  • Figure 3: Comparison of estimation errors under (i) diverse and (ii) non-diverse source task settings with (a) homogeneous design and (b) heterogeneous design with a large value of $K$.
  • Figure 4: Comparison of estimation errors under (i) diverse source task settings with (a) homogeneous design and (b) heterogeneous design with different choices of $n_{S}$ and a fixed $K=10$.
  • Figure 5: The correlation heatmaps of flattened pixel features for handwritten digit images affected by different types of corruptions. From left to right, the images are subjected to brightness corruption, fog corruption, motion blur corruption, and the original images without corruption (identity corruption).

Theorems & Definitions (21)

  • Remark 1
  • Definition 1: Source task diversity
  • Theorem 1
  • Remark 2: Adaptive version of TransFusion
  • Remark 3: Scalability with task number $K$
  • Theorem 2
  • Corollary 1
  • Proposition 1
  • Remark 4: Implementation of TransFusion
  • Theorem 3