AdaTrans: Feature-wise and Sample-wise Adaptive Transfer Learning for High-dimensional Regression

Zelin He, Ying Sun, Jingyuan Liu, Runze Li

TL;DR

An adaptive transfer learning method that can detect and aggregate the feature-wise (F-AdaTrans) or sample-wise (S-AdaTrans) transferable structures and achieves a convergence rate close to that of an oracle estimator with a known transferable structure.

Abstract

We consider the transfer learning problem in the high-dimensional linear regression setting, where the feature dimension is larger than the sample size. To learn transferable information, which may vary across features or across source samples, we propose an adaptive transfer learning method that can detect and aggregate the feature-wise (F-AdaTrans) or sample-wise (S-AdaTrans) transferable structures. We achieve this by employing a fused penalty coupled with weights that adapt to the transferable structure. To choose the weights, we propose a theoretically informed, data-driven procedure that enables F-AdaTrans to selectively fuse the transferable signals with the target while filtering out non-transferable signals, and S-AdaTrans to obtain the optimal combination of information transferred from each source sample. We show that, with appropriately chosen weights, F-AdaTrans achieves a convergence rate close to that of an oracle estimator with a known transferable structure, and S-AdaTrans recovers existing near-minimax optimal rates as a special case. The effectiveness of the proposed method is validated on both simulated and real data, where it performs favorably compared with existing methods.

Paper Structure (45 sections, 14 theorems, 142 equations, 7 figures, 3 tables, 2 algorithms)

This paper contains 45 sections, 14 theorems, 142 equations, 7 figures, 3 tables, 2 algorithms.

Key Result

Theorem 3.1

Given Assumptions A1 and A2, and provided that $n_{S} \gtrsim \log p$, if we choose $w^{(k)}_{j} = \boldsymbol{1}_{\{j \in S^c_k\}}$ for $k = 0, \dots, K$, $\lambda_{0} \gtrsim \sqrt{\frac{\log p}{N}}$ and $\lambda_{1} \gtrsim \frac{n_S}{N} \sqrt{\frac{\log p}{n_S}}$, then with probability larger th
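The weight and tuning choices named in the theorem statement can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: the function names, the 0-based feature indexing, and the constants `c0`, `c1`, `c` (standing in for the unspecified constants hidden in $\gtrsim$) are all assumptions made for illustration. The set $S_k$ is simply passed in as a set of feature indices; its precise definition is given in the paper.

```python
import math


def complement_indicator_weights(p, S_k):
    """Return w_j = 1{j in S_k^c} for j = 0, ..., p-1 (0-based indices, an assumption)."""
    return [0 if j in S_k else 1 for j in range(p)]


def rate_level_lambdas(p, N, n_S, c0=1.0, c1=1.0):
    """Smallest tuning levels allowed by the stated rates, up to hypothetical constants.

    lambda_0 ~ c0 * sqrt(log p / N)
    lambda_1 ~ c1 * (n_S / N) * sqrt(log p / n_S)
    """
    if p <= 1 or N <= 0 or n_S <= 0:
        raise ValueError("need p > 1 and positive sample sizes")
    lam0 = c0 * math.sqrt(math.log(p) / N)
    lam1 = c1 * (n_S / N) * math.sqrt(math.log(p) / n_S)
    return lam0, lam1


def source_sample_size_ok(p, n_S, c=1.0):
    """Check the condition n_S >~ log p with a hypothetical constant c."""
    return n_S >= c * math.log(p)


if __name__ == "__main__":
    # p and n_S follow the Figure 3 caption; N = n_T + K * n_S is an assumption.
    p, n_T, n_S, K = 500, 50, 250, 2
    N = n_T + K * n_S
    print(complement_indicator_weights(6, {1, 4}))
    print(rate_level_lambdas(p, N, n_S))
    print(source_sample_size_ok(p, n_S))
```

Note that the $\lambda_1$ level simplifies to $\sqrt{n_S \log p}/N$, so it shrinks as the pooled sample size $N$ grows relative to a single source.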

Figures (7)

  • Figure 1: (a) A special case of feature-specific transferable structure, where two sources possess non-overlapping non-transferable features. For each source, transferable features are shaded in orange, while non-transferable ones are left blank. (b) Sample-specific transferable structure. The darkness of orange depicts the transferability of each source. In both (a) and (b), truly sparse active target signals are highlighted in blue.
  • Figure 2: Penalty Functions. SCAD (blue) and Lasso (green) shown on the left; their first-order derivatives are on the right.
  • Figure 3: (a,b) The influence of varying $h_1$ and $h_2$ on the weights $w^{\prime}_{1}$ and $w^{\prime}_{2}$ that minimize the bound (\ref{onestepFusion}) under constraints ($K=2$, $p=500$, $n_T=50$, $n_S=250$, $s=8$). (c) The geometric interpretation when solving the problem with $h_1=0$ and $h_2=1$, showing how the ellipsoid level set hits the "edges" of the constraint and shrinks $w_2^\prime$ to zero.
  • Figure 4: (Left) Fitted coefficient values of the 146 active features in different tasks. (Right) Prediction error of different methods on the target tasks.
  • Figure 5: Average estimation error against $K$ and $s_k$ under the feature-wise (Setting 1) and sample-wise (Setting 2) adaptive transfer settings.
  • ...and 2 more figures
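
The two penalties compared in Figure 2 can be written down directly. The sketch below uses the standard SCAD definition of Fan and Li with the conventional default $a = 3.7$; the function names and the default are assumptions for illustration, and the paper may parameterize the penalty differently.

```python
def lasso_penalty(t, lam):
    """Lasso penalty: lam * |t|."""
    return lam * abs(t)


def lasso_derivative(t, lam):
    """Derivative of the Lasso penalty with respect to |t|: constant lam."""
    return lam


def scad_penalty(t, lam, a=3.7):
    """Standard SCAD penalty (requires a > 2)."""
    x = abs(t)
    if x <= lam:
        return lam * x                                         # matches Lasso near zero
    if x <= a * lam:
        return (2 * a * lam * x - x * x - lam * lam) / (2 * (a - 1))  # quadratic transition
    return lam * lam * (a + 1) / 2                             # constant for large |t|


def scad_derivative(t, lam, a=3.7):
    """Derivative of SCAD with respect to |t|: lam, then linear decay, then 0."""
    x = abs(t)
    if x <= lam:
        return lam
    if x <= a * lam:
        return (a * lam - x) / (a - 1)
    return 0.0


if __name__ == "__main__":
    lam = 1.0
    for t in (0.5, 2.0, 5.0):
        print(t, lasso_penalty(t, lam), scad_penalty(t, lam),
              lasso_derivative(t, lam), scad_derivative(t, lam))
```

The derivative is what distinguishes the two: the Lasso shrinks every coefficient by the same amount, while the SCAD derivative drops to zero beyond $a\lambda$, so large coefficients are left unpenalized at the margin.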

Theorems & Definitions (14)

  • Theorem 3.1
  • Proposition 3.2
  • Proposition 3.3
  • Theorem 3.4
  • Corollary 3.5
  • Theorem 4.1
  • Corollary 4.2
  • Lemma A.1
  • Lemma A.2
  • Lemma A.3: RSC and RSM property
  • ...and 4 more