AdaTrans: Feature-wise and Sample-wise Adaptive Transfer Learning for High-dimensional Regression

Zelin He; Ying Sun; Jingyuan Liu; Runze Li

TL;DR

An adaptive transfer learning method that can detect and aggregate the feature-wise (F-AdaTrans) or sample-wise (S-AdaTrans) transferable structures and achieves a convergence rate close to that of an oracle estimator with a known transferable structure.

Abstract

We consider the transfer learning problem in the high-dimensional linear regression setting, where the feature dimension is larger than the sample size. To learn transferable information, which may vary across features or across source samples, we propose an adaptive transfer learning method that can detect and aggregate the feature-wise (F-AdaTrans) or sample-wise (S-AdaTrans) transferable structures. We achieve this by employing a fused penalty coupled with weights that adapt to the transferable structure. To choose the weights, we propose a theoretically informed, data-driven procedure that enables F-AdaTrans to selectively fuse the transferable signals with the target while filtering out non-transferable signals, and S-AdaTrans to obtain the optimal combination of information transferred from each source sample. We show that, with appropriately chosen weights, F-AdaTrans achieves a convergence rate close to that of an oracle estimator with a known transferable structure, and S-AdaTrans recovers existing near-minimax optimal rates as a special case. The effectiveness of the proposed method is validated on both simulated and real data, where it performs favorably compared with existing methods.

Paper Structure (45 sections, 14 theorems, 142 equations, 7 figures, 3 tables, 2 algorithms)

This paper contains 45 sections, 14 theorems, 142 equations, 7 figures, 3 tables, 2 algorithms.

Introduction
Related Works
Notation and Organization
Preliminaries
Feature-wise Adaptive Transfer Learning
Weight Choice with Known Transferable Structure
Weight Choice with Unknown Transferable Structure
Sample-wise Adaptive Transfer Learning
Weight Choice with Known Informative Level
Weight Choice with Unknown Informative Level
Empirical Experiments
Simulation Study
Real Data Analysis
Conclusion
Proof of Theorems, Propositions, and Corollaries
...and 30 more sections

Key Result

Theorem 3.1

Given Assumptions A1 and A2, and provided that $n_{S} \gtrsim \log p$, if we choose $w^{(k)}_{j} = \boldsymbol{1}_{\{j \in S^c_k\}}$ for $k = 0, \dots, K$, $\lambda_{0} \gtrsim \sqrt{\frac{\log p}{N}}$ and $\lambda_{1} \gtrsim \frac{n_S}{N} \sqrt{\frac{\log p}{n_S}}$, then with probability larger th
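The weight and tuning-parameter choices named in the theorem excerpt above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the excerpt does not define $S_k$, $n_S$, or $N$, so the function below takes them as plain inputs; features are indexed from 0; and the constants `c0` and `c1` are hypothetical, since the theorem only fixes $\lambda_0$ and $\lambda_1$ up to a constant ($\gtrsim$).

```python
import math

def oracle_weights(p, index_sets):
    """Return w[k][j] = 1 if feature j lies in the complement of S_k, else 0.

    index_sets[k] plays the role of S_k for k = 0, ..., K (assumed to be a
    set of 0-based feature indices; the excerpt does not define S_k further).
    """
    return [[0 if j in s_k else 1 for j in range(p)] for s_k in index_sets]

def tuning_rates(p, n_s, n_total, c0=1.0, c1=1.0):
    """Return (lambda_0, lambda_1) at the rates stated in the excerpt.

    lambda_0 = c0 * sqrt(log p / N)
    lambda_1 = c1 * (n_S / N) * sqrt(log p / n_S)
    c0 and c1 are hypothetical constants, not values from the paper.
    """
    lam0 = c0 * math.sqrt(math.log(p) / n_total)
    lam1 = c1 * (n_s / n_total) * math.sqrt(math.log(p) / n_s)
    return lam0, lam1

# Toy usage: p = 4 features, two index sets.
weights = oracle_weights(4, [{0, 1}, {2}])
lam0, lam1 = tuning_rates(p=500, n_s=250, n_total=300)
```

Note that the second rate simplifies algebraically to $\lambda_1 \gtrsim \sqrt{n_S \log p}/N$, which can be a convenient check when experimenting with different sample sizes.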

Figures (7)

Figure 1: (a) A special case of feature-specific transferable structure, where two sources possess non-overlapping non-transferable features. For each source, transferable features are shaded in orange, while non-transferable ones are left blank. (b) Sample-specific transferable structure. The darkness of orange depicts the transferability of each source. In both (a) and (b), truly sparse active target signals are highlighted in blue.
Figure 2: Penalty Functions. SCAD (blue) and Lasso (green) shown on the left; their first-order derivatives are on the right.
Figure 3: (a, b) The influence of varying $h_1$ and $h_2$ on the weights $w^{\prime}_{1}$ and $w^{\prime}_{2}$ that minimize the bound (\ref{onestepFusion}) under constraints ($K=2$, $p=500$, $n_T=50$, $n_S=250$, $s=8$). (c) The geometric interpretation when solving the problem with $h_1=0$ and $h_2=1$, showing how the ellipsoid level set hits the "edges" of the constraint and shrinks $w_2^\prime$ to zero.
Figure 4: (Left) Fitted coefficient values of the 146 active features in different tasks. (Right) Prediction error of different methods on the target tasks.
Figure 5: Average estimation error against $K$ and $s_k$ under the feature-wise (Setting 1) and sample-wise (Setting 2) adaptive transfer settings.
...and 2 more figures
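
For readers who want to reproduce the kind of curves described in the Figure 2 caption, the following is a minimal sketch of the Lasso and SCAD penalties and their first-order derivatives. It uses the standard textbook SCAD definition; the default `a = 3.7` is the conventional choice and is an assumption here, not a value confirmed by this page. The function names are illustrative.

```python
def lasso_penalty(t, lam):
    """Lasso penalty: lam * |t|."""
    return lam * abs(t)

def lasso_derivative(t, lam):
    """Derivative of the Lasso penalty with respect to |t|: constant lam."""
    return lam

def scad_penalty(t, lam, a=3.7):
    """Standard SCAD penalty (a > 2), evaluated at |t|."""
    x = abs(t)
    if x <= lam:
        return lam * x                      # linear, same as Lasso
    if x <= a * lam:
        return (2 * a * lam * x - x * x - lam * lam) / (2 * (a - 1))  # quadratic
    return lam * lam * (a + 1) / 2          # constant for large |t|

def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |t|."""
    x = abs(t)
    if x <= lam:
        return lam                          # matches Lasso near zero
    if x <= a * lam:
        return (a * lam - x) / (a - 1)      # decays linearly to zero
    return 0.0                              # no shrinkage for large |t|

# Evaluate both derivatives on a small grid of coefficient magnitudes.
grid = [0.0, 0.5, 1.0, 2.0, 3.7, 5.0]
curves = [(t, lasso_derivative(t, 1.0), scad_derivative(t, 1.0)) for t in grid]
```

The derivative is the quantity that differs most visibly between the two penalties: Lasso applies the same shrinkage at every magnitude, while the SCAD derivative drops to zero once $|t| > a\lambda$, so large coefficients are left unpenalized.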

Theorems & Definitions (14)

Theorem 3.1
Proposition 3.2
Proposition 3.3
Theorem 3.4
Corollary 3.5
Theorem 4.1
Corollary 4.2
Lemma A.1
Lemma A.2
Lemma A.3: RSC and RSM property
...and 4 more
