Transfer Learning for High-dimensional Quantile Regression with Distribution Shift

Ruiqi Bai, Yijiao Zhang, Hanbo Yang, Zhongyi Zhu

TL;DR

This work addresses transferring knowledge to high-dimensional quantile regression in the presence of distribution shift across multiple sources. It introduces a transferable set that jointly accounts for parameter and residual shift, and develops a constrained $\ell_1$-minimization framework to estimate the target quantile coefficients and source contrasts while mitigating covariate shift. A debiased inference procedure based on Neyman orthogonality leverages informative sources to achieve $\sqrt{n_{\mathcal C}}$-normality, with variance that reflects residual shift via the objective quantile density. Theoretical non-asymptotic error bounds, a detection-consistent transferable-set screening method, and empirical evidence from simulations and GTEx data demonstrate improved prediction accuracy and sharper inference while avoiding negative transfer under distribution shift.

Abstract

Information from related source studies can often enhance the findings of a target study. However, the distribution shift between target and source studies can severely impact the efficiency of knowledge transfer. In the high-dimensional regression setting, existing transfer approaches mainly focus on the parameter shift. In this paper, we focus on the high-dimensional quantile regression with knowledge transfer under three types of distribution shift: parameter shift, covariate shift, and residual shift. We propose a novel transferable set and a new transfer framework to address the above three discrepancies. Non-asymptotic estimation error bounds and source detection consistency are established to validate the availability and superiority of our method in the presence of distribution shift. Additionally, an orthogonal debiased approach is proposed for statistical inference with knowledge transfer, leading to sharper asymptotic results. Extensive simulation results as well as real data applications further demonstrate the effectiveness of our proposed procedure.
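For readers new to quantile regression, the following minimal sketch illustrates the standard check loss $\rho_\tau(u) = u(\tau - \mathbb{1}\{u<0\})$ that underlies quantile estimation in general. It is not the paper's transfer procedure; the function names `check_loss` and `empirical_quantile_by_check_loss` are hypothetical and chosen only for illustration.

```python
import numpy as np

def check_loss(u, tau):
    """Standard quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    # Positive residuals are weighted by tau, negative ones by (tau - 1).
    return u * np.where(u < 0, tau - 1.0, tau)

def empirical_quantile_by_check_loss(y, tau):
    """Return the sample point that minimizes the average check loss.

    Illustrative brute-force search over the observed values only;
    real quantile regression solvers use linear programming or
    smoothing instead.
    """
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - q, tau).mean() for q in y]
    return y[int(np.argmin(losses))]

if __name__ == "__main__":
    y = np.arange(1, 10)  # toy sample: 1, 2, ..., 9
    print(empirical_quantile_by_check_loss(y, tau=0.2))
```

Minimizing the average check loss over a constant recovers a sample quantile, which is why replacing the constant with a linear predictor yields quantile regression.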

Paper Structure

This paper contains 21 sections, 6 theorems, 36 equations, 18 figures, 5 tables, 4 algorithms.

Key Result

Theorem 1

(Convergence rate for $\widehat{\boldsymbol \beta}$ with known transferable set $\mathcal{C}$) Assume the conditions from the restricted eigenvalue condition through the density condition are satisfied. Define $n_{\mathcal{C}}=n_0 + \sum_{k \in \mathcal{C}}n_k$. Let $\widehat{\boldsymbol \beta}$ be obtained from the transfer framework [...] holds for each source study $k \in \mathcal{C}$, then for any constant $\varepsilon>0$, with probability [...]
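As a small worked example of the pooled sample size $n_{\mathcal{C}}=n_0 + \sum_{k \in \mathcal{C}}n_k$ defined in the theorem, the sketch below reads $n_0$ as the target sample size and $n_k$ as the size of source study $k$. The helper name `pooled_sample_size` and all sample sizes are hypothetical, chosen only to make the arithmetic concrete.

```python
def pooled_sample_size(n_target, source_sizes, transferable_set):
    """Compute n_C = n_0 + sum of n_k over k in the transferable set C.

    n_target         -- n_0, assumed here to be the target sample size
    source_sizes     -- dict mapping source index k to its sample size n_k
    transferable_set -- iterable of source indices belonging to C
    """
    return n_target + sum(source_sizes[k] for k in transferable_set)

if __name__ == "__main__":
    # Hypothetical sizes: a target study with 200 samples and three sources.
    sizes = {1: 300, 2: 400, 3: 500}
    # Only sources 1 and 3 are assumed transferable in this toy setting.
    print(pooled_sample_size(200, sizes, {1, 3}))
```

Sources outside $\mathcal{C}$ contribute nothing to $n_{\mathcal{C}}$, so in this toy setting source 2 is excluded from the pooled count.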

Figures (18)

  • Figure 1: Average $\ell_2$-errors at $0.2$-th quantile, where source residual distributions are: 1) Normal: $\mathcal{N}(0,1)$; 2) Cauchy: $\mathcal{C}(0,5)$; 3) Mixed: mixed $\mathcal{N}(-3,0.5)$ and $\mathcal{N}(3,0.5)$; 4) Noisy: $\mathcal{N}(0,5^2)$.
  • Figure 2: Average $\ell_2$-error of various methods under homoscedastic model.
  • Figure 3: Inference performance of various methods, including boxplots of bias, boxplots of confidence interval lengths, and density plots of normalized estimates compared to $\mathcal{N}(0,1)$ with black dotted line.
  • Figure 4: Relative prediction errors for the gene expression levels of JAM2 and SH2D2A at $\tau=0.2$.
  • Figure 5: Average $\ell_2$-error of various methods under heteroscedastic model with $K=5$.
  • ...and 13 more figures

Theorems & Definitions (13)

  • Remark 1
  • Remark 2
  • Theorem 1
  • Remark 3
  • Corollary 1
  • Remark 4
  • Theorem 2
  • Remark 5
  • Theorem 3
  • Remark 6
  • ...and 3 more