Transfer Learning with Large-Scale Quantile Regression

Jun Jin, Jun Yan, Robert H. Aseltine, Kun Chen

TL;DR

This work develops transfer learning methods for high-dimensional quantile regression that detect informative sources, whose models are similar to the target, and use them to improve the target model. An application to hard-landing risk detection shows the benefits and insights gained from transfer learning across three types of airplanes.

Abstract

Quantile regression is increasingly encountered in modern big data applications due to its robustness and flexibility. We consider the scenario of learning the conditional quantiles of a specific target population when the available data may go beyond the target and be supplemented from other sources that possibly share similarities with the target. A crucial question is how to properly distinguish and utilize useful information from other sources to improve the quantile estimation and inference at the target. We develop transfer learning methods for high-dimensional quantile regression by detecting informative sources whose models are similar to the target and utilizing them to improve the target model. We show that under reasonable conditions, the detection of the informative sources based on sample splitting is consistent. Compared to the naive estimator with only the target data, the transfer learning estimator achieves a much lower error rate as a function of the sample sizes, the signal-to-noise ratios, and the similarity measures among the target and the source models. Extensive simulation studies demonstrate the superiority of our proposed approach. We apply our methods to tackle the problem of detecting hard-landing risk for flight safety and show the benefits and insights gained from transfer learning of three different types of airplanes: Boeing 737, Airbus A320, and Airbus A380.
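For readers new to quantile regression, the following minimal sketch illustrates the standard check (pinball) loss, $\rho_\tau(u) = u(\tau - 1\{u < 0\})$, whose minimizer is the $\tau$-th quantile. It is not taken from the paper; the function names `check_loss` and `best_constant` are hypothetical, and the example fits only a constant rather than a high-dimensional model.

```python
import numpy as np

def check_loss(residuals, tau):
    """Average check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    residuals = np.asarray(residuals, dtype=float)
    # Positive residuals are weighted by tau, negative ones by (1 - tau).
    return float(np.mean(residuals * (tau - (residuals < 0))))

def best_constant(y, tau):
    """Return the sample value minimizing the average check loss (hypothetical helper).

    Restricting the search to observed values is enough for a constant fit,
    because the loss is piecewise linear with kinks at the data points.
    """
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - c, tau) for c in y]
    return float(y[int(np.argmin(losses))])

if __name__ == "__main__":
    y = np.arange(1, 102)           # the values 1, 2, ..., 101
    print(best_constant(y, 0.5))    # constant fit at the median level
    print(best_constant(y, 0.9))    # constant fit at the 0.9 quantile level
```

Replacing the constant with a linear predictor and adding an $\ell_1$ penalty gives the kind of high-dimensional quantile regression the abstract refers to. The asymmetry of the weights is what makes the fit target a quantile rather than the mean.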

Paper Structure (17 sections, 8 theorems, 105 equations, 15 figures)

This paper contains 17 sections, 8 theorems, 105 equations, 15 figures.

Key Result

Theorem 4.1

Suppose Assumptions A1-A6 hold. Take ${\lambda_1} \asymp \sqrt {\log ( {p \vee {\overline{n}_{{\cal A}}}} )/( {{n_{\cal A}} + {n_0}} )}$ and $\lambda_0 \asymp \lambda _2 \asymp \sqrt {\log ( {p \vee {n_0}} )/{n_0}}$. Then with … we have … for some positive constants $C_1, \dots, C_5$, where ${\underline{s}_{{\cal A}} } = {\min _{k \in {{\cal A} \cup \{ 0 \}}}}{s_k}$.
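To give a feel for the tuning-parameter rates in Theorem 4.1, the sketch below evaluates them numerically. This is illustrative only: the theorem fixes the rates up to unspecified constants, so the multipliers `c1` and `c2` are assumptions, as are the function name `tuning_rates` and the choice to pass $\overline{n}_{\cal A}$ (not defined in this summary) as a plain input.

```python
import math

def tuning_rates(p, n0, n_A, n_bar_A, c1=1.0, c2=1.0):
    """Evaluate the rates from Theorem 4.1 up to assumed constants c1, c2.

    p        : number of covariates
    n0       : target sample size
    n_A      : total sample size of the informative sources
    n_bar_A  : the quantity written as n-bar_A in the theorem (taken as given)
    """
    # lambda_1 ~ sqrt(log(p v n_bar_A) / (n_A + n0)), where "v" denotes max.
    lam1 = c1 * math.sqrt(math.log(max(p, n_bar_A)) / (n_A + n0))
    # lambda_0 ~ lambda_2 ~ sqrt(log(p v n0) / n0), based on the target data only.
    lam2 = c2 * math.sqrt(math.log(max(p, n0)) / n0)
    return lam1, lam2

if __name__ == "__main__":
    lam1, lam2 = tuning_rates(p=1000, n0=100, n_A=900, n_bar_A=300)
    print(f"lambda_1 = {lam1:.4f}, lambda_2 = {lam2:.4f}")
```

With these hypothetical inputs, $\lambda_1$ is smaller than $\lambda_2$ because it scales with the combined sample size $n_{\cal A} + n_0$ rather than with $n_0$ alone.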

Figures (15)

  • Figure 1: Schematic of the procedures in transfer learning quantile regression.
  • Figure 2: Schematic of the procedures in informative set detection.
  • Figure 3: Average quantile loss across different values of $d$ with $\eta=20$ in the homogeneous setting for normal and Cauchy errors.
  • Figure 4: Average MSE of $\boldsymbol{\beta}$ across different values of $d$ with $\eta=20$ in the homogeneous setting for normal and Cauchy errors.
  • Figure 5: Average running time comparison for $n = 10000$ with different values of $p$.
  • ...and 10 more figures

Theorems & Definitions (12)

  • Theorem 4.1: Convergence rate of the oracle estimator
  • Theorem 4.2: Consistency of $\widehat{\cal A}$
  • Lemma A.1: Proposition 11 in chen2020distributed
  • Remark
  • Lemma A.2: Theorem 1 in raskutti2010restricted
  • Lemma A.3: Bernstein's inequality (Theorem 1.13 of rigollet2015high)
  • Lemma A.4
  • Proof of Lemma A.4
  • Lemma A.5
  • Proof of Lemma A.5
  • ...and 2 more