Heterogeneous transfer learning for high-dimensional regression with feature mismatch

Jae Ho Chang, Massimiliano Russo, Subhadeep Paul

TL;DR

HTL with feature mismatch addresses high-dimensional regression when target data lack some covariates that are available in a rich proxy dataset. The method learns a feature map from the proxy (linear or nonparametric via sieve) to impute missing target features, then performs a two-stage penalized regression using both matched and imputed features. Nonasymptotic upper bounds are derived for estimation and prediction errors, detailing dependence on proxy-target quality, map discrepancy, and sample sizes; results extend to multiple proxy domains. Simulations and an ovarian cancer gene-expression case study demonstrate that HTL-impute outperforms homogeneous TL and target-only approaches, offering improved prediction and more reliable inference in settings with feature mismatch and data-poor targets.

Abstract

We consider Heterogeneous Transfer Learning (HTL) from a source to a new target domain for high-dimensional regression with differing feature sets. Most homogeneous TL methods assume that target and source domains share the same feature space, which limits their practical applicability. In applications, the target and source features frequently differ because certain variables cannot be measured in data-poor target environments. Conversely, existing HTL methods do not provide statistical error guarantees, limiting their utility for scientific discovery. Our method first learns a feature map between the missing and observed features, leveraging the vast source data, and then imputes the missing features in the target. Using the combined matched and imputed features, we then perform a two-step transfer learning for penalized regression. We develop upper bounds on estimation and prediction errors, assuming that the source and target parameters differ sparsely but without assuming sparsity in the target model. We obtain results both when the feature map is linear and when it is specified nonparametrically through unknown functions. Our results elucidate how the estimation and prediction errors of HTL depend on the model's complexity, the sample sizes, the quality of and differences between the feature maps, and the differences between the models across domains.
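The pipeline described in the abstract (learn a feature map on the source, impute the missing target features, then run a two-step penalized regression) can be illustrated with a minimal Python sketch. It assumes a linear feature map, a lasso penalty at both steps, and a plain proximal-gradient lasso solver; these choices, all function names, and the tuning parameters `lam_src` and `lam_delta` are illustrative assumptions, not the authors' estimator or software.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_ista(X, y, lam, n_iter=500):
    """Approximately minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by proximal gradient."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the smooth part
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        b = soft_threshold(b - step * grad, step * lam)
    return b


def fit_linear_feature_map(Xm_src, Xu_src, ridge=1e-6):
    """Regress the source-only features on the matched features (assumed linear map)."""
    p1 = Xm_src.shape[1]
    gram = Xm_src.T @ Xm_src + ridge * np.eye(p1)  # tiny ridge for numerical stability
    return np.linalg.solve(gram, Xm_src.T @ Xu_src)  # shape (p1, p2)


def htl_impute_sketch(Xm_src, Xu_src, y_src, Xm_tgt, y_tgt, lam_src, lam_delta):
    """Hypothetical three-step sketch: feature map, imputation, two-step regression."""
    # Step 1: learn the feature map on the source, where all features are observed.
    theta = fit_linear_feature_map(Xm_src, Xu_src)
    # Step 2: impute the missing target features from the matched ones.
    X_tgt = np.hstack([Xm_tgt, Xm_tgt @ theta])
    # Step 3a: penalized fit on the source with the full feature set.
    X_src = np.hstack([Xm_src, Xu_src])
    w = lasso_ista(X_src, y_src, lam_src)
    # Step 3b: sparse correction fitted to the target residuals.
    delta = lasso_ista(X_tgt, y_tgt - X_tgt @ w, lam_delta)
    return w + delta, theta


def make_toy_data(n_src=400, n_tgt=40, p1=5, p2=3, seed=0):
    """Synthetic data (an assumption for illustration only, not from the paper)."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(p1, p2))
    beta_src = rng.normal(size=p1 + p2)
    shift = np.zeros(p1 + p2)
    shift[0] = 0.5  # source and target coefficients differ sparsely
    Xm_src = rng.normal(size=(n_src, p1))
    Xu_src = Xm_src @ theta + 0.1 * rng.normal(size=(n_src, p2))
    y_src = np.hstack([Xm_src, Xu_src]) @ beta_src + 0.1 * rng.normal(size=n_src)
    Xm_tgt = rng.normal(size=(n_tgt, p1))
    Xu_tgt = Xm_tgt @ theta + 0.1 * rng.normal(size=(n_tgt, p2))  # never observed in the target
    y_tgt = np.hstack([Xm_tgt, Xu_tgt]) @ (beta_src + shift) + 0.1 * rng.normal(size=n_tgt)
    return Xm_src, Xu_src, y_src, Xm_tgt, y_tgt, theta
```

Calling `htl_impute_sketch(*make_toy_data()[:5], lam_src=0.01, lam_delta=0.05)` returns a coefficient vector for the combined matched and imputed features together with the estimated map. The nonparametric (sieve) variant mentioned in the TL;DR would replace `fit_linear_feature_map` with a basis-expansion regression, which this sketch does not attempt.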

Paper Structure

This paper contains 42 sections, 27 theorems, 190 equations, 10 figures.

Key Result

Theorem 3.1

Assume that $\sigma_{\xi\mathfrak{p}}+\sigma_{\xi\mathfrak{t}}=\mathcal{O}(1)$, $\log p\le{n_\mathfrak{t}}\lesssim n_\mathfrak{p}$, ${p_{1}}\asymp{p_{2}}$, and $n_\mathfrak{p}\gg(\rho_\mathfrak{p} p\log p)^2$. Suppose that Then, with probability at least $1-c\exp(-c'\log p)$, Further assume that we observe $n_0<\infty$ new observations from the target domain, i.e., $\mathbf{X}_\mathfrak{t}^0$. T

Figures (10)

  • Figure 1: MAPs of considered methodologies with sparse $\delta^*$ (is.$\delta$.sparse:Y) and non-sparse $\delta^*$ (is.$\delta$.sparse:N). In the left figure $n_\mathfrak{t}$ is increased keeping $K,n_\mathfrak{p},n_{test},p_1,p_2$ fixed, while in the right figure $K$ is increased keeping $n_\mathfrak{p},n_\mathfrak{t},n_{test},p_1,p_2$ fixed.
  • Figure 2: Prediction errors of considered methodologies with sparse $\delta^*$ and non-sparse $\delta^*$ with (left) increasing $p_1$ and (right) increasing $p_2$, keeping other quantities fixed.
  • Figure 3: Estimation error of considered methodologies with sparse $\delta^*$ and non-sparse $\delta^*$, with increasing $n_\mathfrak{t}, K,p_1,p_2$ respectively. The boxplots of the errors for TL-impute are close to the Oracle and appear as horizontal lines close to 0 due to the scale of the figure.
  • Figure 4: Prediction errors of considered methodologies for non-linear feature map in sparse $\delta^*$ and non-sparse $\delta^*$ cases, with increasing $n_\mathfrak{t}$ and increasing $K$ respectively.
  • Figure 5: Prediction error of considered methodologies for non-linear feature map with increasing $p_1$ and $p_2$ respectively.
  • ...and 5 more figures

Theorems & Definitions (39)

  • Definition 1
  • Remark 1
  • Theorem 3.1
  • Corollary 3.1
  • Theorem 3.2
  • Corollary 3.2
  • Theorem 3.3
  • Corollary 3.3
  • Theorem 3.4
  • Corollary 3.4
  • ...and 29 more