Transfer learning via Regularized Linear Discriminant Analysis

Hongzhe Zhang, Arnab Auddy, Hongzhe Lee

TL;DR

This work addresses high-dimensional two-class discrimination with scarce target data by introducing transfer-learning regularized discriminant analysis (TL-RDA). It combines ridge discriminants from target and source populations via learned weights, with asymptotic guarantees derived under random-matrix theory in the proportional regime $p/n\to\gamma$. The authors characterize limiting classification errors, provide closed-form optimal weights for estimation and prediction risks, and offer geometric intuition and robustness results. Empirical assessment, including proteomics-based cardiovascular risk prediction, shows TL-RDA and its pooled variant often outperform target-only and pool-only approaches, demonstrating practical value for leveraging related datasets in high-dimensional biomedical problems.

Abstract

Linear discriminant analysis is a widely used method for classification. However, the high dimensionality of predictors combined with small sample sizes often results in large classification errors. To address this challenge, it is crucial to leverage data from related source models to enhance the classification performance of a target model. We propose to address this problem in the framework of transfer learning. In this paper, we present novel transfer learning methods via regularized random-effects linear discriminant analysis, where the discriminant direction is estimated as a weighted combination of ridge estimates obtained from both the target and source models. Multiple strategies for determining these weights are introduced and evaluated, including one that minimizes the estimation risk of the discriminant vector and another that minimizes the classification error. Utilizing results from random matrix theory, we explicitly derive the asymptotic values of these weights and the associated classification error rates in the high-dimensional setting, where $p/n \rightarrow \gamma$, with $p$ representing the predictor dimension and $n$ the sample size. We also provide geometric interpretations of the various weights and guidance on which weights to choose. Extensive numerical studies, including simulations and analysis of proteomics-based 10-year cardiovascular disease risk classification, demonstrate the effectiveness of the proposed approach.
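
As a rough illustration of the idea of combining ridge discriminant directions from a target and a source population, the following NumPy sketch may help. It is not the authors' implementation: the function names (`simulate`, `class_means`, `ridge_direction`, `combine_directions`, `classify`), the ridge parameter `lam`, and the fixed weights are all assumptions made for this example. In particular, the weights are supplied by hand here, whereas the paper derives estimation-optimal and prediction-optimal weights.

```python
import numpy as np

def simulate(rng, n, delta):
    """Hypothetical two-class Gaussian data: class 0 has mean 0, class 1 has mean delta."""
    y = np.arange(n) % 2                      # balanced labels 0, 1, 0, 1, ...
    X = rng.standard_normal((n, delta.shape[0])) + np.outer(y, delta)
    return X, y

def class_means(X, y):
    """Sample means of class 0 and class 1."""
    return X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)

def ridge_direction(X, y, lam):
    """Ridge-regularized discriminant direction (Sigma_hat + lam * I)^{-1} (mu1 - mu0)."""
    mu0, mu1 = class_means(X, y)
    centered = np.vstack([X[y == 0] - mu0, X[y == 1] - mu1])
    sigma = centered.T @ centered / (len(X) - 2)   # pooled within-class covariance
    p = X.shape[1]
    return np.linalg.solve(sigma + lam * np.eye(p), mu1 - mu0)

def combine_directions(directions, weights):
    """Weighted combination of K per-population directions (weights assumed given)."""
    D = np.column_stack(directions)                # shape (p, K)
    return D @ np.asarray(weights, dtype=float)

def classify(X_new, direction, mu0, mu1):
    """Assign class 1 when the projection lies on the class-1 side of the target midpoint."""
    midpoint = (mu0 + mu1) / 2
    return ((X_new - midpoint) @ direction > 0).astype(int)

# Small demo: a scarce target sample and a larger, related source sample.
rng = np.random.default_rng(0)
p = 20
delta = np.full(p, 4 / np.sqrt(p))                 # mean shift with Euclidean norm 4
X_target, y_target = simulate(rng, 30, delta)
X_source, y_source = simulate(rng, 200, delta)

d_target = ridge_direction(X_target, y_target, lam=1.0)
d_source = ridge_direction(X_source, y_source, lam=1.0)
d_combined = combine_directions([d_target, d_source], weights=[0.5, 0.5])

mu0, mu1 = class_means(X_target, y_target)
X_test, y_test = simulate(rng, 400, delta)
accuracy = (classify(X_test, d_combined, mu0, mu1) == y_test).mean()
print(f"test accuracy with equal weights: {accuracy:.3f}")
```

Setting the weights to `[1.0, 0.0]` recovers the target-only ridge discriminant, so the sketch also shows how the target-only rule sits inside the weighted family as a special case.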

Paper Structure (26 sections, 27 theorems, 207 equations, 7 figures, 1 table)

Key Result

Theorem 4.1

Suppose that Assumptions as:TCG-as:CCW, as well as as:moment and as:aniso, hold. Then for a fixed $K\ge 2$, as $n_k, p \rightarrow \infty$ with $p / n_k \rightarrow \gamma_k\in (0, \infty]$ for $1\le k\le K$, the limiting form of $Err(\mathbf{w})$ for a given weight vector $\mathbf{w}\in \mathbb{R}^K$ is [...], where the elements of $\mathbf{u}\in \mathbb{R}^K$ and ${\cal A}\in \mathbb{R}^{K\times K}$ are as [...]

Figures (7)

  • Figure 1: Geometric interpretations of optimal weights
  • Figure 2: TL-RDA with the optimal estimation weight outperforms regular RDA and TL-RDA with the optimal prediction weight when the target distribution changes.
  • Figure 3: TLP-RDA outperforms TL-RDA when $\gamma$ is large under general setups.
  • Figure 4: Individual sample covariance matrices
  • Figure 5: Pooled sample covariance matrices
  • ...and 2 more figures

Theorems & Definitions (48)

  • Theorem 4.1: Asymptotic Classification Error for TL-RDA
  • Remark 4.1
  • Corollary 4.2: Asymptotic Classification Error for TLP-RDA
  • Theorem 4.3: Asymptotic Estimation Error Minimization for TL-RDA
  • Corollary 4.4: Estimation Error for Homogeneous Sources
  • Remark 4.2: Estimating optimal weights
  • Proposition 4.5
  • Corollary 4.6: Asymptotic Estimation Error Minimization for TLP-RDA
  • Theorem 4.7: Asymptotic Prediction Error Minimization for TL-RDA
  • Proposition 4.8
  • ...and 38 more