Bayesian Transfer Learning for High-Dimensional Linear Regression via Adaptive Shrinkage

Parsa Jamshidian, Donatello Telesca

TL;DR

BLAST (Bayesian Linear regression with Adaptive Shrinkage for Transfer) is a Bayesian multi-source transfer learning framework for high-dimensional linear regression. Its Bayesian source selection extracts the most useful data sources while discounting biasing information that may lead to negative transfer.

Abstract

We introduce BLAST, Bayesian Linear regression with Adaptive Shrinkage for Transfer, a Bayesian multi-source transfer learning framework for high-dimensional linear regression. The proposed analytical framework leverages global-local shrinkage priors together with Bayesian source selection to balance information sharing and regularization. We show how Bayesian source selection allows for the extraction of the most useful data sources, while discounting biasing information that may lead to negative transfer. In this framework, both source selection and sparse regression are jointly accounted for in prediction and inference via Bayesian model averaging. The structure of our model admits efficient posterior simulation via a Metropolis-within-Gibbs sampling algorithm allowing full posterior inference for the target regression coefficients, making BLAST both computationally practical and inferentially straightforward. Our method achieves more accurate posterior inference for the target than regularization approaches based on target data alone, while offering competitive predictive performance and superior uncertainty quantification compared to current state-of-the-art transfer learning methods. We validate its effectiveness through extensive simulation studies and illustrate its analytical properties when applied to a case study on the estimation of tumor mutational burden from gene expression, using data from The Cancer Genome Atlas (TCGA).
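
For readers unfamiliar with global-local shrinkage, the general form of such a prior can be written as a scale mixture of normals. The display below is a generic sketch, not the prior specified in the paper: the symbols $\theta_j$, $\lambda_j$, $\tau$ and $\sigma^2$ are illustrative, and the half-Cauchy choice (the horseshoe) is only one common example assumed here.

```latex
% Generic global-local shrinkage prior on a coefficient vector (illustrative notation).
% \tau shrinks all coefficients toward zero; \lambda_j lets individual signals escape.
\begin{aligned}
\theta_j \mid \lambda_j, \tau, \sigma^2 &\sim \mathcal{N}\!\left(0,\ \sigma^2 \tau^2 \lambda_j^2\right), \qquad j = 1, \ldots, p, \\
\lambda_j &\sim \mathrm{C}^{+}(0, 1) \quad \text{(local scale; horseshoe choice, assumed)}, \\
\tau &\sim \mathrm{C}^{+}(0, 1) \quad \text{(global scale; horseshoe choice, assumed)}.
\end{aligned}
```

Under a prior of this type, small $\tau$ pulls most coefficients toward zero, while the heavy tails of the local scales leave large coefficients nearly unshrunk.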

Paper Structure

This paper contains 21 sections, 4 theorems, 39 equations, 5 figures, 2 algorithms.

Key Result

Theorem 3.1

Let $\bm w^\star$ denote the true anchoring coefficients with sparsity $s_w := \|\bm w^\star\|_0$, and define $n_w := n_0 + n_{|\mathcal{A}|}$ with pooled design $\mathbf X_w := [\mathbf X^{(0)\top},\,\mathbf X^{(\mathcal{A})\top}]^\top$. Under the regularity conditions stated above, the posterior d… and where the contraction rate is …

Figures (5)

  • Figure 1: Estimation and prediction errors for various transfer learning methods with different settings of $h$ for $K = 10$. $n_k = 150$ for $k = 0,\ldots, K$, $p = 200$ and $s = 6$. The x-axis denotes the number of informative source studies $|\mathcal{A}|$. Each point represents an average over 50 simulation replicates.
  • Figure 2: Posterior inclusion probabilities for each auxiliary study under varying informative set sizes. Each row corresponds to a different number of truly informative source studies (from $1$ to $5$), with the informative studies always assigned to the first $|\mathcal{A}|$ positions. Cells corresponding to the true informative studies are highlighted with bold white text. Prior inclusion probabilities were set to 0.5.
  • Figure 3: Average confidence/credible interval (CI) length (top panel) and average coverage probability (bottom panel) across varying numbers of source studies (1–10) for signal and non-signal parameters with $p = 300$ parameters and $s = 10$ signals. Results are shown for three methods: $\mathcal{A}_h$-Trans-GLM (blue), Desparsified-Lasso (purple), and Oracle BLAST (red). The dashed horizontal line in the coverage plots indicates the nominal 95% coverage level. Each point represents an average over 50 simulation replicates.
  • Figure 4: Average credible interval (CI) length (top panel) and average coverage probability (bottom panel) across varying numbers of informative source studies $|\mathcal{A}|$ (1–10), for signal and non-signal parameters using the BLAST method with source selection. The dashed horizontal line in the coverage plots denotes the nominal 95% coverage level. Each point represents an average over 50 simulation replicates.
  • Figure 5: (Top panel) Cross-validated relative prediction error for TMB predicted using 303 genes from the FoundationOne Gene Panel. Results are shown for various cancer targets (LUAD, KIRC, LUSC) and TL methods. (Bottom panel) Heatmap of posterior inclusion probabilities from BLAST selection for different target cancers.
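
The interval summaries reported in Figures 3 and 4 (average interval length and average coverage, split by signal and non-signal parameters) can be computed from posterior draws roughly as follows. This is a minimal sketch under stated assumptions: the function name `interval_summary`, the equal-tailed quantile intervals, and the toy draws are all illustrative and are not taken from the paper's code.

```python
import numpy as np


def interval_summary(draws, truth, level=0.95):
    """Average equal-tailed interval length and coverage.

    draws : array of shape (n_draws, p), posterior samples (assumed input format)
    truth : array of shape (p,), true coefficient values
    """
    draws = np.asarray(draws, dtype=float)
    truth = np.asarray(truth, dtype=float)
    alpha = 1.0 - level
    lower = np.quantile(draws, alpha / 2.0, axis=0)
    upper = np.quantile(draws, 1.0 - alpha / 2.0, axis=0)
    covered = (lower <= truth) & (truth <= upper)
    return {
        "avg_length": float(np.mean(upper - lower)),
        "coverage": float(np.mean(covered)),
    }


def summary_by_signal(draws, truth, level=0.95):
    """Report the same summaries separately for signal and non-signal parameters."""
    draws = np.asarray(draws, dtype=float)
    truth = np.asarray(truth, dtype=float)
    signal = truth != 0.0
    return {
        "signal": interval_summary(draws[:, signal], truth[signal], level),
        "non_signal": interval_summary(draws[:, ~signal], truth[~signal], level),
    }


# Toy usage: hypothetical draws centered on the truth with unit spread.
rng = np.random.default_rng(0)
toy_truth = np.concatenate([np.full(3, 2.0), np.zeros(17)])
toy_draws = toy_truth + rng.normal(size=(4000, toy_truth.size))
toy_result = summary_by_signal(toy_draws, toy_truth)
```

In a simulation study, these summaries would be computed once per replicate and then averaged across replicates, which is what each plotted point in the figures represents.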

Theorems & Definitions (4)

  • Theorem 3.1: Posterior contraction for $\bm w$ under oracle $\mathcal{A}$ - known contrasts $\boldsymbol{\delta}$
  • Theorem 3.2: Posterior contraction for $\bm\delta$ under oracle $\mathcal{A}$ - known anchoring signals ${\bm w}^{(\mathcal{A})}$
  • Theorem 3.3: Joint posterior contraction for $(\bm w,\bm\delta)$ under oracle $\mathcal{A}$
  • Theorem 3.4: Bayes factor consistency for general source configurations
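
To connect Bayes factors with the posterior inclusion probabilities shown in Figure 2, the sketch below converts a log Bayes factor (include versus exclude) into a posterior inclusion probability via posterior odds = Bayes factor × prior odds. It is a simplification that treats one source in isolation; the function name `posterior_inclusion_prob` and the example values are hypothetical, and the paper's sampler averages over full source configurations rather than evaluating this formula directly.

```python
import math


def posterior_inclusion_prob(log_bf, prior_prob=0.5):
    """Posterior probability that a source is included.

    log_bf     : log Bayes factor of 'include' against 'exclude' (assumed given)
    prior_prob : prior inclusion probability, strictly between 0 and 1
    """
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior_prob must lie strictly between 0 and 1")
    # Posterior log-odds = log Bayes factor + prior log-odds.
    log_odds = log_bf + math.log(prior_prob) - math.log1p(-prior_prob)
    # Numerically stable logistic transform.
    if log_odds >= 0.0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


# Hypothetical log Bayes factors for three sources.
example_log_bfs = [math.log(3.0), 0.0, -5.0]
example_pips = [posterior_inclusion_prob(lb) for lb in example_log_bfs]
```

With a prior inclusion probability of 0.5 the prior odds equal one, so the posterior inclusion probability reduces to BF / (1 + BF). Bayes factor consistency then means that this probability tends to one for informative sources and to zero for non-informative ones as the sample size grows.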