Spectral decomposition-assisted multi-study factor analysis

Lorenzo Mauri; Niccolò Anceschi; David B. Dunson

TL;DR

BLAST tackles high-dimensional covariance estimation across multiple studies by decomposing the covariance of each study $s$ into a shared low-rank component $\mathbf{\Lambda}\mathbf{\Lambda}^{\top}$, a study-specific low-rank component $\mathbf{\Gamma}_s\mathbf{\Gamma}_s^{\top}$, and diagonal noise $\mathbf{\Sigma}$. It uses a spectral factorization step to identify the shared subspace, followed by surrogate Bayesian regressions for fast, parallelizable inference on the loadings and residual variances, avoiding expensive MCMC. Theoretical guarantees include Procrustes consistency for the latent factors, posterior contraction, and a central limit theorem and Bernstein–von Mises result for the low-rank components, with variance inflation ensuring valid coverage; the framework is robust to heteroscedasticity and accommodates moment-based assumptions. Empirically, BLAST achieves competitive accuracy and well-calibrated uncertainty in simulations and in a gene-expression integration application, with substantial computational speedups and scalability to large omics datasets.
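
The decomposition, and the idea of isolating the shared subspace with spectral tools, can be illustrated with a small simulation. This is a sketch under stated assumptions, not the BLAST procedure itself: it assumes Gaussian factors, the same diagonal noise in every study, known ranks $k$ and $q_s$, and separates shared from study-specific directions by averaging the projections onto each study's leading right-singular subspace. The helpers `simulate_study` and `shared_subspace` are hypothetical names introduced for illustration.

```python
import numpy as np


def simulate_study(rng, Lambda, Gamma_s, sigma2, n):
    """Draw n rows of y = Lambda @ eta + Gamma_s @ phi + eps (toy model)."""
    p = Lambda.shape[0]
    k = Lambda.shape[1]
    q = Gamma_s.shape[1]
    eta = rng.standard_normal((n, k))                     # shared factors
    phi = rng.standard_normal((n, q))                     # study-specific factors
    eps = rng.standard_normal((n, p)) * np.sqrt(sigma2)   # diagonal noise
    return eta @ Lambda.T + phi @ Gamma_s.T + eps


def shared_subspace(Ys, k, q_list):
    """Hypothetical spectral estimate of the shared loading subspace.

    For each study, project onto the span of the top k + q_s right singular
    vectors of Y_s, then average the projections across studies. Shared
    directions keep eigenvalue near 1; study-specific ones drop to ~1/S.
    """
    p = Ys[0].shape[1]
    P_bar = np.zeros((p, p))
    for Y, q in zip(Ys, q_list):
        _, _, Vt = np.linalg.svd(Y, full_matrices=False)
        V = Vt[: k + q].T                                 # p x (k + q_s), orthonormal
        P_bar += V @ V.T
    P_bar /= len(Ys)
    evals, evecs = np.linalg.eigh(P_bar)                  # ascending eigenvalues
    return evecs[:, -k:], evals[::-1]                     # top-k basis, sorted desc


rng = np.random.default_rng(0)
p, k, q, S, n = 200, 2, 2, 3, 1000
Lambda = rng.standard_normal((p, k))
Gammas = [rng.standard_normal((p, q)) for _ in range(S)]
sigma2 = rng.uniform(0.5, 1.5, size=p)
Ys = [simulate_study(rng, Lambda, G, sigma2, n) for G in Gammas]

U_hat, evals = shared_subspace(Ys, k, [q] * S)
Q, _ = np.linalg.qr(Lambda)                               # orthonormal basis of col(Lambda)
err = np.linalg.norm(U_hat @ U_hat.T - Q @ Q.T, 2)        # projection distance
print(np.round(evals[: k + S * q], 2), round(float(err), 3))
```

Because every study's leading subspace contains the shared directions, those directions keep eigenvalue close to 1 in the averaged projection, while study-specific directions are diluted to roughly $1/S$; in this toy setting, that gap is what makes the separation possible.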

Abstract

This article focuses on covariance estimation for multi-study data. Popular approaches employ factor-analytic terms with shared and study-specific loadings that decompose the variance into (i) a shared low-rank component, (ii) study-specific low-rank components, and (iii) a diagonal term capturing idiosyncratic variability. Our proposed methodology estimates the latent factors via spectral decompositions, with a novel approach for separating shared and specific factors, and infers the factor loadings and residual variances via surrogate Bayesian regressions. The resulting posterior has a simple product form across outcomes, bypassing the need for Markov chain Monte Carlo sampling and facilitating parallelization. The proposed methodology has major advantages over current Bayesian competitors in terms of computational speed, scalability and stability while also having strong frequentist guarantees. The theory and methods also add to the rich literature on frequentist methods for factor models with shared and group-specific components of variation. The approximation error decreases as the sample size and the data dimension diverge, formalizing a blessing of dimensionality. We show favorable asymptotic properties, including central limit theorems for point estimators and posterior contraction, and excellent empirical performance in simulations. The methods are applied to integrate three studies on gene associations among immune cells.
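
The product-form posterior can be sketched with conjugate regressions. In this minimal sketch, the factor matrix `F` is assumed to be already estimated (for checking, the true factors are plugged in), and each outcome is regressed on `F` under a normal-inverse-gamma prior. The function name `surrogate_posterior_draws` and the hyperparameters `tau2`, `a0`, `b0` are illustrative choices, not taken from the paper, and the variance-inflation step used in the theory for valid coverage is omitted.

```python
import numpy as np


def surrogate_posterior_draws(Y, F, n_draws, rng, tau2=10.0, a0=1.0, b0=1.0):
    """Exact draws from conjugate regressions of each column of Y on F.

    Prior (hypothetical choice), independently for each outcome j:
        beta_j | s2_j ~ N(0, s2_j * tau2 * I_k),   s2_j ~ InvGamma(a0, b0).
    The joint posterior factorizes over j, so no Markov chain is needed.
    Returns beta with shape (n_draws, k, p) and s2 with shape (n_draws, p).
    """
    n, k = F.shape
    p = Y.shape[1]
    prec = F.T @ F + np.eye(k) / tau2          # posterior precision of beta_j is prec / s2_j
    V = np.linalg.inv(prec)
    M = V @ F.T @ Y                            # k x p posterior means
    a_n = a0 + n / 2.0
    # y_j' y_j - m_j' prec m_j = ||y_j - F m_j||^2 + ||m_j||^2 / tau2 >= 0
    quad = np.sum(Y * Y, axis=0) - np.sum(M * (prec @ M), axis=0)
    b_n = b0 + 0.5 * quad                      # one scale per outcome
    # s2 ~ InvGamma(a_n, b_n)  <=>  1 / s2 ~ Gamma(shape=a_n, scale=1 / b_n)
    s2 = 1.0 / rng.gamma(a_n, 1.0 / b_n, size=(n_draws, p))
    L = np.linalg.cholesky(V)                  # V = L @ L.T
    z = rng.standard_normal((n_draws, k, p))
    beta = M[None] + np.sqrt(s2)[:, None, :] * np.einsum("ij,djp->dip", L, z)
    return beta, s2


rng = np.random.default_rng(1)
n, p, k = 2000, 50, 3
F = rng.standard_normal((n, k))                # stands in for estimated factors
Lambda = rng.standard_normal((p, k))
sigma2 = rng.uniform(0.5, 1.5, size=p)
Y = F @ Lambda.T + rng.standard_normal((n, p)) * np.sqrt(sigma2)

beta, s2 = surrogate_posterior_draws(Y, F, n_draws=500, rng=rng)
Lambda_hat = beta.mean(axis=0).T               # p x k posterior mean of loadings
cov_draws = np.einsum("dkp,dkq->dpq", beta, beta)   # draws of Lambda Lambda^T
print(np.abs(Lambda_hat - Lambda).max(), np.abs(s2.mean(axis=0) - sigma2).max())
```

Because the posterior factorizes across outcomes, the draws for different columns are independent and could be computed in parallel or, as here, in a single vectorized pass. Each draw of the loadings directly yields a draw of the low-rank term $\mathbf{\Lambda}\mathbf{\Lambda}^{\top}$.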
