HeteroJIVE: Joint Subspace Estimation for Heterogeneous Multi-View Data
Jingyang Li, Zhongyuan Lyu
TL;DR
HeteroJIVE is a weighted AJIVE-type framework for joint subspace estimation on heterogeneous multi-view data that down-weights low-SNR views. Its theory separates statistical from structural heterogeneity, shows that the estimator attains the $O(K^{-1/2})$ rate in the number of views $K$ under generic geometric conditions, and yields an oracle-optimal weighting scheme implemented via a data-driven procedure. Empirical studies demonstrate substantial practical gains over AJIVE and Stack-SVD, including a TCGA-BRCA multi-omics application with improved downstream clustering. The work thus delivers algorithmic, theoretical, and empirical advances for robust, scalable integration of heterogeneous multi-view data.
Abstract
Many modern datasets consist of multiple related matrices measured on a common set of units, where the goal is to recover the shared low-dimensional subspace. While the Angle-based Joint and Individual Variation Explained (AJIVE) framework provides a solution, it relies on equal-weight aggregation, which can be strictly suboptimal when views exhibit significant statistical heterogeneity (arising from varying signal-to-noise ratios (SNR) and dimensions) and structural heterogeneity (arising from individual components). In this paper, we propose HeteroJIVE, a weighted two-stage spectral algorithm tailored to such heterogeneity. Theoretically, we first revisit the ``non-diminishing'' error barrier with respect to the number of views $K$ identified in recent literature for the equal-weight case. We demonstrate that this barrier is not universal: under generic geometric conditions, the bias term vanishes and our estimator achieves the $O(K^{-1/2})$ rate without the need for iterative refinement. Extending this to the general-weight case, we establish error bounds that explicitly disentangle the two layers of heterogeneity. Based on this, we derive an oracle-optimal weighting scheme implemented via a data-driven procedure. Extensive simulations corroborate our theoretical findings, and an application to TCGA-BRCA multi-omics data validates the superiority of HeteroJIVE in practice.
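The abstract does not spell out the algorithm, so the following is only a minimal sketch of what a weighted AJIVE-type two-stage spectral estimator could look like, under stated assumptions: each view is an $n \times p_k$ matrix with rows indexed by the common units, stage one keeps the top left singular vectors of each view, and stage two takes the leading eigenvectors of a weighted sum of the resulting projection matrices. The function names, the rank arguments, and the weights are hypothetical. In particular, the weights here are user-supplied placeholders, not the oracle-optimal or data-driven weights derived in the paper.

```python
import numpy as np


def top_left_singular_vectors(X, rank):
    """Stage 1: orthonormal basis for the top-`rank` left singular subspace of X."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank]


def weighted_joint_subspace(views, view_ranks, joint_rank, weights=None):
    """Stage 2: leading eigenvectors of a weighted sum of per-view projections.

    views      : list of (n, p_k) arrays sharing the same n units (rows)
    view_ranks : per-view signal ranks used in stage 1 (assumed known here)
    joint_rank : dimension of the shared subspace (assumed known here)
    weights    : nonnegative per-view weights; None means equal weights
    """
    K = len(views)
    n = views[0].shape[0]
    if weights is None:
        weights = np.ones(K)
    weights = np.asarray(weights, dtype=float)
    if weights.shape != (K,) or np.any(weights < 0) or weights.sum() == 0:
        raise ValueError("weights must be K nonnegative numbers, not all zero")
    weights = weights / weights.sum()  # rescaling does not change eigenvectors

    M = np.zeros((n, n))
    for X, r_k, w in zip(views, view_ranks, weights):
        U_k = top_left_singular_vectors(X, r_k)
        M += w * (U_k @ U_k.T)

    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, ::-1][:, :joint_rank]


def projection_distance(U, V):
    """Spectral norm of the difference between the two projection matrices."""
    return np.linalg.norm(U @ U.T - V @ V.T, 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, r = 100, 2
    U_true, _ = np.linalg.qr(rng.standard_normal((n, r)))
    noise_levels = [0.1, 0.1, 2.0, 2.0]  # illustrative: two clean, two noisy views
    views = [
        U_true @ rng.standard_normal((r, 50)) + s * rng.standard_normal((n, 50))
        for s in noise_levels
    ]
    ranks = [r] * len(views)
    # Inverse noise variance is an illustrative heuristic, not the paper's scheme.
    heuristic_weights = [1.0 / s**2 for s in noise_levels]
    equal = weighted_joint_subspace(views, ranks, r)
    weighted = weighted_joint_subspace(views, ranks, r, weights=heuristic_weights)
    print("equal-weight error:   ", projection_distance(equal, U_true))
    print("weighted error:       ", projection_distance(weighted, U_true))
```

With equal weights the second stage reduces to an unweighted average of projections, which is the kind of aggregation the abstract attributes to AJIVE. The sketch omits individual components and rank selection, both of which a full treatment would need to handle.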
