HeteroJIVE: Joint Subspace Estimation for Heterogeneous Multi-View Data

Jingyang Li, Zhongyuan Lyu

TL;DR

HeteroJIVE advances joint subspace estimation for heterogeneous multi-view data by introducing a weighted AJIVE-type framework that down-weights low-SNR views. It provides a rigorous theory separating statistical and structural heterogeneity, proving improved rates (including $O(K^{-1/2})$) under mild geometric conditions and offering an oracle-optimal and data-driven weighting scheme. Empirical studies demonstrate substantial practical gains over AJIVE and Stack-SVD, including a TCGA-BRCA multi-omics application with improved downstream clustering. The work thus delivers algorithmic, theoretical, and empirical advances for robust, scalable integration of heterogeneous multi-view data.

Abstract

Many modern datasets consist of multiple related matrices measured on a common set of units, where the goal is to recover the shared low-dimensional subspace. While the Angle-based Joint and Individual Variation Explained (AJIVE) framework provides a solution, it relies on equal-weight aggregation, which can be strictly suboptimal when views exhibit significant statistical heterogeneity (arising from varying SNR and dimensions) and structural heterogeneity (arising from individual components). In this paper, we propose HeteroJIVE, a weighted two-stage spectral algorithm tailored to such heterogeneity. Theoretically, we first revisit the "non-diminishing" error barrier with respect to the number of views $K$ identified in recent literature for the equal-weight case. We demonstrate that this barrier is not universal: under generic geometric conditions, the bias term vanishes and our estimator achieves the $O(K^{-1/2})$ rate without the need for iterative refinement. Extending this to the general-weight case, we establish error bounds that explicitly disentangle the two layers of heterogeneity. Based on this, we derive an oracle-optimal weighting scheme implemented via a data-driven procedure. Extensive simulations corroborate our theoretical findings, and an application to TCGA-BRCA multi-omics data validates the superiority of HeteroJIVE in practice.
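To make the "weighted two-stage spectral algorithm" concrete, the following is a minimal sketch of one plausible reading, not the paper's Algorithm: stage one takes the top left singular vectors of each view, and stage two takes the leading eigenvectors of a weighted sum of the per-view projection matrices. The function names (`view_subspace`, `weighted_joint_subspace`, `simulate_views`), the user-supplied `weights`, and the toy data model are all assumptions introduced for illustration; the paper's oracle-optimal and data-driven weights are not reproduced here.

```python
import numpy as np


def view_subspace(X, rank):
    """Stage 1: top-`rank` left singular vectors of one view (n x d_k)."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank]


def weighted_joint_subspace(views, ranks, joint_rank, weights=None):
    """Stage 2: leading eigenvectors of the weighted sum of view projectors.

    `weights` is a hypothetical user-supplied vector; the default 1/K
    corresponds to equal-weight aggregation.
    """
    K = len(views)
    n = views[0].shape[0]
    if weights is None:
        weights = np.full(K, 1.0 / K)
    weights = np.asarray(weights, dtype=float)
    M = np.zeros((n, n))
    for X, r_k, w_k in zip(views, ranks, weights):
        U_k = view_subspace(X, r_k)
        M += w_k * (U_k @ U_k.T)
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, ::-1][:, :joint_rank]


def simulate_views(rng, n=60, d=40, joint_rank=2, sigmas=(0.01, 0.1)):
    """Toy data (assumed model): shared basis U plus one individual
    direction per view, with view-specific noise level sigma_k."""
    K = len(sigmas)
    Q, _ = np.linalg.qr(rng.standard_normal((n, joint_rank + K)))
    U = Q[:, :joint_rank]
    views = []
    for k, sigma in enumerate(sigmas):
        basis = np.column_stack([U, Q[:, joint_rank + k]])
        loadings = rng.standard_normal((joint_rank + 1, d))
        views.append(basis @ loadings + sigma * rng.standard_normal((n, d)))
    return U, views


# Usage: two views with different noise levels, the noisier one down-weighted.
rng = np.random.default_rng(0)
U, views = simulate_views(rng, sigmas=(0.01, 0.1))
U_hat = weighted_joint_subspace(views, ranks=[3, 3], joint_rank=2,
                                weights=[0.7, 0.3])
```

In this sketch each view's rank is set to the joint rank plus one individual direction. The weights `[0.7, 0.3]` are arbitrary: pushing nearly all weight onto one view would let that view's individual direction compete with the shared directions in the aggregated matrix, which is one way to see why statistical and structural heterogeneity have to be handled together.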

Paper Structure

This paper contains 38 sections, 22 theorems, 169 equations, 4 figures, 1 table, 2 algorithms.

Key Result

Theorem 1

Assume […] for some universal constant $c>0$. Then, with probability exceeding $1-O(Ke^{-n\wedge d})$, the output of the HeteroJIVE algorithm (alg:hjive) with equal weights $w_k = 1/K$ for all $k\in[K]$ satisfies […], where $r_{\textsf{avg}}:= K^{-1}\sum_{k=1}^{K} r_k$.

Figures (4)

  • Figure 1: Estimation error of $\|\widehat{{\boldsymbol U}}\widehat{{\boldsymbol U}}^\top-{\boldsymbol U}{\boldsymbol U}^\top\|$ across methods for Example 1, averaged over $100$ replicates. The average weights for HeteroJIVE are given by $(0.011,0.989)$.
  • Figure 2: (Performance of AJIVE) Estimation error of $\|\widehat{{\boldsymbol U}}\widehat{{\boldsymbol U}}^\top-{\boldsymbol U}{\boldsymbol U}^\top\|$ versus $\sqrt{K}$. Comparison across different loading schemes under the homogeneous setting (left) with $\sigma=0.1$, $\theta=0.5$ and $\gamma=0.5$. Comparison with Stack-SVD under the homogeneous setting and random loading scheme (right) with $\sigma=0.1$, $\theta=0.5$ and $\gamma=1.5$.
  • Figure 3: (Comparison across methods under the heterogeneous setting) Estimation error of $\|\widehat{{\boldsymbol U}}\widehat{{\boldsymbol U}}^\top-{\boldsymbol U}{\boldsymbol U}^\top\|$ versus $\sqrt{K}$ with (left) random loading scheme and (right) shared-orthogonal loading scheme. $\sigma_k\overset{i.i.d.}{\sim}\text{Unif}(0.1,2)$, $\theta=0.6$ and $\gamma=2$.
  • Figure 4: The scatter plots are color-coded by the subtypes predefined through the consensus clustering analysis in Cancer Genome Atlas Network (2012) (cancer2012comprehensive).
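
The error metric in Figures 1 to 3, $\|\widehat{{\boldsymbol U}}\widehat{{\boldsymbol U}}^\top-{\boldsymbol U}{\boldsymbol U}^\top\|$, compares projection matrices rather than bases, so it is unchanged when either basis is rotated within its subspace. A small sketch of how it could be computed is given below. It assumes the norm is the spectral norm, and the helper name `projection_distance` and the QR re-orthonormalization step are illustrative choices, not taken from the paper.

```python
import numpy as np


def projection_distance(U_hat, U):
    """Spectral norm of the difference between the projectors onto the
    column spaces of U_hat and U (both n x r, full column rank)."""
    # Re-orthonormalize so the inputs need not have orthonormal columns
    # (an added safeguard, assumed here for convenience).
    Q_hat, _ = np.linalg.qr(U_hat)
    Q, _ = np.linalg.qr(U)
    return np.linalg.norm(Q_hat @ Q_hat.T - Q @ Q.T, ord=2)


# Usage: a rotated basis of the same subspace versus an orthogonal subspace.
E = np.eye(4)
U = E[:, :2]
angle = 0.3
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle), np.cos(angle)]])
same_subspace = projection_distance(U @ R, U)        # numerically zero
orthogonal_subspace = projection_distance(E[:, 2:], U)  # numerically one
```

For two subspaces of equal dimension this quantity equals the sine of the largest principal angle between them, so it ranges from 0 (identical subspaces) to 1 (some direction of one subspace is orthogonal to the other).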

Theorems & Definitions (30)

  • Example 1
  • Theorem 1
  • Theorem 2
  • Proposition 1
  • Example 2
  • Theorem 3
  • Theorem 4
  • Proposition 2
  • Theorem 5
  • Lemma 1
  • ...and 20 more