Estimating shared subspace with AJIVE: the power and limitation of multiple data matrices
Yuepeng Yang, Cong Ma
TL;DR
The paper analyzes shared-subspace estimation under the JIVE/AJIVE framework when combining $K$ data matrices. It establishes that AJIVE achieves first-order optimality in high-SNR settings, with estimation error decaying as $1/\sqrt{K}$, and derives minimax lower bounds that confirm this behavior. In low-SNR regimes, AJIVE exhibits a non-diminishing second-order error that does not vanish as $K$ increases, indicating a fundamental limitation. An oracle-aided spectral estimator is introduced to probe the low-SNR barrier, and its analysis shows that even with ideal knowledge of the unique components, the error cannot vanish as $K$ grows. Comprehensive numerical experiments corroborate the theory and illuminate the interplay among SNR, the number of matrices, and subspace misalignment, offering guidance for multi-matrix integration in practice.
Abstract
Integrative data analysis often requires disentangling joint and individual variations across multiple datasets, a challenge commonly addressed by the Joint and Individual Variation Explained (JIVE) model. While numerous methods have been developed to estimate the shared subspace under JIVE, the theoretical understanding of their performance remains limited, particularly in the context of multiple matrices and varying degrees of subspace misalignment. This paper bridges this gap by providing a systematic analysis of shared subspace estimation in multi-matrix settings. We focus on the Angle-based Joint and Individual Variation Explained (AJIVE) method, a two-stage spectral approach, and establish new performance guarantees that uncover its strengths and limitations. Specifically, we show that in high signal-to-noise ratio (SNR) regimes, AJIVE's estimation error decreases with the number of matrices, demonstrating the power of multi-matrix integration. Conversely, in low-SNR settings, AJIVE exhibits a non-diminishing error, highlighting fundamental limitations. To complement these results, we derive minimax lower bounds, showing that AJIVE achieves optimal rates in high-SNR regimes. Furthermore, we analyze an oracle-aided spectral estimator to demonstrate that the non-diminishing error in low-SNR scenarios is a fundamental barrier. Extensive numerical experiments corroborate our theoretical findings, providing insights into the interplay between SNR, the number of matrices, and subspace misalignment.
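To make the "two-stage spectral approach" concrete, the following is a minimal sketch of one way such an estimator can be written, not the paper's implementation. It assumes the signal ranks and the shared rank are known, that the shared subspace lives in the column space of each matrix, and that the unique directions are orthogonal to the shared ones. The function names (`ajive_shared_subspace`, `simulate`) and all simulation parameters (`n`, `d`, `signal`, `noise`) are hypothetical choices made for illustration.

```python
import numpy as np


def top_left_singular_vectors(M, rank):
    """Return an orthonormal basis for the top-`rank` left singular subspace of M."""
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :rank]


def ajive_shared_subspace(matrices, signal_ranks, shared_rank):
    """Two-stage spectral estimate of the shared column subspace (ranks assumed known)."""
    # Stage 1: estimate each matrix's signal subspace separately.
    bases = [top_left_singular_vectors(A, r) for A, r in zip(matrices, signal_ranks)]
    # Stage 2: concatenate the estimated bases and keep the leading directions,
    # i.e. the directions that are (nearly) common to all K subspaces.
    stacked = np.hstack(bases)
    return top_left_singular_vectors(stacked, shared_rank)


def subspace_distance(U, V):
    """Spectral norm of the difference between the two orthogonal projectors."""
    return np.linalg.norm(U @ U.T - V @ V.T, ord=2)


def simulate(K, n=60, d=80, shared_rank=2, unique_rank=2,
             signal=50.0, noise=1.0, seed=0):
    """Draw K noisy matrices with a common subspace and return the estimation error.

    The data model here is an illustrative assumption, not the paper's exact setup.
    """
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, shared_rank)))  # shared basis
    matrices = []
    for _ in range(K):
        # Unique directions, projected to be orthogonal to the shared ones.
        G = rng.standard_normal((n, unique_rank))
        G -= Q @ (Q.T @ G)
        Uk, _ = np.linalg.qr(G)
        left = np.hstack([Q, Uk])
        right, _ = np.linalg.qr(rng.standard_normal((d, shared_rank + unique_rank)))
        matrices.append(signal * left @ right.T + noise * rng.standard_normal((n, d)))
    U_hat = ajive_shared_subspace(
        matrices, [shared_rank + unique_rank] * K, shared_rank
    )
    return subspace_distance(U_hat, Q)


if __name__ == "__main__":
    # Compare the error for a few values of K at a fixed signal and noise level.
    for K in (2, 8, 32):
        print(K, simulate(K))
```

Varying `K`, `signal`, and `noise` in `simulate` gives a rough way to explore the interplay described above between the number of matrices and the SNR. A single random draw is noisy, so averaging over several seeds would be needed before reading any trend into the numbers.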
