D-CDLF: Decomposition of Common and Distinctive Latent Factors for Multi-view High-dimensional Data

Hai Shu

TL;DR

The paper addresses the joint analysis of two-view high-dimensional data by decomposing each view into a common-source part, a view-specific distinctive-source part, and noise, while requiring the underlying latent factors to be uncorrelated. It introduces D-CDLF, a decomposition that achieves tri-orthogonality (mutual uncorrelatedness) among the common latent factors and the distinctive latent factors of the two views, and defines variable- and view-level PVEs to quantify explained variance. The estimation framework handles high dimensionality via a spiked-covariance model, soft-thresholded SVD, and information-theoretic rank selection, yielding estimators for the common and distinctive source covariances and their predictive components. Theoretical results establish the (semi-)uniqueness of the covariances associated with the common and distinctive sources, which supports the interpretability of the latent factor spaces under the proposed decomposition.
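As a rough illustration of the soft-thresholded SVD step mentioned above, the following sketch shrinks the singular values of a noisy low-rank matrix. It is a generic version of the technique, not the paper's estimator: the function name `soft_threshold_svd`, the simulated data, and the threshold `tau` are all assumptions made for this example.

```python
import numpy as np

def soft_threshold_svd(X, tau):
    """Generic singular-value soft-thresholding (hypothetical helper).

    Each singular value s is replaced by max(s - tau, 0); components whose
    shrunken singular value is zero are dropped.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    keep = s_shrunk > 0
    X_hat = (U[:, keep] * s_shrunk[keep]) @ Vt[keep, :]
    return X_hat, s_shrunk[keep]

# Simulated example: a rank-3 signal with p variables and n samples,
# observed with unit-variance Gaussian noise.
rng = np.random.default_rng(0)
p, n, r = 50, 200, 3
signal = 3.0 * rng.normal(size=(p, r)) @ rng.normal(size=(r, n))
X = signal + rng.normal(size=(p, n))

# Assumed threshold: slightly above sqrt(p) + sqrt(n), the approximate size
# of the largest singular value of a pure unit-variance noise matrix.
tau = 1.2 * (np.sqrt(p) + np.sqrt(n))

X_hat, kept = soft_threshold_svd(X, tau)
print(X_hat.shape, len(kept))
```

In this simulated setting the signal singular values are far larger than the noise level, so the thresholding keeps only the signal components. How the threshold and the rank are chosen in the paper itself is not shown here.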

Abstract

A typical approach to the joint analysis of multiple high-dimensional data views is to decompose each view's data matrix into three parts: a low-rank common-source matrix generated by common latent factors of all data views, a low-rank distinctive-source matrix generated by distinctive latent factors of the corresponding data view, and an additive noise matrix. Existing decomposition methods often focus on the uncorrelatedness between the common latent factors and distinctive latent factors, but inadequately address the equally necessary uncorrelatedness between distinctive latent factors from different data views. We propose a novel decomposition method, called Decomposition of Common and Distinctive Latent Factors (D-CDLF), to effectively achieve both types of uncorrelatedness for two-view data. We also discuss the estimation of the D-CDLF under high-dimensional settings.
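
The three-part decomposition described in the abstract can be written compactly as follows. The symbols are assumptions introduced here for illustration and may differ from the paper's notation: $X_k$ is the data matrix of view $k$, $C_k$ its common-source matrix, $D_k$ its distinctive-source matrix, $E_k$ its noise matrix, $f_C$ the common latent factors, and $f_{D_k}$ the distinctive latent factors of view $k$.

$$
X_k = C_k + D_k + E_k, \qquad k = 1, 2,
$$

$$
\operatorname{cov}(f_C, f_{D_k}) = 0 \ \ (k = 1, 2), \qquad \operatorname{cov}(f_{D_1}, f_{D_2}) = 0.
$$

The first condition is the uncorrelatedness between common and distinctive latent factors that existing methods focus on; the second is the uncorrelatedness between the distinctive latent factors of different views that D-CDLF additionally targets.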

Paper Structure (10 sections, 2 theorems, 36 equations, 1 figure)

This paper contains 10 sections, 2 theorems, 36 equations, and 1 figure.

Key Result

Proposition 1

Let $z_1, z_2, z_{\mathfrak{I}} \in (\mathcal{L}_0^2, \operatorname{cov})$ be three standardized random variables with $\operatorname{corr}(z_1, z_2) = \rho \in [0,1]$ and $z_{\mathfrak{I}} \perp \{z_1, z_2\}$. Decomposition decomp

Figures (1)

  • Figure 1: The D-CDLF decomposition for two standardized real random variables $z_1$ and $z_2$ with different values of their correlation $\rho$ and angle $\theta=\arccos \rho$.
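
The relation $\theta=\arccos \rho$ in the caption maps each correlation to the angle between $z_1$ and $z_2$. A small sketch of that mapping is given below; the helper name `angle_from_corr` and the sample values of $\rho$ are assumptions chosen for illustration, not values taken from the figure.

```python
import math

def angle_from_corr(rho):
    """Angle (in radians) between two standardized variables with correlation rho."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("this sketch assumes rho in [0, 1]")
    return math.acos(rho)

# Illustrative correlations: uncorrelated, moderately correlated, identical.
for rho in (0.0, 0.5, 1.0):
    theta = angle_from_corr(rho)
    print(f"rho = {rho:.1f}  theta = {math.degrees(theta):.1f} degrees")
```

With $\rho \in [0,1]$ the angle ranges from a right angle (uncorrelated variables) down to zero (perfectly correlated variables).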

Theorems & Definitions (7)

  • Proposition 1
  • Remark 1
  • Theorem 1: Uniqueness
  • Remark 2
  • Remark 3
  • Proof of Proposition 1
  • Proof of Theorem 1