Multi-layer matrix factorization for cancer subtyping using full and partial multi-omics dataset

Yingxuan Ren, Fengtao Ren, Bo Yang

TL;DR

This paper introduces Multi-Layer Matrix Factorization (MLMF), a multi-omics clustering approach to cancer subtyping. MLMF incorporates a class indicator matrix to handle missing omics data, giving a unified framework for both complete and incomplete multi-omics datasets.

Abstract

Cancer, with its inherent heterogeneity, is commonly categorized into distinct subtypes based on unique traits, cellular origins, and molecular markers specific to each type. However, current studies primarily rely on complete multi-omics datasets for predicting cancer subtypes. They often overlook predictive performance when some omics data are missing, and they neglect implicit relationships across multiple layers of omics data integration. This paper introduces Multi-Layer Matrix Factorization (MLMF), a novel approach for cancer subtyping based on multi-omics data clustering. MLMF first applies multi-layer linear or nonlinear factorization to the multi-omics feature matrices, decomposing the original data into latent feature representations unique to each omics type. These latent representations are then fused into a consensus representation, on which spectral clustering is performed to determine subtypes. Additionally, MLMF incorporates a class indicator matrix to handle missing omics data, creating a unified framework that can manage both complete and incomplete multi-omics data. Extensive experiments on 10 multi-omics cancer datasets, both complete and with missing values, show that MLMF matches or outperforms several state-of-the-art approaches.
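The pipeline described in the abstract (per-omics multi-layer factorization, fusion into a consensus, spectral clustering, with missing samples masked out) can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm: greedy truncated SVD stands in for the paper's iterative linear or nonlinear factorization, a boolean observation mask stands in for the class indicator matrix, and the consensus is a masked average of cosine similarities. All function names, layer sizes, and the synthetic data are hypothetical.

```python
import numpy as np

def layered_factorize(X, layer_sizes):
    """Greedy layer-wise factorization X ~ Z_1 Z_2 ... Z_m H_m (assumption:
    truncated SVD per layer, not the paper's update rules)."""
    H, Zs = X, []
    for k in layer_sizes:
        U, s, Vt = np.linalg.svd(H, full_matrices=False)
        Zs.append(U[:, :k])          # basis matrix of this layer
        H = s[:k, None] * Vt[:k]     # latent representation, shape (k, n_samples)
    return Zs, H

def consensus_similarity(views, observed, layer_sizes):
    """Average per-view cosine similarities over the views in which both
    samples are observed (a stand-in for indicator-matrix fusion)."""
    n = observed[0].size
    total, count = np.zeros((n, n)), np.zeros((n, n))
    for X, obs in zip(views, observed):
        idx = np.flatnonzero(obs)                    # samples present in this view
        Xo = X[:, idx]
        Xo = Xo - Xo.mean(axis=1, keepdims=True)     # center each feature
        _, H = layered_factorize(Xo, layer_sizes)
        Hn = H / (np.linalg.norm(H, axis=0, keepdims=True) + 1e-12)
        S = np.clip(Hn.T @ Hn, 0.0, None)            # non-negative cosine similarity
        total[np.ix_(idx, idx)] += S
        count[np.ix_(idx, idx)] += 1.0
    return total / np.maximum(count, 1.0)

def spectral_clustering(S, n_clusters, n_iter=50):
    """Normalized spectral clustering with a small deterministic k-means."""
    d = S.sum(axis=1) + 1e-12
    L = S / np.sqrt(np.outer(d, d))                  # D^{-1/2} S D^{-1/2}
    _, vecs = np.linalg.eigh(L)
    E = vecs[:, -n_clusters:]                        # top eigenvectors
    E = E / (np.linalg.norm(E, axis=1, keepdims=True) + 1e-12)
    centers = [E[0]]                                 # farthest-point initialization
    for _ in range(1, n_clusters):
        dist = np.min([np.sum((E - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(E[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((E[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = np.argmin(sq, axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = E[labels == c].mean(axis=0)
    return labels

# Synthetic demo: 40 samples, two subtypes, two omics views (features x samples).
rng = np.random.default_rng(0)
y = np.repeat([1.0, -1.0], 20)
views = [2.0 * np.outer(rng.normal(size=p), y) + rng.normal(size=(p, 40))
         for p in (30, 20)]
observed = [np.ones(40, dtype=bool), np.ones(40, dtype=bool)]
missing = np.r_[0:5, 20:25]
observed[1][missing] = False          # second view lacks 10 samples
views[1][:, missing] = np.nan         # missing columns are never read

S = consensus_similarity(views, observed, layer_sizes=[8, 2])
labels = spectral_clustering(S, n_clusters=2)
```

One caveat on this stand-in: without constraints or a nonlinearity, stacked truncated SVDs collapse to a single truncated SVD at the final rank, so the layering here only shows the data flow. The sketch also assumes every sample is observed in at least one view.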

Paper Structure

This paper contains 12 sections, 21 equations, 2 figures, 1 table, 2 algorithms.

Figures (2)

  • Figure 1: The framework of MLMF. MLMF is an m-layer matrix decomposition structure for multi-omics data. It iteratively decomposes each omics data matrix $\boldsymbol{X}^{(v)}$ into two factor matrices ($\boldsymbol{Z}_i^{(v)}$, $\boldsymbol{H}_i^{(v)}$), fuses these factor matrices into a consensus representation $\boldsymbol{H}$, and is optimized with two different cost functions, one for linear and one for nonlinear decomposition. Finally, cancer subtypes are identified from the consensus representation $\boldsymbol{H}$ via spectral clustering.
  • Figure 2: Mean performance of the different algorithms on 10 cancer datasets. The y-axis shows the average $-\log_{10}$ of the log-rank test P-values, and the x-axis shows the average number of enriched clinical parameters in the clusters. The red dotted lines highlight the results of MLMF_Nonlinear and the brown dotted lines highlight the results of MLMF_Linear.
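
To make the y-axis of Figure 2 concrete, the following is a minimal pure-Python sketch of a two-group log-rank test and its $-\log_{10}$ transform. It is an illustration only: the paper's evaluation may compare more than two subtypes and may use a different implementation, and the function name and toy survival data are hypothetical.

```python
import math

def logrank_two_groups(times, events, groups):
    """Two-group log-rank test. `events` is 1 for an observed event and 0 for
    censoring; `groups` is 1 or 0. Returns (chi-square statistic, p-value)."""
    obs_minus_exp, variance = 0.0, 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        # Samples still at risk just before time t.
        at_risk = [(ti, e, g) for ti, e, g in zip(times, events, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for _, _, g in at_risk if g == 1)
        d = sum(1 for ti, e, _ in at_risk if ti == t and e)
        d1 = sum(1 for ti, e, g in at_risk if ti == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n             # observed minus expected
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0                              # groups not comparable
    stat = obs_minus_exp ** 2 / variance
    # Chi-square survival function with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Toy data: group 1 has early events, group 0 has late events, no censoring.
times = [1, 2, 3, 4, 5, 10, 11, 12, 13, 14]
events = [1] * 10
groups = [1] * 5 + [0] * 5

stat, p_value = logrank_two_groups(times, events, groups)
score = -math.log10(p_value)   # larger means stronger survival separation
```

On this scale, the conventional significance threshold of P = 0.05 corresponds to about 1.3, so clusterings with a score above that value separate survival curves at the 5% level.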