Personalized graph feature-based multi-omics data integration for cancer subtype identification

Saiful Islam, Md. Nahid Hasan

TL;DR

Cancer subtype identification is hindered by heterogeneous multi-omics data. The authors propose a cosine-based patient similarity network framework that builds patient-specific subnetworks from each omic type, extracts nine topology features per subnetwork, and fuses them via averaging before clustering with K-means. Evaluated on five TCGA datasets (BIC, COAD, GBM, KRCCC, LSCC), the method yields distinct subtypes with significant survival differences and often outperforms existing multi-omics integration approaches. This approach offers a simple, scalable path toward personalized cancer diagnosis and treatment by leveraging network topology in a multi-omics setting.

Abstract

Cancer is a highly heterogeneous disease with significant variability in molecular features and clinical outcomes, making diagnosis and treatment challenging. In recent years, high-throughput omic technologies have facilitated the discovery of mechanisms underlying various cancer subtypes by providing diverse omics data, such as gene expression, DNA methylation, and miRNA expression. However, the complexity and heterogeneity of multi-omics data present significant challenges for their integration in exploring cancer subtypes. Various methods have been proposed to address these challenges. In this paper, we propose a novel and straightforward approach for identifying cancer subtypes by integrating patient-specific subnetwork features from different omics data. We construct a patient-specific induced subnetwork for each patient by applying a random walk with restart algorithm to patient similarity networks (PSNs), and compute nine structural properties that capture essential network topology. These features are integrated across the three omic datasets to form comprehensive patient profiles. K-means clustering is then applied for cancer subtype identification. We evaluate our approach on five cancer datasets (breast invasive carcinoma, colon adenocarcinoma, glioblastoma multiforme, kidney renal clear cell carcinoma, and lung squamous cell carcinoma), each with three omic data types. The evaluation shows that our method achieves competitive or superior performance compared to existing methods, underscoring its potential for advancing personalized cancer diagnosis and treatment.
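
The subnetwork construction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the restart probability (0.3), the clipping of negative similarities to zero, the rule of keeping the top-scoring nodes, and all function names are assumptions made for illustration. It builds a cosine-similarity PSN for one omic type, runs a random walk with restart from a seed patient, and returns the subnetwork induced by the highest-scoring patients.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def build_psn(profiles):
    """Weighted adjacency matrix of a patient similarity network.
    Assumption of this sketch: no self-loops, negative similarities clipped to 0."""
    n = len(profiles)
    return [[0.0 if i == j else max(0.0, cosine_similarity(profiles[i], profiles[j]))
             for j in range(n)] for i in range(n)]

def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * e until the L1 change drops below tol.
    W is the column-normalised adjacency matrix, e the indicator of the seed."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    trans = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
             for i in range(n)]
    e = [1.0 if i == seed else 0.0 for i in range(n)]
    p = e[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(trans[i][j] * p[j] for j in range(n)) + restart * e[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p

def patient_subnetwork(adj, seed, size, restart=0.3):
    """Induced subnetwork on the seed plus the (size - 1) other patients
    with the highest random-walk scores (a hypothetical selection rule)."""
    scores = random_walk_with_restart(adj, seed, restart)
    others = sorted((i for i in range(len(adj)) if i != seed),
                    key=lambda i: scores[i], reverse=True)
    nodes = sorted([seed] + others[:size - 1])
    sub = [[adj[i][j] for j in nodes] for i in nodes]
    return nodes, sub

# Toy data: patients 0-2 have similar profiles, as do patients 3-4.
profiles = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0], [1.0, 0.2, 0.1],
            [0.0, 1.0, 0.9], [0.1, 0.9, 1.0]]
psn = build_psn(profiles)
nodes, sub = patient_subnetwork(psn, seed=0, size=3)
print(nodes)
```

In the full approach this step would be repeated for every patient and every omic type, and topology features would then be computed on each returned subnetwork.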

Paper Structure

This paper contains 10 sections, 4 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Overview of the proposed approach for identifying cancer subtypes from multi-omics data. First, a Patient Similarity Network (PSN) is constructed for each omic data type. Next, random walk with restart is applied to construct a subnetwork for each node. Network features are then extracted from these subnetworks and aggregated across omic types. Finally, the K-means algorithm is employed to identify cancer subtypes.
  • Figure 2: Average silhouette scores for each candidate number of clusters, $K\in \{2,3,\ldots, 10\}$, across five datasets.
  • Figure 3: Survival analysis curves of the patient subtypes and two-dimensional representations of aggregated features for five datasets. Panels (a)-(e) display the two-dimensional representations of the aggregated features for individual patients in the datasets: (a) BIC, (b) COAD, (c) GBM, (d) KRCCC, and (e) LSCC. Each circle represents a patient. Panels (f)-(j) show the survival curves corresponding to the datasets BIC, COAD, GBM, KRCCC, and LSCC, respectively. The patient subtypes are denoted as S1, S2, S3, S4, S5, and S6, with the colors in the survival curves indicating the respective subtypes. The $-\log_{10}$ of the $p$-values in the Cox log-rank test are 0.0101, 0.0083, 0.0039, 0.0016, and 0.0039 for BIC, COAD, GBM, KRCCC, and LSCC, respectively.
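
The feature aggregation in Figure 1 and the silhouette-based comparison in Figure 2 can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' code: aggregation is taken to be an element-wise mean of the per-omic feature matrices (as the TL;DR describes), distances are Euclidean, and the function names `fuse_by_averaging` and `mean_silhouette` are hypothetical. The cluster labels would normally come from K-means run for each candidate $K$; here they are supplied by hand to keep the sketch short.

```python
import math

def fuse_by_averaging(feature_sets):
    """Element-wise mean of per-omic feature matrices.
    Each matrix must list the same patients (rows) and features (columns)."""
    n_omics = len(feature_sets)
    return [[sum(vals) / n_omics for vals in zip(*rows)]
            for rows in zip(*feature_sets)]

def mean_silhouette(points, labels):
    """Average silhouette score over all points (Euclidean distance assumed).
    Requires at least two clusters; points in singleton clusters score 0."""
    n = len(points)
    clusters = sorted(set(labels))
    total = 0.0
    for i in range(n):
        # Mean distance from point i to the members of each cluster.
        mean_dist = {}
        for c in clusters:
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                mean_dist[c] = sum(math.dist(points[i], points[j])
                                   for j in members) / len(members)
        own = labels[i]
        if own not in mean_dist:
            continue  # singleton cluster contributes 0
        a = mean_dist[own]                                        # cohesion
        b = min(d for c, d in mean_dist.items() if c != own)      # separation
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / n

# Toy example: three omic types, four patients, two features each.
omic_a = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
omic_b = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
omic_c = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
fused = fuse_by_averaging([omic_a, omic_b, omic_c])

good = mean_silhouette(fused, [0, 0, 1, 1])  # labels follow the two groups
bad = mean_silhouette(fused, [0, 1, 0, 1])   # labels cut across the groups
print(good > bad)
```

Repeating the score for the labels obtained at each $K$ and keeping the $K$ with the highest average is one common way to read a plot like Figure 2.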