Multi-Omic and Quantum Machine Learning Integration for Lung Subtypes Classification

Mandeep Kaur Saggi, Amandeep Singh Bhatia, Mensah Isaiah, Humaira Gowher, Sabre Kais

TL;DR

A method for finding the features that best differentiate the LUAD and LUSC datasets, with potential for biomarker discovery and for fusing quantum computing with machine learning.

Abstract

Quantum Machine Learning (QML) is a rapidly growing field that offers new opportunities to solve, speed up, or refine the analysis of a wide range of computational problems. In biomedical research and personalized medicine, multi-omics integration matters because it provides a thorough and holistic understanding of complex biological systems and links fundamental research to clinical practice: the insights gained from integrated omics data can be translated into clinical tools for diagnosis, prognosis, and treatment planning. The fusion of quantum computing and machine learning holds promise for unraveling complex patterns within multi-omics datasets and for providing new insights into the molecular landscape of lung cancer. Multi-omic cancer data are heterogeneous, complex, and high-dimensional, with a vast number of features (such as gene expression, micro-RNA, and DNA methylation) relative to the limited number of lung cancer patient samples. Our prime motivation for this paper is therefore the integration of multi-omic data, unique feature selection, and diagnostic classification of two lung cancer subtypes, lung squamous cell carcinoma (LUSC-I) and lung adenocarcinoma (LUAD-II), using quantum machine learning. We developed a method for finding the best differentiating features between LUAD and LUSC datasets, which has the potential for biomarker discovery.

Paper Structure (26 sections, 33 equations, 17 figures, 11 tables, 2 algorithms)

Figures (17)

  • Figure 1: Summary of multi-omic modalities and clinical information. (a) Development of non-small cell lung cancer (NSCLC): overview of the three main NSCLC subtypes, adenocarcinoma (LUAD), squamous cell carcinoma (LUSC), and large cell carcinoma. This study focuses on two of them, LUAD and LUSC. (b) Summary of clinical information for each subtype diagnosis class by gender and sample type, together with the combined subtypes' patient age, pathological stage, ethnicity, and race. (c) Information from GDC-TCGA, with each entry indicating the lung dataset subtype I (LUSC) and subtype II (LUAD), covering 915 patient samples and the high-dimensional features of each omic.
  • Figure 2: Flowchart of Data Engineering process. This diagram illustrates the steps involved in analyzing and integrating multiple omic datasets, including mean value calculation, statistical t-tests, integration of datasets, and combination with clinical and survival attributes.
  • Figure 3: Workflow of multi-omics integration and classification using quantum neural networks. (a) Schematic representation of the overall pipeline of the MQML-QNN framework. (b) Data acquisition from GDC-TCGA, including (i) DNAme, (ii) RNA-seq, (iii) miRNA-seq, and (iv) clinical and survival attributes of patients. (c) Data preprocessing and feature engineering using t-test p-values for each omic data type to differentiate between subtypes I and II. (d)-(e) Feature selection process involving: (i) four feature selection models: Random Forest (RF), Mutual Information (MI), Principal Component Analysis (PCA), and Chi-Squared (Chi); (ii) AUC-ROC analysis with thresholding for feature selection; (iii) hierarchical clustering based on sample vs. feature and pairwise feature similarity using distance metrics. (f) Integration of multi-omic data resulting in 256 features from 915 common patients. (g) Quantum amplitude encoding with three feature dimensions (32, 64, and 256) and the quantum circuit with a dense layer for diagnostic classification of LUAD versus LUSC lung datasets. (h) Visualization of features through violin plots, confusion matrices, and heatmaps.
  • Figure 4: Venn diagrams and feature importance plots with scores. Venn diagrams for the feature selection process using four machine learning methods on the DNA methylation (DNAme), RNA transcript level (RNA-seq), and miRNA-seq (miRNA) modalities: (a) DNA Sample$_{1}$ to Sample$_{3}$, (b) RNA Sample$_{1}$ to Sample$_{3}$, and (c) miRNA Sample$_{1}$ to Sample$_{2}$. Visualization of feature selection using four models, i.e., (i) mutual information, (ii) chi-squared, (iii) principal component analysis, and (iv) random forest, for the three modalities: (d) DNA S$_{1}$ to S$_{3}$, (e) RNA S$_{1}$ to S$_{3}$, and (f) miRNA S$_{1}$ to S$_{2}$.
  • Figure 5: Hierarchical clustering dendrograms for pairwise and distance-based feature selection on each omic subset: (a)-(c) DNA S$_{1}$ to S$_{3}$, (d)-(f) RNA S$_{1}$ to S$_{3}$, and (g)-(h) miRNA S$_{1}$ to S$_{3}$.
  • ...and 12 more figures
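
The Figure 3 caption mentions quantum amplitude encoding with feature dimensions of 32, 64, and 256. As a rough illustration of what that implies, the sketch below shows the classical preprocessing that amplitude encoding generally requires: a real feature vector of length 2^n is scaled to unit L2 norm so that it can serve as the amplitudes of an n-qubit state. The function name `amplitude_encode` and the zero-padding of shorter vectors are assumptions made for this example. The paper's own circuit and software stack are not shown in this section, so this is not the authors' implementation.

```python
def amplitude_encode(features):
    """Return (amplitudes, n_qubits) for a real-valued feature vector.

    Hypothetical helper for illustration only: the vector is zero-padded to
    the next power of two and scaled to unit L2 norm, so the squared
    amplitudes sum to 1 as a valid quantum state requires.
    """
    if len(features) == 0:
        raise ValueError("feature vector is empty")
    # Smallest n with 2**n >= len(features); at least one qubit.
    n_qubits = max(1, (len(features) - 1).bit_length())
    padded = list(features) + [0.0] * (2 ** n_qubits - len(features))
    norm = sum(x * x for x in padded) ** 0.5
    if norm == 0.0:
        raise ValueError("cannot encode an all-zero vector")
    return [x / norm for x in padded], n_qubits


if __name__ == "__main__":
    # The three feature dimensions named in the Figure 3 caption.
    for dim in (32, 64, 256):
        toy_features = [float(i + 1) for i in range(dim)]  # placeholder values
        amplitudes, n_qubits = amplitude_encode(toy_features)
        print(dim, "features ->", n_qubits, "qubits")
```

Under these assumptions, the three dimensions in the caption correspond to 5, 6, and 8 qubits respectively. This logarithmic qubit count is the usual reason amplitude encoding is chosen for high-dimensional inputs.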