Graph Representation Learning Strategies for Omics Data: A Case Study on Parkinson's Disease
Elisa Gómez de Lope, Saurabh Deshpande, Ramón Viñas Torné, Pietro Liò, Enrico Glaab, Stéphane P. A. Bordas
TL;DR
This study tackles the challenge of analyzing high-dimensional, noisy, and heterogeneous omics data in Parkinson's disease (PD) by evaluating graph representation learning on two cohorts: transcriptomics from PPMI and metabolomics from LUXPARK. It contrasts sample-similarity networks (SSNs) with molecular-interaction networks (MINs) built from protein-protein interactions (PPI, from STRING) and metabolite-metabolite interactions (MMI, from STITCH), using a suite of GNNs, graph transformers, and a baseline MLP, with a two-layer default configuration. The SSN adjacency is defined by thresholded pairwise cosine similarity: A_{ij} = cs(x_i, x_j) if i ≠ j and cs(x_i, x_j) ≥ s, and A_{ij} = 0 otherwise, where s is the similarity threshold. The resulting graph lets information propagate between similar samples, and GNN-Explainer is used for interpretability. Key findings: SSNs generally outperform MINs, graph transformers outperform traditional GNNs, and LASSO feature selection improves performance. Biologically, the signals converge on the mitochondrial fatty acid oxidation pathway involving SLC25A20, CPT1A, and glutarylcarnitine, offering PD-relevant insights with potential for biomarker discovery.
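The thresholded cosine-similarity adjacency described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `ssn_adjacency`, the toy matrix `X`, and the threshold value 0.5 are assumptions chosen for illustration only.

```python
import numpy as np

def ssn_adjacency(X, s):
    """Thresholded cosine-similarity adjacency for a sample-similarity network.

    X: (n_samples, n_features) array, one row per sample.
    s: similarity threshold; entries below s are set to 0.
    Hypothetical helper, not taken from the paper's code.
    """
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0           # guard against all-zero rows
    Xn = X / norms                    # row-normalize to unit length
    cs = Xn @ Xn.T                    # pairwise cosine similarity
    A = np.where(cs >= s, cs, 0.0)    # keep only similarities >= s
    np.fill_diagonal(A, 0.0)          # enforce A_ii = 0 (the i != j condition)
    return A

# Toy example: three samples with two features each (made-up values).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
A = ssn_adjacency(X, s=0.5)
```

In this toy case samples 0 and 2 are orthogonal, so they are not connected, while sample 1 is linked to both. A larger s yields a sparser graph.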
Abstract
Omics data analysis is crucial for studying complex diseases, but its high dimensionality and heterogeneity challenge classical statistical and machine learning methods. Graph neural networks have emerged as promising alternatives, yet the optimal strategies for their design and optimization in real-world biomedical challenges remain unclear. This study evaluates various graph representation learning models for case-control classification using high-throughput biological data from Parkinson's disease and control samples. We compare topologies derived from sample similarity networks and molecular interaction networks, including protein-protein and metabolite-metabolite interactions (PPI, MMI). Graph Convolutional Networks (GCNs), Chebyshev spectral graph convolution (ChebyNet), and Graph Attention Networks (GATs) are evaluated alongside advanced architectures such as graph transformers and the graph U-net, and simpler models such as the multilayer perceptron (MLP). These models are systematically applied to transcriptomics and metabolomics data independently. Our comparative analysis highlights the benefits and limitations of various architectures in extracting patterns from omics data, paving the way for more accurate and interpretable models in biomedical research.
