Transformation of Biological Networks into Images via Semantic Cartography for Visual Interpretation and Scalable Deep Analysis

Sakib Mostafa, Lei Xing, Md. Tauhidul Islam

TL;DR

Graph2Image transforms large biological graphs into image representations via community-aware optimal transport (OT) mapping, enabling scalable CNN-based analysis and direct interpretability. It overcomes limitations of graph neural networks (GNNs) and demonstrates state-of-the-art performance on tissue-of-expression tasks, a whole-organism cell atlas, pan-cancer multi-omics, and prostate cancer progression. SHAP-based attribution reveals biologically meaningful gene programs and tissue axes, and the method scales to networks with billions of edges on commodity hardware. This approach enables scalable, multimodal, and interpretable network analysis with potential applications in disease diagnosis and systems biology.

Abstract

Complex biological networks are fundamental to biomedical science, capturing interactions among molecules, cells, genes, and tissues. Deciphering these networks is critical for understanding health and disease, yet their scale and complexity pose a daunting challenge for current computational methods. Traditional biological network analysis methods, including deep learning approaches, are powerful but face inherent challenges such as limited scalability, oversmoothing long-range dependencies, difficulty in multimodal integration, expressivity bounds, and poor interpretability. We present Graph2Image, a framework that transforms large biological networks into sets of two-dimensional images by spatially arranging representative network nodes on a 2D grid. This transformation decouples the nodes into individual images, enabling the use of convolutional neural networks (CNNs) with global receptive fields and multi-scale pyramids, and thus overcomes limitations of existing biological network analysis methods in scalability, memory efficiency, and long-range context capture. Graph2Image also facilitates seamless integration with other imaging and omics modalities and enhances interpretability through direct visualization of node-associated images. When applied to several large-scale biological network datasets, Graph2Image improved classification accuracy by up to 67.2% over existing methods and provided interpretable visualizations that revealed biologically coherent patterns. It also allows analysis of very large biological networks (nodes > 1 billion) on a personal computer. Graph2Image thus provides a scalable, interpretable, and multimodal-ready approach for biological network analysis, offering new opportunities for disease diagnosis and the study of complex biological systems.

Paper Structure

This paper contains 21 sections, 21 equations, 35 figures, 13 tables.

Figures (35)

  • Figure 1: The Graph2Image framework for transforming attributed graphs into multi-channel images. The pipeline learns in two parallel paths. a, The structural path learns the graph's topology. It takes the Adjacency Matrix, uses Clustering to find communities, and maps the centroid distance between clusters on a 2D grid using the Optimal Transport (OT) algorithm to create a structural image. b, The feature path learns node feature relationships. It calculates a feature interaction matrix based on the Correlation Among Features and maps the features on a 2D grid using a different Optimal Transport algorithm to generate feature images. These two images are then combined into a final, multi-channel image for each node. This set of images is used to train a Convolutional Neural Network for downstream tasks like Classification and Regression.
  • Figure 2: Performance and tissue-level interpretability on the PP-Pathways dataset. a, Class-averaged SHAP heatmap showing normalized contributions of selected GTEx tissues (features) to Graph2Image predictions for eight dominant-tissue classes (rows). b, Confusion matrix of Graph2Image predictions across the eight tissue classes (counts per cell). c, Macro-averaged ROC curves comparing Graph2Image with GNN baselines (GAT, GCN, GIN and GraphSAGE). d, Macro-averaged precision–recall curves for the same methods. e, Summary comparison of Accuracy, macro F1, Precision and Recall between Graph2Image and the GNN baselines. See the Supplementary Figures for the full 54-tissue confusion matrix, SHAP clustermap, tissue-level SHAP dendrograms, and class dendrogram.
  • Figure 3: Performance and interpretability of Graph2Image on the HuRI + GTEx dataset. a, Radial SHAP map summarizing normalized SHAP values for each GTEx tissue feature across the nine primary tissue classes; concentric rings correspond to classes and angular positions to tissues. b, SHAP chord diagram highlighting the strongest positive links between tissue classes (bottom) and influential GTEx tissue features (top), illustrating shared and class-specific programs. c, Receiver operating characteristic (ROC) curves for Graph2Image and GNN baselines (GAT, GCN, GIN, GraphSAGE) in the nine-way tissue classification task. d, Confusion matrix of Graph2Image predictions across the nine primary tissues (counts in linear scale). e, Classification comparison (Accuracy, macro F1, macro Precision, macro Recall) between Graph2Image and GNN baselines; bars show mean scores with bootstrap confidence intervals. See the Supplementary Figures for the full confusion matrix, SHAP clustermaps, and dendrograms.
  • Figure 4: Performance and interpretability on the Tabula Muris dataset. a, Class-averaged SHAP heatmap for top marker genes across representative cell types; values are normalized SHAP scores. b, Confusion matrix of Graph2Image predictions across 55 cell types (counts in log scale). c, Representative Graph2Image outputs from Tabula Muris with samples from Alveolar Macrophage and Epithelial Cell. d, Classification comparison (Accuracy, F1, Precision, Recall) between Graph2Image and GNN baselines (GAT, GCN, GIN, GraphSAGE). See the Supplementary Figures for the full-resolution confusion matrix, dendrogram, and comprehensive heatmap.
  • Figure 5: Performance and modality contributions on the pan-cancer cohort. a, Relative contribution of each omic layer (mRNA, CNV, DNA methylation) per cancer type, computed by aggregating SHAP importance. b, Class-averaged SHAP heatmap for top mRNA features across cancer types (normalized SHAP scores). c, Confusion matrix of Graph2Image predictions across 32 TCGA cancer types (log-scale counts). d, Representative Graph2Image outputs from the Pan Cancer dataset, where the images correspond to different omics types of Adrenocortical carcinoma (ACC) and Kidney renal clear cell carcinoma (KIRC). e, Classification comparison (Accuracy, F1, Precision, Recall) between Graph2Image and GNN baselines. See the Supplementary Figures for extended confusion matrices, heatmaps, modality breakdowns, and clustering analyses.
  • ...and 30 more figures
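
The Figure 1 caption describes placing items (cluster centroids or features) on a 2D grid with an optimal transport algorithm. As a rough illustration only, and not the authors' implementation, the sketch below runs entropic OT (Sinkhorn iterations) between items with assumed 2D coordinates and the cells of a square grid, then greedily rounds the transport plan to a one-to-one placement. The function names, the `eps` value, the toy coordinates, and the greedy rounding step are all assumptions made for this example; the paper's actual cost function and solver may differ.

```python
import math


def grid_centers(side):
    """Centers of a side x side grid laid over the unit square, row-major."""
    return [((r + 0.5) / side, (c + 0.5) / side)
            for r in range(side) for c in range(side)]


def squared_cost(points, cells):
    """Squared Euclidean distance from every item to every grid cell."""
    return [[(p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 for q in cells]
            for p in points]


def sinkhorn_plan(cost, eps=0.05, n_iter=200):
    """Entropic OT plan between two uniform distributions of equal size."""
    n = len(cost)
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    mass = 1.0 / n  # uniform mass per item and per grid cell (assumption)
    u, v = [1.0] * n, [1.0] * n
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the marginals.
        u = [mass / sum(kernel[i][j] * v[j] for j in range(n)) for i in range(n)]
        v = [mass / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(n)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(n)] for i in range(n)]


def round_to_assignment(plan):
    """Greedy rounding: visit pairs by decreasing mass, keep each item/cell once."""
    n = len(plan)
    pairs = sorted(((plan[i][j], i, j) for i in range(n) for j in range(n)),
                   reverse=True)
    cell_of, used = [None] * n, set()
    for _, i, j in pairs:
        if cell_of[i] is None and j not in used:
            cell_of[i] = j
            used.add(j)
    return cell_of


def to_image(values, cell_of, side):
    """Write one value per item into its assigned grid cell."""
    image = [[0.0] * side for _ in range(side)]
    for i, j in enumerate(cell_of):
        image[j // side][j % side] = values[i]
    return image


# Toy input: four items with hypothetical 2D coordinates in the unit square.
points = [(0.9, 0.8), (0.1, 0.2), (0.2, 0.9), (0.8, 0.1)]
plan = sinkhorn_plan(squared_cost(points, grid_centers(2)))
cell_of = round_to_assignment(plan)
print(cell_of)                                    # [3, 0, 1, 2]
print(to_image([4.0, 1.0, 2.0, 3.0], cell_of, 2))  # [[1.0, 2.0], [3.0, 4.0]]
```

In this toy case each item lands in the grid cell nearest to its coordinates. Calling `to_image` once per node, with that node's feature values, would yield one image per node sharing a common layout, which is the kind of input a CNN expects.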