Graph Attention Based Prioritization of Disease Responsible Genes from Multimodal Alzheimer's Network

Binon Teji, Subhajit Bandyopadhyay, Swarup Roy

TL;DR

NETRA (Node Evaluation through Transformer-based Representation and Attention) is a multimodal graph transformer framework that replaces heuristic centrality metrics with attention-driven relevance scoring; in gene set enrichment analysis of the Alzheimer's disease pathway, it substantially outperforms classical centrality measures and diffusion models.

Abstract

Prioritizing disease-associated genes is central to understanding the molecular mechanisms of complex disorders such as Alzheimer's disease (AD). Traditional network-based approaches rely on static centrality measures and often fail to capture cross-modal biological heterogeneity. We propose NETRA (Node Evaluation through Transformer-based Representation and Attention), a multimodal graph transformer framework that replaces heuristic centrality metrics with attention-driven relevance scoring. Using AD as a case study, gene regulatory networks are independently constructed from microarray, single-cell RNA-seq, and single-nucleus RNA-seq data. Random-walk sequences derived from these networks are used to train a BERT-based model for learning global gene embeddings, while modality-specific gene expression profiles are compressed using variational autoencoders. These representations are integrated with auxiliary biological networks, including protein-protein interactions, Gene Ontology semantic similarity, and diffusion-based gene similarity, into a unified multimodal graph. A graph transformer assigns NETRA scores that quantify gene relevance in a disease-specific and context-aware manner. Gene set enrichment analysis shows that NETRA achieves a normalized enrichment score of about 3.9 for the Alzheimer's disease pathway, substantially outperforming classical centrality measures and diffusion models. Top-ranked genes are enriched in multiple neurodegenerative pathways, recover a known late-onset AD susceptibility locus at chr12q13, and reveal conserved cross-disease gene modules. The framework preserves biologically realistic heavy-tailed network topology and is readily extensible to other complex disorders.
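The random-walk step turns the pooled regulatory networks into "sentences" of gene tokens that a BERT-style model can be trained on. A minimal sketch, assuming uniform (DeepWalk-style) walks over an unweighted, undirected union of the three modality networks; the walk policy, walk length, special tokens, toy gene IDs, and helper names (`pool_networks`, `random_walk_corpus`, `build_vocab`, `encode`) are hypothetical choices for illustration, not the authors' settings.

```python
import random
from collections import defaultdict

# Hypothetical special tokens, assumed to mirror a standard BERT vocabulary.
SPECIAL_TOKENS = ["[PAD]", "[CLS]", "[SEP]", "[MASK]"]


def build_adjacency(edges):
    """Undirected adjacency list built from (gene_a, gene_b) edge pairs."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def pool_networks(*edge_lists):
    """Union of several networks; (a, b) and (b, a) count as the same edge."""
    return sorted({tuple(sorted(edge)) for edges in edge_lists for edge in edges})


def random_walk_corpus(edges, walks_per_node=10, walk_length=8, seed=0):
    """Uniform random walks (an assumed policy); each walk is one 'sentence' of genes."""
    rng = random.Random(seed)
    adj = build_adjacency(edges)
    corpus = []
    for _ in range(walks_per_node):
        starts = sorted(adj)
        rng.shuffle(starts)  # visit start nodes in a fresh random order each pass
        for start in starts:
            walk = [start]
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            corpus.append(walk)
    return corpus


def build_vocab(corpus):
    """Map the special tokens and every gene symbol to an integer id."""
    genes = sorted({g for walk in corpus for g in walk})
    return {tok: i for i, tok in enumerate(SPECIAL_TOKENS + genes)}


def encode(walk, vocab):
    """Wrap a walk as [CLS] g1 ... gk [SEP] token ids, BERT-style."""
    return [vocab["[CLS]"]] + [vocab[g] for g in walk] + [vocab["[SEP]"]]


# Toy stand-ins for the three modality-specific networks (not data from the paper).
microarray_net = [("G1", "G2"), ("G2", "G3")]
scrna_net = [("G1", "G4"), ("G4", "G5"), ("G2", "G1")]
snrna_net = [("G5", "G6")]

pooled = pool_networks(microarray_net, scrna_net, snrna_net)
corpus = random_walk_corpus(pooled, walks_per_node=2, walk_length=5, seed=1)
vocab = build_vocab(corpus)
print(corpus[0], encode(corpus[0], vocab))
```

Because every node starts `walks_per_node` walks, each gene in the pooled graph is guaranteed to appear in the corpus and therefore in the vocabulary; the encoded walks would then feed a masked-language-model objective.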
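The core idea is to rank genes by the attention they attract rather than by a fixed centrality formula. One plausible reading, sketched here and not the authors' NETRA definition: a single attention head restricted to each node's neighborhood (plus itself), with a gene's score taken as the mean attention it receives across all nodes. The projection matrices are random rather than trained, and `neighborhood_attention`, `relevance_scores`, and the mean aggregation are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def neighborhood_attention(h, adj, W_q, W_k):
    """alpha[i][j]: scaled dot-product attention of node i over j in N(i) plus i.

    Non-neighbors receive exactly zero weight, and every row sums to 1.
    """
    n = len(h)
    d = len(W_q)  # projected query/key dimension
    q = [matvec(W_q, x) for x in h]
    k = [matvec(W_k, x) for x in h]
    alpha = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = sorted(set(adj[i]) | {i})
        logits = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) for j in nbrs]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        for j, e in zip(nbrs, exps):
            alpha[i][j] = e / z
    return alpha


def relevance_scores(alpha):
    """Hypothetical NETRA-like score: mean attention each node receives (sums to 1)."""
    n = len(alpha)
    return [sum(alpha[i][j] for i in range(n)) / n for j in range(n)]


def rank_nodes(scores):
    """Node indices ordered from most to least relevant."""
    return sorted(range(len(scores)), key=lambda j: -scores[j])


# Toy star graph: node 0 is a hub linked to nodes 1-4 (not data from the paper).
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
rng = random.Random(0)
h = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(5)]    # node features
W_q = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(2)]  # 2 x 3 query projection
W_k = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(2)]  # 2 x 3 key projection

alpha = neighborhood_attention(h, adj, W_q, W_k)
scores = relevance_scores(alpha)
print(rank_nodes(scores), [round(s, 3) for s in scores])
```

With identical features and identity projections, attention becomes uniform over each neighborhood and the score collapses to a degree-like quantity (the hub gets 0.44, each leaf 0.14). Trained, feature-dependent projections are what let attention-driven scores depart from plain centrality.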

Paper Structure (22 sections, 20 equations, 10 figures, 1 table)

Figures (10)

  • Figure 1: Overview of the proposed multimodal graph transformer framework for gene prioritization. (A) Multi-omics gene expression data from microarray, single-cell RNA-seq (scRNA-seq), and single-nucleus RNA-seq (snRNA-seq). (B) Each modality is encoded using a variational autoencoder (VAE) to obtain latent representations $\mathcal{Z}_{ma}$, $\mathcal{Z}_{sc}$, and $\mathcal{Z}_{sn}$, which are fused into a unified gene embedding $\mathcal{Z}_F$. (C) Multiple inferred networks are pooled and transformed into node sequences using random-walk sampling; these sequences are tokenized, positionally encoded, and processed by a BERT-based Transformer to learn rich contextual representations. (D) The fused embeddings are integrated with the ensemble graph $\mathcal{A}$ and graph positional encodings to produce node features $\mathbf{h}$. (E) A multi-layer graph transformer applies neighborhood attention using graph-specific query, key, and value projections for link prediction and attention aggregation. The aggregated gene-attention scores are used for gene ranking, enrichment analysis (e.g., GSEA, pathway and disease databases), and final reporting and visualization.
  • Figure 2: Training dynamics of our learning model. (Left) Training loss convergence and (right) AUROC progression across epochs, demonstrating stable optimization and robust generalization.
  • Figure 3: UMAP visualization of gene embeddings with Leiden clustering and attention-prioritized (top-15) genes. Each point represents a gene embedding projected into two dimensions. Colors indicate Leiden clusters (labels 0–13). Attention-prioritized genes are highlighted in red and annotated with gene symbols using arrowed labels. The spatial distribution demonstrates clear cluster separation while showing the dispersion of prioritized genes across multiple functional communities.
  • Figure 4: Visualization of an attention-weighted subgraph induced by the top-10 prioritized genes. Prioritized genes are shown in gold, neighboring context genes in gray, with node sizes proportional to NETRA attention scores. Blue edges represent gene-gene interactions.
  • Figure 5: Structural comparison between the input ensemble network and the generated network. (a) Comparison of global graph characteristics between the input ensemble network and the generated network on a logarithmic scale, including maximum degree, triangle count, global clustering coefficient, and global efficiency. (b) Log–log degree distribution of the input ensemble network and the generated network, demonstrating the preservation of heavy-tailed degree behavior in the generated network.
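Figure 5 compares the input ensemble network and the generated network on maximum degree, triangle count, global clustering coefficient, global efficiency, and the log-log degree distribution. A minimal stdlib sketch of how those statistics can be computed for an undirected, unweighted edge list, assuming "global clustering coefficient" means transitivity (3 × triangles / connected triples) rather than the average local clustering coefficient. The authors' tooling is not stated here; the function names and toy graphs are hypothetical, and libraries such as NetworkX offer equivalents (e.g. `transitivity`, `global_efficiency`).

```python
from collections import Counter, deque
from itertools import combinations


def adjacency(edges):
    """Undirected, unweighted adjacency sets; self-loops and duplicate edges are dropped."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def graph_summary(edges):
    """Max degree, triangle count, global clustering (transitivity), global efficiency."""
    adj = adjacency(edges)
    nodes = list(adj)
    n = len(nodes)
    degree = {v: len(adj[v]) for v in nodes}

    # Each triangle is seen once from each of its three corners.
    corner_hits = sum(
        1 for v in nodes for u, w in combinations(adj[v], 2) if w in adj[u]
    )
    triangles = corner_hits // 3

    # Transitivity = 3 * triangles / number of connected triples (paths of length 2).
    triples = sum(d * (d - 1) // 2 for d in degree.values())
    clustering = 3 * triangles / triples if triples else 0.0

    # Global efficiency = mean of 1 / shortest-path length over ordered pairs u != v;
    # unreachable pairs contribute 0. One BFS per source node.
    inverse_total = 0.0
    for s in nodes:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        inverse_total += sum(1 / d for v, d in dist.items() if v != s)
    efficiency = inverse_total / (n * (n - 1)) if n > 1 else 0.0

    return {
        "max_degree": max(degree.values(), default=0),
        "triangles": triangles,
        "clustering": clustering,
        "efficiency": efficiency,
    }


def degree_distribution(edges):
    """P(k): fraction of nodes with degree k, the quantity plotted on log-log axes."""
    adj = adjacency(edges)
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: counts[k] / len(adj) for k in sorted(counts)}


# Toy "input" and "generated" graphs (not the paper's networks).
input_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
generated_edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
for name, edges in [("input", input_edges), ("generated", generated_edges)]:
    print(name, graph_summary(edges), degree_distribution(edges))
```

A heavy tail shows up as P(k) decaying roughly linearly on log-log axes over a wide range of k; graphs this small only exercise the arithmetic and cannot exhibit one.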