A Novel Graph Transformer Framework for Gene Regulatory Network Inference
Binon Teji, Swarup Roy
TL;DR
This work tackles gene regulatory network (GRN) inference from noisy gene-expression data by formulating it as a link-prediction problem. It introduces GT-GRN, a Graph Transformer framework that fuses three information streams: gene-expression embeddings learned via a Variational Autoencoder, global gene embeddings derived from multi-network prior knowledge encoded as text-like sequences processed by a BERT-based model, and graph positional encodings from the input network. The approach demonstrates superior performance on both full network reconstruction and link-prediction tasks across multiple datasets, and shows utility for cell-type annotation through learned gene embeddings. The results underscore the value of multi-modal integration and global context in GRN inference, with potential extensions toward prioritizing disease-relevant genes and regulatory hubs.
Abstract
The inference of gene regulatory networks (GRNs) is a foundational step towards deciphering complex biological systems. Inferring a possible regulatory link between two genes can be formulated as a link prediction problem. GRNs inferred from gene coexpression profiling data may not always reflect true biological interactions, because such data are susceptible to noise and can misrepresent true regulatory relationships. Most GRN inference methods face several challenges in the network reconstruction phase. Therefore, it is important to encode gene expression values and to leverage both the prior knowledge gained from available inferred network structures and the positional information of the input network nodes in order to reconstruct a better and more confident GRN. In this paper, we explore the integration of multiple inferred networks to enhance GRN inference. First, we employ autoencoder embeddings to capture gene expression patterns directly from raw data, preserving intricate biological signals. Then, we embed the prior knowledge from GRN structures by transforming them into a text-like representation using random walks, which is then encoded with a masked language model, BERT, to generate global embeddings for each gene across all networks. Additionally, we incorporate positional encodings of the input gene network to better identify the position of each unique gene within the graph. These embeddings are integrated into a graph transformer-based model, termed GT-GRN, for GRN inference. The GT-GRN model effectively utilizes the topological structure of the ground truth network while incorporating the enriched encoded information. Experimental results demonstrate that GT-GRN significantly outperforms existing GRN inference methods, achieving superior accuracy and highlighting the robustness of our approach.
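To make the "text-like representation using random walks" step concrete, the following is a minimal sketch of how several inferred networks could be turned into a corpus of gene-token sequences suitable for a masked language model. It is an illustration under stated assumptions, not the authors' implementation: the function name `random_walk_sentences`, the parameters `walk_length` and `walks_per_node`, the toy networks `net_a` and `net_b`, and the choice to treat edges as undirected with uniform neighbor sampling are all hypothetical.

```python
import random


def random_walk_sentences(edges, walk_length=5, walks_per_node=2, seed=0):
    """Convert one network, given as (gene, gene) pairs, into token sequences.

    Assumption: edges are treated as undirected and the next gene is
    sampled uniformly from the current gene's neighbors.
    """
    rng = random.Random(seed)

    # Build an adjacency list; neighbors are sorted so runs are reproducible.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    adj = {gene: sorted(nbrs) for gene, nbrs in adj.items()}

    sentences = []
    for start in sorted(adj):
        for _ in range(walks_per_node):
            walk = [start]
            # Every gene in adj has at least one neighbor, so choice() is safe.
            while len(walk) < walk_length:
                walk.append(rng.choice(adj[walk[-1]]))
            sentences.append(walk)
    return sentences


# Two hypothetical inferred networks over overlapping gene sets.
net_a = [("G1", "G2"), ("G2", "G3"), ("G3", "G4")]
net_b = [("G1", "G3"), ("G3", "G4"), ("G4", "G5")]

# Pool the walks from all networks into a single corpus of "sentences".
corpus = []
for i, net in enumerate([net_a, net_b]):
    corpus.extend(random_walk_sentences(net, seed=i))

for sentence in corpus[:3]:
    print(" ".join(sentence))
```

Each walk reads like a sentence whose words are gene identifiers, so genes that are close in several networks tend to co-occur in the pooled corpus. Pooling walks from all networks into one corpus is one plausible way for a language model to learn a single embedding per gene across networks, as the abstract describes.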
