GL-Fusion: Rethinking the Combination of Graph Neural Network and Large Language Model
Haotong Yang, Xiyuan Wang, Qian Tao, Shuxian Hu, Zhouchen Lin, Muhan Zhang
TL;DR
GL-Fusion is a deeply integrated GNN-LLM architecture that combines structure-aware transformers, graph-text cross-attention, and a GNN-LLM twin predictor to process text-attributed graphs end-to-end. It keeps the full text of nodes and edges available at every layer rather than compressing it, and it supports both language generation and graph-based prediction. By using linguistic and structural signals jointly, the model performs strongly across property prediction, node classification, knowledge graph completion, commonsense QA, and graph-to-language generation, with state-of-the-art results on ogbn-arxiv and ogbg-code2. The work also discusses limitations and broader impacts, and outlines future directions toward broader task coverage and safer deployment of LLM-powered graph reasoning.
Abstract
Recent research on integrating Large Language Models (LLMs) with Graph Neural Networks (GNNs) typically follows two approaches: LLM-centered models, which convert graph data into tokens for LLM processing, and GNN-centered models, which use LLMs to encode text features into node and edge representations for GNN input. LLM-centered models often struggle to capture graph structures effectively, while GNN-centered models compress variable-length textual data into fixed-size vectors, limiting their ability to understand complex semantics. Additionally, GNN-centered approaches require converting tasks into a uniform, manually designed format, restricting them to classification tasks and preventing language output. To address these limitations, we introduce a new architecture that deeply integrates GNNs with LLMs, featuring three key innovations: (1) Structure-Aware Transformers, which incorporate the GNN's message-passing capabilities directly into the LLM's transformer layers, allowing simultaneous processing of textual and structural information and generating outputs from both the GNN and the LLM; (2) Graph-Text Cross-Attention, which processes full, uncompressed text from graph nodes and edges, ensuring complete semantic integration; and (3) a GNN-LLM Twin Predictor, which enables the LLM's flexible autoregressive generation alongside the GNN's scalable one-pass prediction. GL-Fusion achieves outstanding performance on a variety of tasks; notably, it reaches state-of-the-art results on OGBN-Arxiv and OGBG-Code2.
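To make the first two ideas concrete, the following is a minimal toy sketch in plain Python, not the authors' implementation. It assumes mean aggregation over incoming edges for the message-passing step, a single attention head with no learned projections, and tiny hand-written embeddings; the function names `message_pass` and `cross_attend` are hypothetical. It only illustrates how a node state can first mix in its neighbors' states and then attend over every token of its own variable-length text, instead of reading one pooled vector.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(query, text_tokens):
    """Single-head dot-product attention of one node query over all of its
    text token vectors (assumption: no learned Q/K/V projections)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, tok)) / math.sqrt(d)
              for tok in text_tokens]
    weights = softmax(scores)
    return [sum(w * tok[i] for w, tok in zip(weights, text_tokens))
            for i in range(d)]

def message_pass(states, edges):
    """Mean-aggregate neighbor states along directed edges (src, dst) and add
    them to each node's own state (assumption: mean aggregation, residual add)."""
    d = len(states[0])
    out = []
    for v in range(len(states)):
        nbrs = [states[s] for s, t in edges if t == v]
        if nbrs:
            agg = [sum(x[i] for x in nbrs) / len(nbrs) for i in range(d)]
        else:
            agg = [0.0] * d
        out.append([states[v][i] + agg[i] for i in range(d)])
    return out

# Toy graph: 3 nodes whose texts have different lengths (token embeddings, dim 2).
node_text = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # node 0: three tokens
    [[0.5, 0.5]],                          # node 1: one token
    [[0.0, 2.0], [2.0, 0.0]],              # node 2: two tokens
]
node_state = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2), (2, 0)]

# Structure step (message passing), then text step (cross-attention over full text).
mixed = message_pass(node_state, edges)
fused = [
    [h + c for h, c in zip(state, cross_attend(state, tokens))]
    for state, tokens in zip(mixed, node_text)
]
```

Each node attends over however many tokens its text has, so no text is truncated or pooled before the structural and textual signals are combined. A real model would use learned projections, multiple heads, and causal masking on the language side, none of which is shown here.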
