GL-Fusion: Rethinking the Combination of Graph Neural Network and Large Language model

Haotong Yang, Xiyuan Wang, Qian Tao, Shuxian Hu, Zhouchen Lin, Muhan Zhang

TL;DR

GL-Fusion is a deeply integrated GNN-LLM architecture that combines structure-aware transformers, graph-text cross-attention, and a GNN-LLM twin predictor to process text-attributed graphs end-to-end. It keeps the full node and edge text available throughout the layers instead of compressing it into fixed-size vectors, and it supports both language generation and graph-based prediction. The model performs strongly across property prediction, node classification, knowledge graph completion, commonsense QA, and graph-to-language generation, and it reaches state-of-the-art results on ogbn-arxiv and ogbg-code2. The work also discusses limitations and broader impacts, and outlines future directions toward broader task coverage and safer deployment of LLM-powered graph reasoning.

Abstract

Recent research on integrating Large Language Models (LLMs) with Graph Neural Networks (GNNs) typically follows two approaches: LLM-centered models, which convert graph data into tokens for LLM processing, and GNN-centered models, which use LLMs to encode text features into node and edge representations for GNN input. LLM-centered models often struggle to capture graph structures effectively, while GNN-centered models compress variable-length textual data into fixed-size vectors, limiting their ability to understand complex semantics. Additionally, GNN-centered approaches require converting tasks into a uniform, manually designed format, restricting them to classification tasks and preventing language output. To address these limitations, we introduce a new architecture that deeply integrates GNNs with LLMs, featuring three key innovations: (1) Structure-Aware Transformers, which incorporate the GNN's message-passing capabilities directly into the LLM's transformer layers, allowing simultaneous processing of textual and structural information and generating outputs from both the GNN and the LLM; (2) Graph-Text Cross-Attention, which processes full, uncompressed text from graph nodes and edges, ensuring complete semantic integration; and (3) a GNN-LLM Twin Predictor, which enables the LLM's flexible autoregressive generation alongside the GNN's scalable one-pass prediction. GL-Fusion achieves outstanding performance on various tasks. Notably, it achieves state-of-the-art performance on OGBN-Arxiv and OGBG-Code2.

Paper Structure

This paper contains 34 sections, 4 equations, 3 figures, 11 tables.

Figures (3)

  • Figure 1: Workflow of the GL-Fusion model. (a) An example of a text-attributed graph, where each node and edge has a text attribute that will be encoded by the LLM. (b) The encoded text-attributed graph nodes (<node> in the figure), along with a prompt or question in natural language, are merged into an input sequence for our GL-Fusion model. GL-Fusion consists of several structure-aware transformer blocks and standard transformer blocks. Cross-attention layers are inserted to retrieve the complete, uncompressed node and edge text. The output on graph nodes is further processed by graph readout components, while the output on standard text tokens predicts the next token, similar to an original autoregressive LLM.
  • Figure 2: Design of the structure-aware transformer layer. (a) The structure-aware transformer layer. The brown curve indicates the message-passing process along graph edges. The dashed brown and blue lines represent causal self-attention and full attention, respectively. The boxed numbers indicate shared positional encodings. (b) The attention mask in structure-aware transformer layers. The blue part represents the ordinary causal attention mask, and the yellow part allows attention between nodes in the same graph.
  • Figure 3: The attention mask in cross-attention layers. Text tokens before the graph do not take part in cross-attention, which maintains causality (red dashed line with the ×). Each node token <node> has access only to its own text (orange lines). Text tokens after the graph have access to all node text (black lines).
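
The two attention masks described in the captions of Figures 2(b) and 3 can be sketched in a few lines of plain Python. This is a minimal illustration built only from the caption text, not the authors' implementation. The function names, the list-of-booleans layout, and the per-node (rather than per-token) granularity of the cross-attention mask are all assumptions made for the example. It also assumes a single graph whose <node> tokens sit contiguously in the input sequence.

```python
from typing import List, Optional


def structure_aware_mask(graph_id: List[Optional[int]]) -> List[List[bool]]:
    """Self-attention mask following the Figure 2(b) caption (hypothetical helper).

    graph_id[i] is None for an ordinary text token, or an integer graph
    index for a <node> token. mask[q][k] is True when query position q
    may attend to key position k.
    """
    n = len(graph_id)
    # Ordinary causal mask: each position sees itself and earlier positions.
    mask = [[k <= q for k in range(n)] for q in range(n)]
    # Extra full attention between node tokens that belong to the same graph.
    for q in range(n):
        for k in range(n):
            if graph_id[q] is not None and graph_id[q] == graph_id[k]:
                mask[q][k] = True
    return mask


def cross_attention_mask(
    node_index: List[Optional[int]], num_nodes: int
) -> List[List[bool]]:
    """Cross-attention mask following the Figure 3 caption (hypothetical helper).

    node_index[i] is None for a text token, or the index of the node that
    a <node> token stands for. mask[q][j] is True when sequence position q
    may read the text of node j. Assumes the node tokens are contiguous.
    """
    mask = []
    seen_graph = False
    for idx in node_index:
        if idx is not None:
            # A node token reads only its own text.
            seen_graph = True
            mask.append([j == idx for j in range(num_nodes)])
        elif seen_graph:
            # Text after the graph reads every node's text.
            mask.append([True] * num_nodes)
        else:
            # Text before the graph reads nothing, which preserves causality.
            mask.append([False] * num_nodes)
    return mask


# Example sequence: two prompt tokens, three node tokens, one question token.
self_mask = structure_aware_mask([None, None, 0, 0, 0, None])
cross_mask = cross_attention_mask([None, None, 0, 1, 2, None], num_nodes=3)
```

In this example the first node token (position 2) can attend to the two later node tokens even though they follow it in the sequence, while the trailing question token keeps a purely causal view. In a real model each node's text spans several tokens, so the cross-attention mask would be expanded along the key axis to cover every token of a node's text.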