Multi-View Empowered Structural Graph Wordification for Language Models

Zipeng Liu, Likang Wu, Ming He, Zhong Guan, Hongke Zhao, Nan Feng

TL;DR

This work addresses the challenge of integrating graph-structured data with large language models (LLMs) by proposing Dr.E, a Dual-Residual Vector Quantized-Variational AutoEncoder that translates graph structure into LLM-friendly tokens. It introduces Multi-View Structural Enhancement, which captures the context of each central node across multiple hop views, and dual-residual mechanisms (Intra-Layer and Inter-Layer) that preserve layer-specific information and mitigate oversmoothing during GNN aggregation. The framework aligns discrete graph tokens with the LLM vocabulary, so the LLM can predict downstream labels directly in natural language while preserving linguistic interpretability. Empirical results on Cora, PubMed, and OGBN-Arxiv show competitive or state-of-the-art performance without leveraging textual node attributes, and analyses demonstrate the benefits of multiple views, codebook usage, and the residual mechanisms. The approach offers a scalable path toward robust, token-level graph reasoning with LLMs and has practical implications for graph-centric reasoning in diverse domains; code is available to facilitate adoption.
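
As a rough illustration of what "multiple hop views" of a central node can mean, the sketch below groups the nodes around a center by their exact hop distance using breadth-first search. This is only one plausible reading: the summary does not specify how Dr.E builds its views, and the names `hop_views` and `toy_graph` are hypothetical, introduced here for illustration.

```python
from collections import deque


def hop_views(adjacency, center, max_hops):
    """Group nodes by exact hop distance (1..max_hops) from `center`.

    `adjacency` maps each node to an iterable of its neighbors.
    Illustrative helper only, not the authors' implementation.
    """
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        # Do not expand past the farthest view we care about.
        if distance[node] == max_hops:
            continue
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)

    views = {k: [] for k in range(1, max_hops + 1)}
    for node, d in distance.items():
        if d >= 1:
            views[d].append(node)
    return {k: sorted(nodes) for k, nodes in views.items()}


# Toy undirected graph (assumed data): 0-1, 0-2, 1-3, 2-3, 3-4, 4-5.
toy_graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}
views = hop_views(toy_graph, center=0, max_hops=3)
print(views)
```

Each view holds a different ring of neighbors, which is the kind of distance-dependent context the summary attributes to Multi-View Structural Enhancement. A GNN encoder would typically obtain comparable views from the outputs of successive aggregation layers rather than from an explicit search.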

Abstract

Significant efforts have been dedicated to integrating powerful Large Language Models (LLMs) with diverse modalities, particularly the fusion of language, vision, and audio data. However, graph-structured data, which is inherently rich in structural and domain-specific knowledge, has not yet been gracefully adapted to LLMs. Existing methods either describe the graph with raw text, losing graph structural information, or feed Graph Neural Network (GNN) embeddings into LLMs at the cost of losing explainable prompt semantics. To bridge this gap, we introduce an end-to-end modality-aligning framework for LLM-graph alignment: the Dual-Residual Vector Quantized-Variational AutoEncoder, namely Dr.E. Our approach is purposefully designed to facilitate token-level alignment with LLMs, enabling an effective translation of the intrinsic 'language' of graphs into comprehensible natural language. We also strengthen LLMs' structural understanding of graphs by incorporating multiple views of the central nodes based on their surrounding nodes at various distances. Our experimental evaluations on standard graph tasks demonstrate competitive performance against other state-of-the-art (SOTA) approaches. Additionally, our framework offers a degree of visual interpretability, efficiency, and robustness, marking a promising step toward token-level alignment between LLMs and GNNs. Our code is available at: https://github.com/Timothy914/Dr.E.
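
To make "token-level alignment" more concrete, here is a minimal sketch of generic residual vector quantization against a codebook whose entries stand in for LLM token embeddings. It is a sketch under stated assumptions, not the Dr.E implementation: the two-dimensional `toy_codebook`, the `toy_vocab` words, and the function names are all hypothetical, and the authors' dual-residual design may differ in its details.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code(vector, codebook):
    """Index of the codebook entry closest to `vector`."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def residual_quantize(vector, codebook, depth):
    """Quantize `vector` into `depth` code indices.

    At each step the nearest code is selected and subtracted, so the
    next step quantizes whatever the previous codes failed to explain.
    Returns the chosen indices and the final leftover residual.
    """
    residual = list(vector)
    indices = []
    for _ in range(depth):
        idx = nearest_code(residual, codebook)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices, residual


# Assumed toy data: four "token embeddings" and the words they represent.
toy_vocab = ["graph", "node", "edge", "paper"]
toy_codebook = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]

# A stand-in for a GNN node embedding.
indices, residual = residual_quantize([1.4, 0.6], toy_codebook, depth=2)
tokens = [toy_vocab[i] for i in indices]
print(indices, tokens, residual)
```

Because every selected code corresponds to a vocabulary entry, the quantized node can be read as a short sequence of words, which is the sense in which a graph embedding becomes "comprehensible natural language" for the LLM.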

Paper Structure (24 sections, 11 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: A demonstration of our proposed framework, Dr.E, which serves effectively as a seamless interpreter, translating graphs (depicted on the left) into comprehensible natural language (as displayed on the right).
  • Figure 2: The overall framework of Dr.E encompasses a modified RQ-VAE architecture, where the encoder is a GNN module that directly processes the raw features of nodes in the graph, and the decoder is an LLM decoding codes' embeddings back to labels. We also incorporate additional features, labels, and adjacency matrix reconstruction to facilitate the training process. The token embeddings of the LLM serve as a critical codebook, bridging the encoder and the decoder seamlessly.
  • Figure 3: We investigate the effect of the number of views on the model's performance. The x-axis represents the number of views used, while the y-axis shows the relative performance of the model under different numbers of views, normalized against the performance achieved with 3 views.
  • Figure 4: Perplexity of token selection on each dataset. The blue line represents the perplexity of the 1-hop view of the codes, while the green and yellow lines represent the 2-hop and 3-hop views, respectively. Note that we apply a Savitzky-Golay filter when plotting the line graph to improve readability.
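
The caption for Figure 4 does not define perplexity, so the sketch below assumes the definition commonly used for vector-quantized codebooks: the exponential of the entropy of the empirical token-usage distribution. The function name `codebook_perplexity` and the sample token IDs are hypothetical.

```python
import math
from collections import Counter


def codebook_perplexity(token_ids):
    """exp(entropy) of the empirical distribution over selected tokens.

    Ranges from 1.0 (one token always chosen) up to the number of
    distinct tokens (all chosen equally often).
    """
    if not token_ids:
        raise ValueError("token_ids must be non-empty")
    counts = Counter(token_ids)
    total = len(token_ids)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.exp(entropy)


# Assumed toy selections: evenly spread usage vs. a collapsed codebook.
uniform = codebook_perplexity([0, 1, 2, 3])
collapsed = codebook_perplexity([7, 7, 7, 7])
print(uniform, collapsed)
```

Under this definition, higher perplexity means that more of the codebook is actively used. A per-step curve of such values is usually noisy, which is presumably why the figure is smoothed; a Savitzky-Golay filter is available, for example, as SciPy's `scipy.signal.savgol_filter`.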