
Verbalized Graph Representation Learning: A Fully Interpretable Graph Model Based on Large Language Models Throughout the Entire Process

Xingyu Ji, Jiale Liu, Lu Li, Maojun Wang, Zeyu Zhang

TL;DR

This work tackles the lack of full interpretability in graph representation learning by introducing Verbalized Graph Representation Learning (VGRL), which constrains model parameters to textual descriptions and employs prompt-based iterative optimization instead of fine-tuning. VGRL integrates graph structure into LLM prompts via ego-graphs, verbalizes parameters for human readability, and uses an LLM-driven predictor, optimizer, and summary module with chain-of-thought prompting to ensure end-to-end interpretability. Theoretical analysis shows that such verbalized descriptions can reduce predictive uncertainty under fidelity and non-redundancy conditions, and experiments on a Cora TAG subset demonstrate performance gains and ablation-based validation of each component. Overall, VGRL offers a scalable, interpretable alternative for TAGs, with potential to broaden explainable AI in graph-based tasks while lowering computational costs by avoiding LLM fine-tuning.
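The predictor, optimizer, and summary roles described above can be sketched as one training step over a mini-batch. This is a minimal illustration, not the authors' implementation. The prompt wording, the function names (`ego_graph_text`, `train_step`), the use of one-hop neighbors only, and the exact order in which the optimizer and summary modules are called are all assumptions. Each LLM is modeled as a hypothetical callable from prompt text to response text.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a frozen LLM: any function from prompt to text.
LLM = Callable[[str], str]


def ego_graph_text(node: int, texts: List[str], edges: List[Tuple[int, int]]) -> str:
    """Render a node and its one-hop neighbors as plain text for a prompt."""
    neighbors = sorted(
        {v for u, v in edges if u == node} | {u for u, v in edges if v == node}
    )
    lines = [f"Target node {node}: {texts[node]}"]
    lines += [f"Neighbor {n}: {texts[n]}" for n in neighbors]
    return "\n".join(lines)


def train_step(
    theta: str,                      # verbalized parameters (a text description)
    batch: List[int],
    labels: Dict[int, str],
    texts: List[str],
    edges: List[Tuple[int, int]],
    predictor: LLM,
    optimizer: LLM,
    summarizer: LLM,
) -> str:
    """Return updated verbalized parameters after one mini-batch."""
    suggestions = []
    for node in batch:
        context = ego_graph_text(node, texts, edges)
        # Predictor: classify the node using the current textual rules.
        prediction = predictor(
            f"Rules:\n{theta}\n\n{context}\n\nReason step by step, then give a label."
        )
        # Optimizer: compare prediction with the true label and propose changes.
        suggestions.append(optimizer(
            f"Rules:\n{theta}\n\n{context}\n\nPrediction: {prediction}\n"
            f"True label: {labels[node]}\n\nSuggest how the rules should change."
        ))
    # Summary: merge the per-node suggestions into one new rule description.
    return summarizer(
        f"Current rules:\n{theta}\n\nSuggestions:\n"
        + "\n".join(suggestions)
        + "\n\nWrite the updated rules."
    )
```

Because `theta` stays a string at every step, each intermediate state of the model can be read directly, which is the property the method relies on for interpretability. No gradients are computed and no LLM weights change.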

Abstract

Representation learning on text-attributed graphs (TAGs) has attracted significant interest due to its wide-ranging real-world applications, particularly through Graph Neural Networks (GNNs). Traditional GNN methods focus on encoding the structural information of graphs, often using shallow text embeddings for node or edge attributes. This limits the model's ability to understand the rich semantic information in the data and to reason about complex downstream tasks, and it leaves the model without interpretability. With the rise of large language models (LLMs), an increasing number of studies are combining them with GNNs for graph representation learning and downstream tasks. While these approaches effectively leverage the rich semantic information in TAG datasets, their main drawback is that they are only partially interpretable, which limits their application in critical fields. In this paper, we propose a verbalized graph representation learning (VGRL) method that is fully interpretable. In contrast to traditional graph machine learning models, which are usually optimized within a continuous parameter space, VGRL constrains this parameter space to text descriptions, which ensures complete interpretability throughout the entire process and makes it easier for users to understand and trust the decisions of the model. We conduct several studies to empirically evaluate the effectiveness of VGRL, and we believe this method can serve as a stepping stone in graph representation learning.

Paper Structure (38 sections, 2 theorems, 14 equations, 17 figures, 4 tables)


Key Result

Theorem 1

Given the following conditions: 1) Fidelity: $\theta$ can faithfully represent the information of $H_l$ such that $H(H_l | \theta) = \epsilon$, with $\epsilon > 0$; 2) Non-redundancy: $H_l$ contains information not present in $X$, that is, $H(y|X, H_l) = H(y|X) - \epsilon'$, with $\epsilon' > \dots$
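To make the non-redundancy condition concrete, the sketch below computes $H(y|X)$ and $H(y|X, H_l)$ on a toy joint distribution and reports their difference, which plays the role of $\epsilon'$. The distribution is invented for illustration, and the helper name `conditional_entropy` is hypothetical. The example only illustrates the stated condition; it does not reproduce the theorem's conclusion, which is cut off in this excerpt.

```python
import math
from collections import defaultdict


def conditional_entropy(joint, target, given):
    """H(target | given) in bits.

    joint  : dict mapping outcome tuples to probabilities
    target : index of the target variable within each outcome tuple
    given  : tuple of indices of the conditioning variables
    """
    p_given = defaultdict(float)
    p_both = defaultdict(float)
    for outcome, p in joint.items():
        g = tuple(outcome[i] for i in given)
        p_given[g] += p
        p_both[(g, outcome[target])] += p
    return -sum(
        p * math.log2(p / p_given[g]) for (g, _), p in p_both.items() if p > 0
    )


# Toy assumption: outcomes are (x, h, y) with x and h independent fair bits
# and y = h, so h carries information about y that x does not.
joint = {
    (0, 0, 0): 0.25,
    (0, 1, 1): 0.25,
    (1, 0, 0): 0.25,
    (1, 1, 1): 0.25,
}

h_y_given_x = conditional_entropy(joint, target=2, given=(0,))
h_y_given_x_h = conditional_entropy(joint, target=2, given=(0, 1))
eps_prime = h_y_given_x - h_y_given_x_h  # reduction in uncertainty about y
```

In this toy case, knowing `x` alone leaves one full bit of uncertainty about `y`, while adding `h` removes it entirely, so the gap is one bit. Real data would give a smaller gap, but conditioning on an extra variable never increases conditional entropy.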

Figures (17)

  • Figure 1: Comparison of Graph Representation Learning Methods. (a) Traditional Graph Neural Networks (GNNs) rely on graph structures and initial features for embedding generation and prediction. (b) Incorporating a Language Model (LM) enhances GNNs, where a Large Language Model (LLM) provides explanations that refine the embedding process for improved predictions. (c) Our proposed Verbalized Graph Representation Learning (VGRL) framework introduces an iterative optimization process involving multiple frozen LLMs (Enhancer, Predictor, Optimizer, and Summary), emphasizing interpretability and parameter tuning through verbalized model adjustments.
  • Figure 2: An overview of iterative optimization and text prompt templates for the predictor, optimizer, and summary LLM in the node classification example
  • Figure 3: Summary+VGRL Acc-Step
  • Figure 4: Case study for one-shot w/o prior Summary + VGRL: (1) The left figure shows the explanation and predicted labels output by the predictor LLM. (2) The right figure shows how the optimizer LLM optimizes the predictor LLM's output from the left figure. (3) The top-right figure shows an example of the one-hop neighbors of a predicted sample.
  • ...and 12 more figures

Theorems & Definitions (3)

  • Theorem
  • Theorem
  • proof