Query-Aware Learnable Graph Pooling Tokens as Prompt for Large Language Models

Wooyoung Kim, Byungyoon Park, Wooju Kim

TL;DR

This paper addresses the challenge of encoding text-attributed graphs for large language models by introducing Learnable Graph Pooling Tokens (LGPT), a set of trainable tokens that balance fine-grained node information with global graph context. It further proposes Early Query Fusion, which integrates query context before graph embedding, yielding more query-tailored representations. Together, LGPT and Early Query Fusion improve Graph QA performance, achieving an average gain of $4.13\%$ on GraphQA without training the LLM and demonstrating robustness under LoRA-based LLM training. The approach offers a scalable alternative to node-level and single-vector graph representations, maintaining low complexity while reducing information loss in graph-to-text prompting. These findings advance practical graph reasoning with LLMs in diverse domains, including scene graphs and knowledge graphs.

Abstract

Graph-structured data plays a vital role in numerous domains, such as social networks, citation networks, commonsense reasoning graphs and knowledge graphs. While graph neural networks have been employed for graph processing, recent work has explored integrating large language models for graph-based tasks. In this paper, we propose a novel approach named Learnable Graph Pooling Token (LGPT), which addresses both the scalability issues of node-level projection and the information loss of graph-level projection. LGPT enables flexible and efficient graph representation by introducing learnable parameters that act as tokens in large language models, balancing fine-grained and global graph information. Additionally, we investigate an Early Query Fusion technique, which fuses query context before constructing the graph representation, leading to more effective graph embeddings. Our method achieves a 4.13% performance improvement on the GraphQA benchmark without training the large language model, demonstrating significant gains in handling complex text-attributed graph data.
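To make the trade-off in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It contrasts graph-level projection (one mean-pooled vector) with a small, fixed number of pooling tokens that each produce one vector, with and without fusing the query first. Several parts are assumptions made only for illustration: plain scaled dot-product attention stands in for the pooling mechanism, a single `tanh(node + query)` update stands in for query-conditioned graph encoding, and the names `attention_pool` and `early_query_fusion` are hypothetical. The pooling tokens would be trained parameters in practice; here they are random.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_pool(nodes):
    """Graph-level projection: collapse all node vectors into one vector."""
    n = len(nodes)
    return [sum(col) / n for col in zip(*nodes)]


def attention_pool(pool_tokens, nodes):
    """Each pooling token attends over all nodes and yields one vector.

    Assumption: scaled dot-product attention is a stand-in for whatever
    pooling mechanism the paper actually uses.
    """
    dim = len(nodes[0])
    scale = math.sqrt(dim)
    pooled = []
    for token in pool_tokens:
        weights = softmax([dot(token, node) / scale for node in nodes])
        pooled.append(
            [sum(w * node[d] for w, node in zip(weights, nodes)) for d in range(dim)]
        )
    return pooled


def early_query_fusion(nodes, query):
    """Assumption: mix the query into every node before pooling, so the
    pooled vectors depend on the question being asked."""
    return [[math.tanh(x + q) for x, q in zip(node, query)] for node in nodes]


random.seed(0)
dim, num_nodes, num_tokens = 4, 6, 2
nodes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_nodes)]
query = [random.gauss(0, 1) for _ in range(dim)]
# Learnable parameters in a real model; random placeholders here.
pool_tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_tokens)]

graph_vector = mean_pool(nodes)  # one vector for the whole graph
late = attention_pool(pool_tokens, nodes)  # k vectors, independent of the query
early = attention_pool(pool_tokens, early_query_fusion(nodes, query))  # k vectors, query-aware
```

In this sketch the number of vectors handed to the LLM is `num_tokens` regardless of graph size, whereas node-level projection would hand over `num_nodes` vectors and mean pooling only one.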

Paper Structure

This paper contains 20 sections, 10 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Overview of Proposed Method. Our approach is similar to Graph Token [graphtoken] and G-Retriever [g_retriever]. Graph Token generates node embeddings from the given graph ${\mathcal{S}}$ using a GNN encoder and applies mean pooling to deliver the graph information to the LLM. G-Retriever follows the same process but additionally transforms the given graph ${\mathcal{S}}$ into a textual graph and feeds it into the LLM along with the additional information. Our approach builds on G-Retriever by incorporating LGPT and an Early Query Fusion Module (red box).
  • Figure 2: Inference Only Method Details. Zero-CoT [zero-cot] adds the prompt "Let's think step by step", applying the core idea of Chain of Thought [cot] so that LLMs generate reasoning processes automatically. CoT-BAG [cot-bag] adapts this for graph tasks by changing the prompt to "Let's construct a graph with the nodes and edges first". KAPING [kaping], by contrast, supplies the information of the given graph in the prompt as linearized triples.
  • Figure 3: The red bars represent the case where both the LLM and the prompt module were trained using LoRA, while the blue bars represent the case where only the prompt module was trained, and the gray bars represent inference only. Training the LLM using LoRA alongside the prompt module resulted in a significant performance improvement. Additionally, even when training the LLM, our approach, which combines LGPT and the Early Query Fusion Module, demonstrated superior QA performance compared to G-Retriever.
  • Figure 4: Performance Comparison by Number of LGPTs. The figure compares the Early Fusion and Late Fusion approaches across varying numbers of Learnable Graph Pooling Tokens (LGPT). Using 8 LGPTs yielded the highest performance for both methods. Performance did not improve further when the number of LGPTs was increased to 32, suggesting that beyond a certain point additional LGPTs do not contribute further gains.
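The inference-only prompting baselines described under Figure 2 can be sketched as simple prompt builders. This is an illustrative sketch under stated assumptions, not code from the paper: the two instruction strings are the ones quoted in the caption, but the triple format `(head, relation, tail)`, the prompt layout, the function names, and the example graph are all hypothetical.

```python
# Instruction strings as quoted in the Figure 2 caption.
ZERO_COT = "Let's think step by step"
COT_BAG = "Let's construct a graph with the nodes and edges first"


def linearize_triples(triples):
    """Render (head, relation, tail) triples one per line.

    Assumption: the exact textual format used by KAPING is not given in
    the source; this parenthesized form is only a placeholder.
    """
    return "\n".join(f"({h}, {r}, {t})" for h, r, t in triples)


def build_prompt(question, triples, method):
    """Build a text prompt for one of the inference-only baselines.

    Assumption: every method sees the linearized graph, and the
    reasoning instruction (if any) is appended after the question.
    """
    graph_text = linearize_triples(triples)
    base = f"{graph_text}\nQuestion: {question}"
    if method == "kaping":
        return f"{base}\nAnswer:"
    if method == "zero-cot":
        return f"{base}\n{ZERO_COT}"
    if method == "cot-bag":
        return f"{base}\n{COT_BAG}"
    raise ValueError(f"unknown method: {method}")


# Hypothetical scene-graph example.
example_triples = [("cup", "on", "table"), ("table", "next to", "chair")]
example_prompt = build_prompt("What is the cup on?", example_triples, "cot-bag")
```

Unlike the trained prompt modules compared in Figure 3, these baselines change only the text given to the LLM, so prompt length grows with the number of triples.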