Large Language Models Can Better Understand Knowledge Graphs Than We Thought

Xinbang Dai, Yuncheng Hua, Tongtong Wu, Yang Sheng, Qiu Ji, Guilin Qi

TL;DR

The paper systematically analyzes how knowledge-graph (KG) prompts in different input formats affect large language models’ understanding and use of external knowledge. By comparing triple-to-text and text-to-triple pipelines, examining attention distributions, and testing various organization strategies, it shows that unordered linearized triples often yield better fact-intensive QA performance than fluent natural language (NL) text, and that the models attend more to answers presented in the linearized triple format. It further reveals that LLMs differ in their preferred prompt strategies and that larger models are more sensitive to noisy or incomplete subgraphs. These findings offer practical guidance for designing KG-aware prompts that improve factual accuracy without retraining large models. Overall, the work clarifies the relative value of KG prompt formats and proposes concrete strategies to optimize KG-augmented LLM reasoning in real-world settings.

Abstract

When we integrate factual knowledge from knowledge graphs (KGs) into large language models (LLMs) to enhance their performance, the cost of injection through training increases with the scale of the models. Consequently, there is significant interest in developing prompt strategies that effectively incorporate KG information into LLMs. However, the community has not yet comprehensively understood how LLMs process and interpret KG information in different input formats and organizations within prompts, and researchers often rely on trial and error. To address this gap, we design extensive experiments to empirically study LLMs' comprehension of different KG prompts. At the literal level, we reveal LLMs' preferences for various input formats (from linearized triples to fluent natural language text). At the attention distribution level, we discuss the underlying mechanisms driving these preferences. We then investigate how the organization of structured knowledge impacts LLMs and evaluate LLMs' robustness in processing and utilizing KG information in practical scenarios. Our experiments show that (1) linearized triples are more effective than fluent NL text in helping LLMs understand KG information and answer fact-intensive questions; (2) different LLMs exhibit varying preferences for different organizational formats of triples; (3) LLMs with larger scales are more susceptible to noisy, incomplete subgraphs.
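
To make the two ends of the input-format spectrum concrete, here is a minimal sketch that renders the same facts once as linearized triples and once as simple sentences, then wraps either form in a QA prompt. The function names, the `(head, relation, tail)` tuple layout, the sentence template, and the prompt wording are all assumptions for illustration; the paper's actual prompt templates are not given in this section.

```python
from typing import List, Tuple

# Hypothetical representation: one KG fact as (head, relation, tail).
Triple = Tuple[str, str, str]


def linearize_triples(triples: List[Triple]) -> str:
    """Render each triple on its own line as "(head, relation, tail)"."""
    return "\n".join(f"({h}, {r}, {t})" for h, r, t in triples)


def verbalize_triples(triples: List[Triple]) -> str:
    """Naive template-based stand-in for fluent NL text.

    A real triple-to-text step would likely use a generation model;
    this template only illustrates the difference in format.
    """
    return " ".join(f"{h} {r.replace('_', ' ')} {t}." for h, r, t in triples)


def build_prompt(knowledge: str, question: str) -> str:
    """Assumed prompt layout: knowledge first, then the question."""
    return f"Knowledge:\n{knowledge}\n\nQuestion: {question}\nAnswer:"


triples = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Warsaw", "capital_of", "Poland"),
]
question = "In which country was Marie Curie born?"

triple_prompt = build_prompt(linearize_triples(triples), question)
text_prompt = build_prompt(verbalize_triples(triples), question)
```

Sending `triple_prompt` and `text_prompt` to the same model and comparing answer accuracy is the kind of comparison the abstract describes, under the stated assumptions.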

Paper Structure (35 sections, 4 equations, 6 figures, 5 tables)

This paper contains 35 sections, 4 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: KG is processed into different input formats to provide LLM with knowledge.
  • Figure 2: There are six categories of our expansion method. (1) expanded nodes = 0, depth = 0: only providing core reasoning paths; (2) expanded nodes = 0.5, depth = 1: expanding each node on each core path by one neighbouring node, with a 50% probability of deleting this expansion node; (3) expanded nodes = 1, depth = 1: expanding each node on each core path by one neighbouring node; (4) expanded nodes = 2, depth = 1: expanding each node on each core path by two neighbouring nodes; (5) expanded nodes = 1, depth = 2: starting from nodes on the core path, expanding to 2-hop neighbouring nodes, expanding one node at a time; (6) expanded nodes = 2, depth = 2: starting from nodes on the core path, expanding to 2-hop neighbouring nodes, expanding two nodes at a time (shown in this figure).
  • Figure 3: Performance of LLM in Triple-to-Text Pipeline. The x-axis represents various input formats, while the y-axis indicates different subgraph sizes. The parameter $e$ denotes expanded nodes, and $d$ represents depth.
  • Figure 4: We use a human-annotated mapping between a KG and a document to generate multi-hop questions, then evaluate how well the LLM answers these fact-related questions when given the document versus the subgraph.
  • Figure 5: We examine the average attention proportion between the predicted labels (indicated by a colon ":") and the answer words (e.g., "Median"). Whether in the Single mode (providing knowledge in one format) or the Double mode (providing knowledge in both formats simultaneously), the LLM consistently pays more attention to the answers in the linearized triple format.
  • ...and 1 more figures
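
The subgraph expansion settings listed in the Figure 2 caption can be sketched roughly as follows. This is only one possible reading of the caption: the adjacency-list graph layout, the function name `expand_subgraph`, the use of directed outgoing edges, and the random sampling of neighbours are assumptions, not the authors' implementation.

```python
import random
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
# Hypothetical KG layout: node -> list of (relation, neighbour) outgoing edges.
Graph = Dict[str, List[Tuple[str, str]]]


def expand_subgraph(
    graph: Graph,
    core_path: List[Triple],
    expanded_nodes: float,
    depth: int,
    seed: int = 0,
) -> List[Triple]:
    """Expand a core reasoning path with sampled neighbouring triples.

    expanded_nodes: 0, 0.5, 1 or 2, as in the caption. The value 0.5 means
    "expand by one neighbour, then drop it with 50% probability".
    depth: number of expansion hops (0, 1 or 2).
    """
    rng = random.Random(seed)
    subgraph: List[Triple] = list(core_path)
    seen: Set[Triple] = set(core_path)
    # Start from every node that appears on the core path.
    frontier = sorted({n for h, _, t in core_path for n in (h, t)})
    per_node = 1 if expanded_nodes == 0.5 else int(expanded_nodes)

    for _ in range(depth):
        next_frontier: List[str] = []
        for node in frontier:
            candidates = [
                (node, r, nb)
                for r, nb in graph.get(node, [])
                if (node, r, nb) not in seen
            ]
            picked = rng.sample(candidates, min(per_node, len(candidates)))
            for triple in picked:
                # The 0.5 setting deletes the expansion node half of the time.
                if expanded_nodes == 0.5 and rng.random() < 0.5:
                    continue
                seen.add(triple)
                subgraph.append(triple)
                next_frontier.append(triple[2])
        # The next hop expands only from the newly added nodes.
        frontier = next_frontier
    return subgraph


toy_graph: Graph = {
    "A": [("r1", "B"), ("r2", "C")],
    "B": [("r3", "D")],
    "C": [("r4", "E")],
    "D": [("r5", "F")],
}
toy_core: List[Triple] = [("A", "r1", "B")]
```

Calling `expand_subgraph(toy_graph, toy_core, expanded_nodes=0, depth=0)` returns only the core path, while larger values of `expanded_nodes` and `depth` add progressively more surrounding triples, which is how the caption's six categories vary the subgraph size.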