Improving Factuality in LLMs via Inference-Time Knowledge Graph Construction

Shanglin Wu, Lihui Liu, Jinho D. Choi, Kai Shu

TL;DR

The paper tackles the persistent factuality problem in LLMs by proposing inference-time knowledge graph construction that fuses internal LLM knowledge with external retrieval. It introduces a four-stage pipeline: graph initialization from the question, iterative graph expansion, external retrieval to refine and expand the graph into G*, and answer generation grounded strictly on G*. Empirical evaluation across CWQ, HotpotQA, and SimpleQA shows state-of-the-art performance under both internal and external knowledge settings, with notable recall improvements and robust gains across backbone models, particularly when external sources are integrated. The work demonstrates that structured, dynamic KG reasoning at inference can improve factual grounding, interpretability, and scalability, offering a practical path toward more reliable and explainable QA with LLMs.

Abstract

Large Language Models (LLMs) often struggle to produce factually consistent answers due to limitations in their parametric memory. Retrieval-Augmented Generation (RAG) paradigms mitigate this issue by incorporating external knowledge at inference time. However, such methods typically handle knowledge as unstructured text, which reduces retrieval accuracy, hinders compositional reasoning, and amplifies the influence of irrelevant information on the factual consistency of LLM outputs. To overcome these limitations, we propose a novel framework that dynamically constructs and expands knowledge graphs (KGs) during inference, integrating both internal knowledge extracted from LLMs and knowledge retrieved from external sources. Our method begins by extracting a seed KG from the question via prompting, followed by iterative expansion using the LLM's internal knowledge. The KG is then selectively refined through external retrieval, enhancing factual coverage and correcting inaccuracies. We evaluate our approach on three diverse factual QA benchmarks, demonstrating consistent gains in factual accuracy over baselines. Our findings reveal that inference-time KG construction is a promising direction for enhancing LLM factuality in a structured, interpretable, and scalable manner.
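The flow described in the abstract (seed KG, iterative expansion, external refinement, answering on the refined graph) can be illustrated with a small toy sketch. This is not the authors' implementation: the function names `expand_graph`, `refine_graph`, and `answer_on_graph`, the callbacks `propose` and `lookup` (stand-ins for LLM prompting and external retrieval), and all entities in the example data are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def expand_graph(seeds: Set[Triple],
                 propose: Callable[[str], List[Triple]],
                 max_hops: int) -> Set[Triple]:
    """Grow the seed graph hop by hop (breadth-first) from its entities.

    `propose` is a hypothetical stand-in for prompting the LLM for
    triples about one entity.
    """
    graph = set(seeds)
    frontier = {entity for h, _, t in seeds for entity in (h, t)}
    visited: Set[str] = set()
    for _ in range(max_hops):
        next_frontier: Set[str] = set()
        for entity in sorted(frontier - visited):
            visited.add(entity)
            for triple in propose(entity):
                if triple not in graph:
                    graph.add(triple)
                    next_frontier.add(triple[2])
        frontier = next_frontier
        if not frontier:
            break
    return graph


def refine_graph(graph: Set[Triple],
                 lookup: Callable[[Triple], Optional[str]]) -> Set[Triple]:
    """Replace a triple's tail when the (assumed) external lookup disagrees."""
    refined: Set[Triple] = set()
    for h, r, t in graph:
        corrected = lookup((h, r, t))
        refined.add((h, r, corrected if corrected is not None else t))
    return refined


def answer_on_graph(graph: Set[Triple], head: str, relation: str) -> Optional[str]:
    """Answer only from triples present in the refined graph."""
    for h, r, t in sorted(graph):
        if h == head and r == relation:
            return t
    return None


# Made-up toy data: the "internal" knowledge holds a wrong birthplace,
# which the "external" source corrects.
seed = {("Book", "written_by", "Alice")}
internal = {"Alice": [("Alice", "born_in", "Shelbyville")]}
external = {("Alice", "born_in"): "Springfield"}

graph = expand_graph(seed, lambda e: internal.get(e, []), max_hops=2)
graph_star = refine_graph(graph, lambda t: external.get((t[0], t[1])))
answer = answer_on_graph(graph_star, "Alice", "born_in")
```

In this toy run the expanded graph contains the incorrect tail, and only the refined graph yields the corrected answer, mirroring the correction role the abstract assigns to external retrieval.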

Paper Structure

This paper contains 25 sections, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Comparison of three methods for answering factual questions: (a) Chain-of-Thought prompting, (b) answering with a KG constructed from the LLM’s internal knowledge, (c) answering with a KG grounded in external knowledge. Grounding with external knowledge corrects incorrect entities and expands the KG with new edges, ultimately leading to the correct answer.
  • Figure 2: Overview of our pipeline. (A) Graph initialization, in which the input question is parsed by the LLM to extract initial triplets. (B) Graph expansion, which iteratively explores relations breadth-first from seed entities to build a larger KG. (C) External retrieval, in which search is performed (e.g., using BM25 on the content returned from Wikipedia and Google Search) to correct or extend selected triplets, which are then merged into the graph. (D) Answering on graph, in which the refined KG supports factual answer generation, yielding a grounded response.
  • Figure 3: Accuracy and graph size for five models across different hop counts. Solid lines represent accuracy, while dashed lines indicate graph size.
  • Figure 4: Accuracy and recall across different hop counts. Solid lines represent accuracy, while dashed lines indicate recall.
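
The Figure 2 caption mentions BM25 as one way to rank retrieved content in stage (C). As a rough illustration of that ranking step, the sketch below scores passages against a query with the standard Okapi BM25 formula. The function name `bm25_scores`, the whitespace tokenizer, the smoothed IDF variant, and the parameter values `k1=1.5` and `b=0.75` are assumptions made here; the source does not state the configuration the authors used.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, passages: List[str],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each passage against the query with Okapi BM25.

    Tokenization is plain lowercase whitespace splitting (an assumption).
    """
    docs = [p.lower().split() for p in passages]
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of passages containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    query_terms = set(query.lower().split())

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log((n_docs - doc_freq[term] + 0.5)
                           / (doc_freq[term] + 0.5) + 1.0)
            # Term frequency saturated by k1 and normalized by passage length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Made-up passages standing in for retrieved web or encyclopedia content.
passages = [
    "alice was born in springfield",
    "the weather is mild today",
    "springfield is a town",
]
scores = bm25_scores("alice born", passages)
best = max(range(len(passages)), key=scores.__getitem__)
```

The highest-scoring passage would then be the candidate evidence used to correct or extend a selected triplet before it is merged into the graph.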