Improving Factuality in LLMs via Inference-Time Knowledge Graph Construction
Shanglin Wu, Lihui Liu, Jinho D. Choi, Kai Shu
TL;DR
The paper tackles the persistent factuality problem in LLMs by proposing inference-time knowledge graph (KG) construction that fuses internal LLM knowledge with external retrieval. It introduces a four-stage pipeline: graph initialization from the question, iterative graph expansion using the LLM's internal knowledge, external retrieval to refine and expand the graph into G*, and answer generation grounded strictly on G*. Empirical evaluation across CWQ, HotpotQA, and SimpleQA shows state-of-the-art performance under both internal and external knowledge settings, with notable recall improvements and robust gains across backbone models, particularly when external sources are integrated. The work demonstrates that structured, dynamic KG reasoning at inference time can improve factual grounding, interpretability, and scalability, offering a practical path toward more reliable and explainable QA with LLMs.
Abstract
Large Language Models (LLMs) often struggle to produce factually consistent answers due to limitations in their parametric memory. Retrieval-Augmented Generation (RAG) paradigms mitigate this issue by incorporating external knowledge at inference time. However, such methods typically handle knowledge as unstructured text, which reduces retrieval accuracy, hinders compositional reasoning, and amplifies the influence of irrelevant information on the factual consistency of LLM outputs. To overcome these limitations, we propose a novel framework that dynamically constructs and expands knowledge graphs (KGs) during inference, integrating both internal knowledge extracted from LLMs and knowledge retrieved from external sources. Our method begins by extracting a seed KG from the question via prompting, followed by iterative expansion using the LLM's internal knowledge. The KG is then selectively refined through external retrieval, enhancing factual coverage and correcting inaccuracies. We evaluate our approach on three diverse factual QA benchmarks, demonstrating consistent gains in factual accuracy over baselines. Our findings reveal that inference-time KG construction is a promising direction for enhancing LLM factuality in a structured, interpretable, and scalable manner.
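
The pipeline described above (seed KG, iterative internal expansion, external refinement, grounded answering) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`build_graph`, `generate_answer`), the representation of the KG as a set of (head, relation, tail) triples, the fixed round budget, and the rule that retrieved triples override internal triples sharing the same head and relation are all assumptions made for illustration. The LLM and the retriever are replaced by hypothetical toy stand-ins.

```python
from typing import Callable, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail); an assumed representation


def build_graph(
    question: str,
    seed_fn: Callable[[str], Set[Triple]],
    expand_fn: Callable[[str, Set[Triple]], Set[Triple]],
    retrieve_fn: Callable[[str, Set[Triple]], Set[Triple]],
    max_rounds: int = 3,
) -> Set[Triple]:
    """Return a refined graph (G* in the summary) for one question."""
    # Stage 1: seed graph extracted from the question.
    graph = set(seed_fn(question))
    # Stage 2: iterative expansion; stop when nothing new is proposed
    # or the (assumed) round budget is exhausted.
    for _ in range(max_rounds):
        new_triples = expand_fn(question, graph) - graph
        if not new_triples:
            break
        graph |= new_triples
    # Stage 3: external refinement. Assumed rule: a retrieved triple
    # replaces any internal triple with the same (head, relation).
    evidence = retrieve_fn(question, graph)
    overridden = {(h, r) for h, r, _ in evidence}
    kept = {t for t in graph if (t[0], t[1]) not in overridden}
    return kept | evidence


def generate_answer(question, graph, answer_fn):
    # Stage 4: the answer function sees only the question and the graph.
    return answer_fn(question, graph)


# --- Hypothetical toy stand-ins for the LLM and the retriever ---
def toy_seed(question):
    return {("question", "mentions", "Inception")}


def toy_expand(question, graph):
    entities = {h for h, _, _ in graph} | {t for _, _, t in graph}
    proposed = set()
    if "Inception" in entities:
        proposed.add(("Inception", "directed_by", "Christopher Nolan"))
    if "Christopher Nolan" in entities:
        # Deliberately wrong "internal" fact, to be corrected by retrieval.
        proposed.add(("Christopher Nolan", "born_in", "Los Angeles"))
    return proposed


def toy_retrieve(question, graph):
    return {("Christopher Nolan", "born_in", "London")}


def toy_answer(question, graph):
    return next(t for _, r, t in graph if r == "born_in")


question = "Where was the director of Inception born?"
g_star = build_graph(question, toy_seed, toy_expand, toy_retrieve)
result = generate_answer(question, g_star, toy_answer)
print(result)  # London
```

In this toy run the expansion stage needs two rounds, because the birthplace triple can only be proposed once the director is already in the graph, and the refinement stage replaces the incorrect internal triple with the retrieved one before the answer is read off the graph.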
