GraphCheck: Breaking Long-Term Text Barriers with Extracted Knowledge Graph-Powered Fact-Checking
Yingjian Chen, Haoran Liu, Yinhong Liu, Jinxiang Xie, Rui Yang, Han Yuan, Yanran Fu, Peng Yuan Zhou, Qingyu Chen, James Caverlee, Irene Li
TL;DR
GraphCheck tackles factual errors in long-form LLM outputs by augmenting the input with knowledge graphs extracted from both the claim and its grounding document. A trainable GNN encodes these graphs, and the resulting embeddings are projected into the LLM's embedding space, so a frozen LLM can verify the claim in a single, end-to-end inference step with graph-informed reasoning. The approach achieves 71.1% balanced accuracy across seven benchmarks, including medical-domain ones, outperforming several specialized fact-checkers and matching much larger LLMs at a fraction of the cost. It also improves explainability through attention visualizations over KG edges and introduces a synthetic KG-enhanced dataset for future graph-based fact-checking research. Overall, GraphCheck offers a scalable, efficient, and interpretable path to reliable long-text fact-checking, with practical implications for high-stakes domains.
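The headline number above is a balanced accuracy, which averages recall over the two classes so that a checker cannot score well simply by predicting the majority label. The following is a minimal sketch of the metric on toy data; the label convention (1 = supported, 0 = unsupported) and the example predictions are assumptions made for illustration, not values from the paper.

```python
def balanced_accuracy(labels, preds):
    """Mean of per-class recall for binary labels.

    Assumed convention (not from the paper): 1 = supported, 0 = unsupported.
    """
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / pos + tn / neg)


# Toy, imbalanced data: 8 supported claims, 2 unsupported claims.
labels = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
# A degenerate checker that labels every claim as supported.
preds = [1] * 10

plain_acc = sum(y == p for y, p in zip(labels, preds)) / len(labels)
bacc = balanced_accuracy(labels, preds)

print(f"accuracy={plain_acc:.2f} balanced_accuracy={bacc:.2f}")
# accuracy=0.80 balanced_accuracy=0.50
```

On this toy set the degenerate checker reaches 80% plain accuracy but only 50% balanced accuracy, which is why the balanced variant is the more informative score when supported and unsupported claims are unevenly represented.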
Abstract
Large language models (LLMs) are widely used, but they often generate subtle factual errors, especially in long-form text. Such errors can be fatal in specialized domains such as medicine. Existing methods for fact-checking against grounding documents face two main challenges: (1) they struggle to understand complex multihop relations in long documents and often overlook subtle factual errors; (2) most specialized methods rely on pairwise comparisons, which require multiple model calls and incur high computational cost. To address these challenges, we propose GraphCheck, a fact-checking framework that uses extracted knowledge graphs to enhance text representation. Graph Neural Networks process these graphs into a soft prompt, enabling LLMs to incorporate structured knowledge more effectively. Enhanced with graph-based reasoning, GraphCheck captures multihop reasoning chains that existing methods often overlook, enabling precise and efficient fact-checking in a single inference call. Experimental results on seven benchmarks spanning both general and medical domains demonstrate up to a 7.1% overall improvement over baseline models. Notably, GraphCheck outperforms existing specialized fact-checkers and performs comparably to state-of-the-art LLMs, such as DeepSeek-V3 and OpenAI-o1, with significantly fewer parameters.
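To make the soft-prompt idea concrete, here is a minimal NumPy sketch of the data flow described above: each extracted graph is encoded by a GNN layer, pooled into one vector, projected to the LLM's embedding width, and prepended to the text token embeddings. Everything specific in it is an assumption for illustration: the mean-aggregation layer, the mean pooling, the dimensions, the random weights, and the names gnn_layer and graph_soft_prompt are hypothetical and are not the authors' architecture or code.

```python
import numpy as np

rng = np.random.default_rng(0)


def gnn_layer(node_feats, edges, weight):
    """One round of mean-aggregation message passing (a generic GNN layer)."""
    n = node_feats.shape[0]
    adj = np.eye(n)  # self-loops keep each node's own features
    for src, dst in edges:
        adj[src, dst] = 1.0  # edges are treated as undirected here
        adj[dst, src] = 1.0
    adj /= adj.sum(axis=1, keepdims=True)  # row-normalise: mean over neighbours
    return np.maximum(adj @ node_feats @ weight, 0.0)  # ReLU


def graph_soft_prompt(node_feats, edges, gnn_weight, proj_weight):
    """Encode a graph, mean-pool its nodes, project to the LLM embedding width."""
    hidden = gnn_layer(node_feats, edges, gnn_weight)
    pooled = hidden.mean(axis=0)
    return pooled @ proj_weight  # shape: (llm_dim,)


# Hypothetical sizes; in a real system the weights would be trained.
node_dim, gnn_dim, llm_dim = 8, 16, 32
gnn_w = rng.normal(size=(node_dim, gnn_dim))
proj_w = rng.normal(size=(gnn_dim, llm_dim))

# Toy claim graph (3 entities, 2 triples) and document graph (4 entities, 3 triples).
claim_nodes, claim_edges = rng.normal(size=(3, node_dim)), [(0, 1), (1, 2)]
doc_nodes, doc_edges = rng.normal(size=(4, node_dim)), [(0, 1), (1, 2), (2, 3)]

claim_vec = graph_soft_prompt(claim_nodes, claim_edges, gnn_w, proj_w)
doc_vec = graph_soft_prompt(doc_nodes, doc_edges, gnn_w, proj_w)

# Stand-in for the frozen LLM's token embeddings of the text prompt (10 tokens).
text_embeds = rng.normal(size=(10, llm_dim))

# Two graph vectors act as soft-prompt "tokens" ahead of the text embeddings.
llm_input = np.vstack([claim_vec, doc_vec, text_embeds])
print(llm_input.shape)  # (12, 32)
```

Because the LLM stays frozen in this setup, only the GNN and projection weights would need gradients, and the claim is checked against the whole document in one forward pass rather than one pass per claim-sentence pair.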
