Hybrid Fact-Checking that Integrates Knowledge Graphs, Large Language Models, and Search-Based Retrieval Agents Improves Interpretable Claim Verification

Shaghayegh Kolli, Richard Rosenbaum, Timo Cavelius, Lasse Strothe, Andrii Lata, Jana Diesner

TL;DR

Open-domain fact-checking often suffers from LLM hallucinations and limited interpretability. The authors propose a modular, three-stage hybrid pipeline that performs KG-first retrieval from DBpedia via Wikidata IDs, LM-based classification with KG evidence, and a Web Search Agent fallback (Web-RAG) for Not Enough Information (NEI) cases. The approach achieves an F1 score of up to 0.93 on FEVER without task-specific fine-tuning and generalizes to FEVER 2.0 and FactKG. Claims labeled NEI often turn out to have retrievable evidence, as validated by human annotators and LLM reviewers. The work provides an open-source, modular framework that balances precision, coverage, and interpretability for real-time fact verification.

Abstract

Large language models (LLMs) excel in generating fluent utterances but can lack reliable grounding in verified information. At the same time, knowledge-graph-based fact-checkers deliver precise and interpretable evidence, yet suffer from limited coverage or latency. By integrating LLMs with knowledge graphs and real-time search agents, we introduce a hybrid fact-checking approach that leverages the individual strengths of each component. Our system comprises three autonomous steps: 1) Knowledge Graph (KG) retrieval for rapid one-hop lookups in DBpedia, 2) LM-based classification guided by a task-specific labeling prompt, producing outputs with internal rule-based logic, and 3) a Web Search Agent invoked only when KG coverage is insufficient. Our pipeline achieves an F1 score of 0.93 on the Supported/Refuted split of the FEVER benchmark without task-specific fine-tuning. To address Not Enough Information (NEI) cases, we conduct a targeted reannotation study showing that our approach frequently uncovers valid evidence for claims originally labeled as NEI, as confirmed by both expert annotators and LLM reviewers. With this paper, we present a modular, open-source fact-checking pipeline with fallback strategies and generalization across datasets.
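
The control flow of the three steps can be illustrated with a minimal sketch. This is not the authors' implementation: the names `verify_claim`, `Verdict`, and the `toy_*` stubs are hypothetical, and the in-memory dictionaries stand in for the DBpedia lookup and the web search, and a string-matching function stands in for the LM classifier. The only behavior taken from the text is the ordering: KG evidence is tried first, and the web fallback runs only when the KG stage ends in NEI.

```python
from dataclasses import dataclass
from typing import Callable, List

# Label names are illustrative; the paper uses Supported / Refuted / NEI.
SUPPORTED, REFUTED, NEI = "SUPPORTED", "REFUTED", "NEI"


@dataclass
class Verdict:
    label: str
    evidence: List[str]
    stage: str  # "kg" if decided on KG evidence, "web" if the fallback ran


def verify_claim(
    claim: str,
    kg_retrieve: Callable[[str], List[str]],
    classify: Callable[[str, List[str]], str],
    web_search: Callable[[str], List[str]],
) -> Verdict:
    # Step 1: KG retrieval (a one-hop DBpedia lookup in the paper).
    kg_evidence = kg_retrieve(claim)
    # Step 2: classify the claim against the KG evidence.
    label = classify(claim, kg_evidence) if kg_evidence else NEI
    if label != NEI:
        return Verdict(label, kg_evidence, "kg")
    # Step 3: web fallback, reached only when the KG stage yields NEI.
    web_evidence = web_search(claim)
    label = classify(claim, web_evidence) if web_evidence else NEI
    return Verdict(label, web_evidence, "web")


# --- Hypothetical stand-ins so the sketch runs without network access ---
TOY_KG = {"Berlin": ["Berlin capitalOf Germany"]}
TOY_WEB = {"Mount Example": ["Mount Example is located in Exampleland."]}


def _lookup(store: dict, claim: str) -> List[str]:
    # Return every stored item whose key entity is mentioned in the claim.
    return [item for entity, items in store.items() if entity in claim for item in items]


def toy_kg_retrieve(claim: str) -> List[str]:
    return _lookup(TOY_KG, claim)


def toy_web_search(claim: str) -> List[str]:
    return _lookup(TOY_WEB, claim)


def toy_classify(claim: str, evidence: List[str]) -> str:
    # Placeholder for the LM: SUPPORTED if the final token of any evidence
    # item appears in the claim, otherwise NEI. It never returns REFUTED.
    for item in evidence:
        if item.split()[-1].rstrip(".") in claim:
            return SUPPORTED
    return NEI


if __name__ == "__main__":
    for c in ["Berlin is the capital of Germany.",
              "Mount Example is in Exampleland.",
              "Atlantis is in Nowhere."]:
        print(c, "->", verify_claim(c, toy_kg_retrieve, toy_classify, toy_web_search))
```

Passing the three stages in as callables mirrors the modularity the abstract emphasizes: each component can be swapped without touching the fallback logic.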

Paper Structure

This paper contains 12 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Hybrid fact‑verification pipeline: a KG‑first pass links entities to Wikidata Q‑IDs, retrieves and ranks one‑hop DBpedia triples for classification; NEI outputs trigger a Web‑RAG fallback that rewrites the claim, retrieves web snippets, and re‑evaluates with the same model. Ambiguous NEI cases are validated by human annotators.
  • Figure 2: Agreement Scores Comparison. LLM–Human Cohen's $\kappa$ and Human Fleiss' $\kappa$.
  • Figure 3: Sufficiency rate differs slightly between annotators.
  • Figure 4: Confusion matrix comparing the LLM's sufficiency judgments with the human majority vote.
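
For readers unfamiliar with the agreement statistics named in Figure 2, the following sketch computes Cohen's κ (two raters) and Fleiss' κ (a fixed number of raters per item) from their standard definitions. The function names and the toy ratings are illustrative assumptions; they are not the paper's annotation data or code.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labeling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    ca, cb = Counter(a), Counter(b)
    p_e = sum((ca[k] / n) * (cb[k] / n) for k in set(ca) | set(cb))
    if p_e == 1.0:
        return 1.0  # convention: both raters always use one identical label
    return (p_o - p_e) / (1.0 - p_e)


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa; counts[i][j] = number of raters giving item i category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    # Per-item agreement, averaged over items.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Chance agreement from the overall category proportions.
    n_cats = len(counts[0])
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        return 1.0  # convention: all ratings fall into a single category
    return (p_bar - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Toy sufficiency judgments (1 = sufficient, 0 = insufficient).
    llm = [1, 1, 0, 0]
    human = [1, 0, 0, 0]
    print("Cohen's kappa:", cohen_kappa(llm, human))
    # Three raters, two categories, full agreement on both items.
    print("Fleiss' kappa:", fleiss_kappa([[3, 0], [0, 3]]))
```

Both statistics correct raw agreement for the agreement expected by chance, which is why they are more informative than a plain percentage when one label dominates.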