ClaimPKG: Enhancing Claim Verification via Pseudo-Subgraph Generation with Lightweight Specialized LLM

Hoang Pham, Thanh-Do Nguyen, Khac-Hoai Nam Bui

TL;DR

ClaimPKG is an end-to-end framework that combines the reasoning strengths of LLMs with the structured knowledge in knowledge graphs to verify claims. It consists of three modules (Pseudo-subgraph Generation, Subgraph Retrieval, and General Reasoning) and is grounded in a probabilistic formulation that decomposes verification through latent subgraphs and pseudo-subgraphs. A Trie-constrained decoding mechanism ensures that generated entities are consistent with the KG. On FactKG, ClaimPKG achieves state-of-the-art accuracy (e.g., about 84.6% on average with specific backbone combinations) and strong multi-hop performance. It also generalizes zero-shot to HoVer and FEVEROUS, showing robustness across structured and unstructured settings. Interpretability is supported by human error analyses and grounded justifications. Scalability follows from the decoupled design: KG updates require only adjusting the Entity-Trie. Together, these properties make ClaimPKG a practical framework for reliable and explainable misinformation verification.
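To illustrate the idea behind Trie-constrained decoding, here is a minimal sketch. It assumes word-level tokens and a hypothetical `EntityTrie` class; it is not the authors' implementation. KG entity names are stored in a prefix tree, and at each decoding step the generator may only choose tokens that extend a path in the tree. The `score` callable is a stand-in for an LLM's next-token scores. Adding a new entity is just another `insert`, which mirrors the claim that KG updates only touch the Entity-Trie.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

END = "<end>"  # pseudo-token: a complete entity name may stop here


@dataclass
class TrieNode:
    children: dict[str, TrieNode] = field(default_factory=dict)
    is_end: bool = False  # True if a full entity name ends at this node


class EntityTrie:
    """Hypothetical prefix tree over tokenized KG entity names."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, tokens: list[str]) -> None:
        # Adding an entity (e.g. after a KG update) only extends the trie.
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())
        node.is_end = True

    def _walk(self, prefix: list[str]) -> TrieNode | None:
        node = self.root
        for tok in prefix:
            node = node.children.get(tok)
            if node is None:
                return None
        return node

    def allowed_next(self, prefix: list[str]) -> set[str]:
        """Tokens that keep `prefix` on a path to some KG entity name."""
        node = self._walk(prefix)
        if node is None:
            return set()
        allowed = set(node.children)
        if node.is_end:
            allowed.add(END)
        return allowed


def constrained_greedy_decode(
    trie: EntityTrie,
    score: Callable[[list[str], str], float],
    max_len: int = 16,
) -> list[str]:
    """Greedy decoding restricted to trie-permitted tokens.

    `score(prefix, token)` stands in for the LLM's next-token scores (an assumption).
    """
    prefix: list[str] = []
    for _ in range(max_len):
        candidates = trie.allowed_next(prefix)
        if not candidates:
            break
        # Sort first so ties are broken deterministically.
        best = max(sorted(candidates), key=lambda t: score(prefix, t))
        if best == END:
            break
        prefix.append(best)
    return prefix


trie = EntityTrie()
for name in ["Barack Obama", "Barack Obama Sr.", "Michelle Obama"]:
    trie.insert(name.split())

# A toy scorer that would prefer the misspelled, non-KG token "Barak".
prefs = {"Barak": 5.0, "Barack": 1.0, "Obama": 2.0, END: 3.0, "Sr.": 0.5}
decoded = constrained_greedy_decode(trie, lambda prefix, tok: prefs.get(tok, 0.0))
print(" ".join(decoded))  # Barack Obama
```

The invalid token `Barak` is never offered as a candidate, so the decoder cannot emit an entity name that is absent from the KG. A real system would apply the same mask over the LLM's subword vocabulary, for example through the `prefix_allowed_tokens_fn` argument of Hugging Face Transformers' `generate`. The word-level tokens here are a simplification.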

Abstract

Integrating knowledge graphs (KGs) to enhance the reasoning capabilities of large language models (LLMs) is an emerging research challenge in claim verification. While KGs provide structured, semantically rich representations well-suited for reasoning, most existing verification methods rely on unstructured text corpora, limiting their ability to leverage KGs effectively. Additionally, despite their strong reasoning abilities, modern LLMs struggle with multi-step modular pipelines and with reasoning over KGs without adaptation. To address these challenges, we propose ClaimPKG, an end-to-end framework that seamlessly integrates LLM reasoning with structured knowledge from KGs. Specifically, ClaimPKG employs a lightweight, specialized LLM to represent the input claim as pseudo-subgraphs, which guide a dedicated subgraph retrieval module to identify relevant KG subgraphs. These retrieved subgraphs are then processed by a general-purpose LLM to produce the final verdict and justification. Extensive experiments on the FactKG dataset demonstrate that ClaimPKG achieves state-of-the-art performance, outperforming strong baselines by 9 to 12 percentage points in accuracy across multiple categories. Furthermore, ClaimPKG exhibits zero-shot generalizability to unstructured datasets such as HoVer and FEVEROUS, effectively combining structured knowledge from KGs with LLM reasoning across various LLM backbones.
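The pipeline described in the abstract (pseudo-subgraph, then retrieved KG subgraph, then verdict) can be made concrete with a toy sketch. It assumes that a pseudo-subgraph is a list of (head, relation, tail) triples in which entities the claim leaves unresolved are written as placeholders such as `unknown_0`. It also assumes that retrieval matches each pseudo-triple against KG triples that share one of its known entities and ranks them by relation-string similarity using Python's `difflib.SequenceMatcher`. The names `retrieve_subgraph` and `build_prompt`, the placeholder convention, and the similarity ranking are illustrative assumptions, not ClaimPKG's actual retrieval module. The returned prompt is what a general-purpose LLM would receive.

```python
from difflib import SequenceMatcher

Triple = tuple[str, str, str]


def is_placeholder(entity: str) -> bool:
    # Hypothetical convention: entities the claim leaves unresolved are "unknown_<i>".
    return entity.startswith("unknown_")


def relation_similarity(a: str, b: str) -> float:
    # Character-level similarity in [0, 1]; a stand-in for a learned scorer.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def retrieve_subgraph(pseudo: list[Triple], kg: list[Triple], top_k: int = 1) -> list[Triple]:
    """For each pseudo-triple, keep the top_k KG triples that share a known
    entity with it, ranked by relation similarity."""
    retrieved: list[Triple] = []
    for head, rel, tail in pseudo:
        anchors = {e for e in (head, tail) if not is_placeholder(e)}
        candidates = [t for t in kg if t[0] in anchors or t[2] in anchors]
        candidates.sort(key=lambda t: relation_similarity(rel, t[1]), reverse=True)
        for t in candidates[:top_k]:
            if t not in retrieved:  # avoid duplicate evidence
                retrieved.append(t)
    return retrieved


def build_prompt(claim: str, subgraph: list[Triple]) -> str:
    # Serialize the retrieved triples as evidence for a general-purpose LLM.
    evidence = "\n".join(f"({h}, {r}, {t})" for h, r, t in subgraph)
    return (
        f"Claim: {claim}\nEvidence triples:\n{evidence}\n"
        "Answer SUPPORTED or REFUTED and justify the verdict using the triples."
    )


kg = [
    ("Alan Bean", "birthPlace", "Wheeler, Texas"),
    ("Alan Bean", "occupation", "Test pilot"),
    ("Wheeler, Texas", "country", "United States"),
    ("Apollo 12", "crewMember", "Alan Bean"),
]
# What a specialized LLM might emit for the claim below (hand-written here).
pseudo = [
    ("Alan Bean", "birth place", "unknown_0"),
    ("unknown_0", "country", "United States"),
]
claim = "Alan Bean was born in a place in the United States."
subgraph = retrieve_subgraph(pseudo, kg)
print(build_prompt(claim, subgraph))
```

The placeholder `unknown_0` is never looked up directly. It is filled in implicitly by `Wheeler, Texas`, which appears in both retrieved triples. ClaimPKG's own retrieval module may score and expand candidates quite differently.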

Paper Structure

This paper contains 30 sections, 9 equations, 8 figures, 13 tables.

Figures (8)

  • Figure 1: Different claim verification paradigms: (a) Unstructured Text-based methods focusing on claim decomposition and sequential reasoning over text, (b) KG-based methods facing challenges in entity resolution and structured reasoning, and (c) ClaimPKG's unified framework with specialized modules for pseudo-subgraph generation, retrieval, and general reasoning.
  • Figure 2: Illustration of the ClaimPKG for claim verification. The framework consists of three key modules: (1) Pseudo-subgraph Generation, constructing representative subgraphs; (2) Subgraph Retrieval, selecting the most pertinent KG subgraphs; and (3) General Reasoning, integrating them for accurate and interpretable verification.
  • Figure 3: Varying the specialized LLM's training data.
  • Figure 4: Data provided by FactKG.
  • Figure 5: Pseudo-Subgraph label as the output of the data annotation process.