TrendFact: A Benchmark for Explainable Hotspot Perception in Fact-Checking with Natural Language Explanation

Xiaocheng Zhang, Xi Wang, Yifei Lu, Jianing Wang, Zhuangzhuang Ye, Mengjiao Bao, Peng Yan, Xiaohong Su

TL;DR

TrendFact addresses the need for a bilingual (Chinese) fact-checking benchmark that evaluates not only verification and evidence retrieval but also explanation generation and hotspot perception. It introduces ECS and HCPI as explicit metrics to gauge explanation consistency and handling of high-influence claims, and proposes FactISR, a reasoning framework that couples dynamic evidence augmentation with influence-aware iterative self-reflection. Empirical results show current systems struggle on TrendFact’s challenging samples, while FactISR yields improvements in verification, explanations, and hotspot perception, even under resource constraints. The work offers a practical, interpretable path toward more transparent and robust automated fact-checking for high-stakes events with social influence.

Abstract

Fact-checking benchmarks provide standardized testing criteria for automated fact-checking systems, driving technological advancement. With the surge of misinformation on social media and the emergence of various fact-checking methods, public concern about the transparency of automated systems and the accuracy of fact-checking for high-influence events has grown. However, existing benchmarks fail to meet these urgent needs and are predominantly English-centric, hindering the progress of comprehensive fact-checking. To address these issues, we introduce TrendFact, the first benchmark capable of evaluating hotspot perception ability (HPA) and all fact-checking tasks. TrendFact consists of 7,643 curated samples sourced from trending platforms and professional fact-checking datasets, as well as an evidence library containing 366,634 entries with publication dates. Additionally, to complement existing benchmarks in evaluating system explanation consistency and HPA, we propose two new metrics: ECS and HCPI. Experimental results show that current fact-checking systems face significant limitations when evaluated on TrendFact, which facilitates the development of more robust fact-checking methods. Furthermore, to enhance the capabilities of existing advanced fact-checking systems, namely reasoning large language models (RLMs), we propose FactISR, a reasoning framework that integrates dynamic evidence augmentation with influence score-based iterative self-reflection. FactISR effectively improves the performance of RLMs, offering new insights into explainable and complex fact-checking.

Paper Structure

This paper contains 45 sections, 7 equations, 13 figures, 11 tables.

Figures (13)

  • Figure 1: A fact-checking example from TrendFact that involves numerical reasoning.
  • Figure 2: The overall construction process of TrendFact includes claim collection, filtering, augmentation, evidence library construction, and a multi-stage sample review process.
  • Figure 3: Overview of FactISR. The bottom-right section shows the reasoning process of FactISR, where "flame" represents the reflection probability calculated based on influence score, which continuously decays with the number of iterative reflections. The reasoning process terminates when the maximum number of reflections is exceeded or no reflection step is sampled.
  • Figure 4: Performance comparison between RAG and FactISR under resource constraints.
  • Figure 5: Overview of the data distribution, including labels, gold evidence count, and domains.
  • ...and 8 more figures
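
The stopping rule described in the Figure 3 caption can be sketched as a short loop. This is an illustrative sketch only, not the authors' implementation: the caption states that the reflection probability is derived from the influence score, decays with each reflection, and that reasoning ends when the maximum number of reflections is exceeded or no reflection step is sampled. The geometric decay schedule, the `decay` and `max_reflections` parameters, and both function names are assumptions introduced here for illustration.

```python
import random


def reflection_probability(influence_score, step, decay=0.5):
    """Hypothetical schedule: start from the influence score (clamped to
    [0, 1]) and shrink it geometrically with each completed reflection.
    The geometric form is an assumption, not taken from the paper."""
    base = min(max(influence_score, 0.0), 1.0)
    return base * (decay ** step)


def count_reflections(influence_score, max_reflections=3, decay=0.5, rng=None):
    """Return how many reflection steps are sampled before the loop stops.

    The loop ends when either the reflection budget is used up or a draw
    fails to sample a reflection step, mirroring the two termination
    conditions named in the caption."""
    rng = rng or random.Random()
    reflections = 0
    while reflections < max_reflections:
        p = reflection_probability(influence_score, reflections, decay)
        if rng.random() >= p:
            break  # no reflection step sampled: stop reasoning
        # A real system would revise its verdict and explanation here.
        reflections += 1
    return reflections


if __name__ == "__main__":
    rng = random.Random(0)
    for score in (0.1, 0.5, 0.9):
        print(score, count_reflections(score, rng=rng))
```

Under this assumed schedule, claims with a higher influence score are more likely to receive additional reflection passes, while the decay keeps the expected number of passes bounded even before the hard cap applies.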