TrendFact: A Benchmark for Explainable Hotspot Perception in Fact-Checking with Natural Language Explanation
Xiaocheng Zhang, Xi Wang, Yifei Lu, Jianing Wang, Zhuangzhuang Ye, Mengjiao Bao, Peng Yan, Xiaohong Su
TL;DR
TrendFact addresses the need for a bilingual fact-checking benchmark, including Chinese data, that evaluates not only verification and evidence retrieval but also explanation generation and hotspot perception. It introduces ECS and HCPI as explicit metrics to gauge explanation consistency and handling of high-influence claims, and proposes FactISR, a reasoning framework that couples dynamic evidence augmentation with influence-aware iterative self-reflection. Empirical results show current systems struggle on TrendFact's challenging samples, while FactISR yields improvements in verification, explanations, and hotspot perception, even under resource constraints. The work offers a practical, interpretable path toward more transparent and robust automated fact-checking for high-stakes events with social influence.
Abstract
Fact-checking benchmarks provide standardized testing criteria for automated fact-checking systems, driving technological advancement. With the surge of misinformation on social media and the emergence of various fact-checking methods, public concern has grown about the transparency of automated systems and the accuracy of fact-checking for high-influence events. However, existing benchmarks fail to meet these urgent needs and are predominantly English-centric, hindering progress toward comprehensive fact-checking. To address these issues, we introduce TrendFact, the first benchmark capable of evaluating hotspot perception ability (HPA) together with all major fact-checking tasks (evidence retrieval, verification, and explanation generation). TrendFact consists of 7,643 curated samples sourced from trending platforms and professional fact-checking datasets, along with an evidence library of 366,634 entries annotated with publication dates. Additionally, to complement existing benchmarks in evaluating explanation consistency and HPA, we propose two new metrics: ECS and HCPI. Experimental results show that current fact-checking systems face significant limitations on TrendFact, which can facilitate the development of more robust fact-checking methods. Furthermore, to enhance the capabilities of reasoning large language models (RLMs), currently among the most advanced fact-checking systems, we propose FactISR, a reasoning framework that integrates dynamic evidence augmentation with influence score-based iterative self-reflection. FactISR effectively improves RLMs' performance, offering new insights into explainable and complex fact-checking.
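The abstract does not spell out FactISR's internals, so the following is only a rough, hypothetical sketch of how an influence-gated self-reflection loop with dynamic evidence augmentation could be wired together. It is not the authors' implementation. Everything in it is an assumption for illustration: the names `reflection_budget`, `verify_with_reflection`, and `toy_verify`; the verifier interface; the 0.8 confidence threshold; and the rule that a higher influence score (assumed to lie in [0, 1]) buys more reflection rounds. `toy_verify` is a stand-in for an RLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical verifier interface (an assumption, not from the paper):
# (claim, evidence, previous_explanation) -> (label, confidence, explanation)
Verifier = Callable[[str, List[str], Optional[str]], Tuple[str, float, str]]


@dataclass
class Verdict:
    label: str
    confidence: float
    explanation: str
    rounds: int
    evidence: List[str] = field(default_factory=list)


def reflection_budget(influence: float, base: int = 1, max_extra: int = 3) -> int:
    """Hypothetical rule: higher-influence claims get more reflection rounds."""
    influence = min(max(influence, 0.0), 1.0)  # clamp to [0, 1]
    return base + round(influence * max_extra)


def verify_with_reflection(
    claim: str,
    influence: float,
    evidence_pool: List[str],
    verify: Verifier,
    k: int = 1,
    threshold: float = 0.8,
) -> Verdict:
    budget = reflection_budget(influence)
    evidence = list(evidence_pool[:k])  # initial evidence (copy, pool is untouched)
    label, conf, expl = verify(claim, evidence, None)
    rounds = 1
    # Reflect again while confidence is low and the influence-based budget allows it.
    while conf < threshold and rounds < budget:
        more = evidence_pool[len(evidence):len(evidence) + k]
        if not more:  # nothing left to add; stop early
            break
        evidence.extend(more)  # dynamic evidence augmentation
        # Feed the previous explanation back so the verifier can revise it.
        label, conf, expl = verify(claim, evidence, expl)
        rounds += 1
    return Verdict(label, conf, expl, rounds, evidence)


def toy_verify(claim: str, evidence: List[str], previous: Optional[str]) -> Tuple[str, float, str]:
    """Stand-in for an RLM call: confidence simply grows with evidence count."""
    conf = min(1.0, 0.25 * len(evidence))
    label = "refuted" if any("false" in e for e in evidence) else "not enough info"
    note = "revised" if previous is not None else "initial"
    return label, conf, f"{note} verdict from {len(evidence)} evidence items"


if __name__ == "__main__":
    pool = ["outlet A: the claim is false", "outlet B", "outlet C", "outlet D"]
    hot = verify_with_reflection("claim X", 1.0, pool, toy_verify)
    cold = verify_with_reflection("claim X", 0.0, pool, toy_verify)
    print(hot.rounds, hot.label, hot.confidence)    # 4 refuted 1.0
    print(cold.rounds, cold.label, cold.confidence)  # 1 refuted 0.25
```

In this sketch a high-influence claim keeps gathering evidence and revising its explanation until it is confident or the budget runs out, while a low-influence claim is settled after a single pass. Tying compute to influence is one plausible way to prioritize hotspot claims under resource limits. A real system would replace `toy_verify` with a model call and draw evidence from a retriever rather than a fixed list.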
