REFLEX: Self-Refining Explainable Fact-Checking via Disentangling Truth into Style and Substance
Chuyi Kong, Gao Wei, Jing Ma, Hongzhan Lin, Yaxin Fan
TL;DR
REFLEX addresses the latency and reliability problems of automated fact-checking (AFC) systems that rely on external knowledge by exploiting the knowledge already stored in the backbone model. It reformulates fact-checking as a role-play dialogue with joint verdict and explanation training, then derives steering vectors from contrastive activation pairs to disentangle truth into substance and style. With only 465 self-refined training samples, the approach achieves state-of-the-art verdict performance and improves explanation quality, and it transfers well across backbones and pair configurations. Overall, REFLEX offers data-efficient, interpretable, and robust fact-checking with practical implications for real-time misinformation mitigation.
Abstract
The prevalence of misinformation on social media threatens public trust, demanding automated fact-checking systems that provide accurate verdicts with interpretable explanations. However, existing large language model-based (LLM-based) approaches often rely heavily on external knowledge sources, introducing substantial latency and even hallucinations that undermine reliability, interpretability, and responsiveness, all of which are crucial for real-time use. To address these challenges, we propose the REason-guided Fact-checking with Latent EXplanations (REFLEX) paradigm, a plug-and-play, self-refining approach that leverages the internal knowledge of the backbone model to improve both verdict accuracy and explanation quality. REFLEX reformulates fact-checking as a role-play dialogue and jointly trains verdict prediction and explanation generation. It adaptively extracts contrastive activation pairs between the backbone model and its fine-tuned variant to construct steering vectors that naturally disentangle truth into style and substance. These activation-level signals guide inference and suppress noisy explanations, enabling more faithful and efficient reasoning. Experiments on real-world datasets show that REFLEX outperforms previous methods that steer toward a single truth direction, underscoring the challenge traditional approaches face when handling the subtle, human-unknown truth in fact-checking tasks. Remarkably, with only 465 self-refined training samples, REFLEX achieves state-of-the-art performance. Furthermore, models trained with explanatory objectives can effectively guide those without them, yielding up to a 7.57% improvement, highlighting that internal explanation signals play a dual role in both interpreting and enhancing factual reasoning.
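To make the idea of a steering vector built from contrastive activation pairs concrete, the following is a minimal sketch and not the authors' implementation. It assumes a difference-of-means construction, a common choice in activation steering, in which the vector is the normalized gap between the mean hidden activation of the fine-tuned variant and that of the backbone on the same inputs. The function names `steering_vector` and `steer`, the scaling factor `alpha`, and the toy activations are all hypothetical; the abstract does not specify how pairs are selected or at which layer the vector is applied.

```python
import numpy as np


def steering_vector(base_acts, tuned_acts):
    """Unit-norm difference of mean activations (fine-tuned minus backbone).

    Both inputs have shape (num_samples, hidden_dim). This difference-of-means
    rule is an assumption made for illustration only.
    """
    base_mean = np.asarray(base_acts, dtype=float).mean(axis=0)
    tuned_mean = np.asarray(tuned_acts, dtype=float).mean(axis=0)
    diff = tuned_mean - base_mean
    norm = np.linalg.norm(diff)
    # Leave an all-zero difference unnormalized to avoid dividing by zero.
    return diff / norm if norm > 0 else diff


def steer(hidden, vector, alpha=1.0):
    """Shift a hidden state along the steering vector by a factor alpha."""
    return np.asarray(hidden, dtype=float) + alpha * vector


# Toy activations standing in for one layer's outputs on four paired inputs.
base = np.zeros((4, 3))
tuned = base + np.array([3.0, 0.0, 4.0])

v = steering_vector(base, tuned)
shifted = steer(np.zeros(3), v, alpha=2.0)
```

In practice the activations would be read from a chosen transformer layer of the two models, and `alpha` would be tuned on held-out data; separate vectors could be built from different pair sets to target style and substance independently.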
