LLM-based Corroborating and Refuting Evidence Retrieval for Scientific Claim Verification
Siyuan Wang, James R. Foulds, Md Osman Gani, Shimei Pan
TL;DR
CIBER tackles LLM hallucination in scientific claim verification by augmenting RAG with a fully unsupervised, model-agnostic framework that analyzes responses across multiple interrogation probes. It introduces MAI to generate diverse probes, RR to map outputs to {S,R,N}, and V&C to fuse evidence via WP, WIG, WBU, and a Meta Verdict, enabling robust verdicts without access to LLM internals. Empirical results show that CIBER substantially improves over traditional RAG, especially with more capable LLMs such as GPT-3.5 and GPT-4, on synthetic and real-world datasets in the climate and autism domains, while highlighting trade-offs in probe design and evidence fusion. The work provides ground-truth datasets and practical insights for deploying LLM-based scientific claim verification across diverse domains with improved reliability and generalization.
Abstract
In this paper, we introduce CIBER (Claim Investigation Based on Evidence Retrieval), an extension of the Retrieval-Augmented Generation (RAG) framework designed to identify corroborating and refuting documents as evidence for scientific claim verification. CIBER addresses the inherent uncertainty of Large Language Models (LLMs) by evaluating the consistency of their responses across diverse interrogation probes. Because CIBER analyzes only the observable behavior of an LLM and requires no access to its internal information, it applies to both white-box and black-box models. Furthermore, CIBER operates in an unsupervised manner, so it generalizes easily across scientific domains. Comprehensive evaluations with LLMs of varying linguistic proficiency show that CIBER outperforms conventional RAG approaches. These findings highlight the effectiveness of CIBER and provide insights for future advances in LLM-based scientific claim verification.
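The core idea above, judging a claim-document pair by how consistently an LLM answers several differently phrased probes, can be illustrated with a minimal sketch. Everything in it is an illustrative assumption rather than the paper's method: the meaning of the labels (S = supports, R = refutes, N = neither), the first-word `resolve` rule, the `negated` flag for probes that restate the claim as its opposite, and the plain majority vote in `verdict`. It does not implement MAI, RR, V&C, or the WP, WIG, WBU, and Meta Verdict fusion rules.

```python
from collections import Counter

# Hypothetical label meanings (assumption, not taken from the paper):
# "S" = response supports the claim, "R" = refutes it, "N" = neither.
AFFIRM = {"yes", "true", "supported", "supports", "correct"}
DENY = {"no", "false", "refuted", "refutes", "incorrect"}
FLIP = {"S": "R", "R": "S", "N": "N"}


def resolve(response: str, negated: bool = False) -> str:
    """Map one free-text answer to S/R/N using only its first word (naive rule)."""
    words = response.strip().lower().split()
    first = words[0].strip(".,!:;") if words else ""
    label = "S" if first in AFFIRM else "R" if first in DENY else "N"
    # A probe phrased as the claim's negation inverts the meaning of yes/no.
    return FLIP[label] if negated else label


def verdict(labels: list[str]) -> tuple[str, float]:
    """Unweighted majority vote; confidence = share of probes agreeing with it."""
    if not labels:
        return "N", 0.0
    ranked = Counter(labels).most_common()
    top_label, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        # Tie between labels: treat the evidence as inconclusive.
        return "N", labels.count("N") / len(labels)
    return top_label, top_count / len(labels)


# Hypothetical responses to three probes about one (claim, document) pair;
# the last probe restates the claim as its negation, so its "yes" counts as R.
answers = [
    ("Yes, the study supports it.", False),
    ("True.", False),
    ("Yes, the document implies that.", True),
]
labels = [resolve(text, neg) for text, neg in answers]
print(labels, verdict(labels))
```

Here two probes agree and one contradicts them, so the vote returns S with a confidence of two thirds. A plain vote treats every probe equally; swapping `verdict` for a weighted scheme is where a more careful evidence-fusion method would plug in.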
