Explainable Biomedical Claim Verification with Large Language Models
Siting Liang, Daniel Sonntag
TL;DR
This paper tackles the challenge of verifying biomedical claims by presenting an explainable system that combines literature retrieval, multi-LLM natural language inference (NLI) under a task-adaptive framework, and user-guided justification. It introduces the Chain of Evidence-Based Natural Language Inference (CoENLI), which generates evidence-based rationales in a structured sequence before classifying a claim as Support, Contradict, or Not Enough Information. SHAP saliency maps are integrated to reveal word-level contributions and increase interpretability, while a user-in-the-loop workflow enables iterative refinement and a final narrative justification. Evaluations on NLI4CT and SciFact, along with a user study showing improved inter-annotator agreement, demonstrate enhanced transparency and reliability, with a roadmap toward integration into broader evidence synthesis for human-AI collaboration.
Abstract
Verification of biomedical claims is critical for healthcare decision-making, public health policy, and scientific research. We present an interactive biomedical claim verification system that integrates LLMs, transparent model explanations, and user-guided justification. In the system, users first retrieve relevant scientific studies from a persistent medical literature corpus and explore how different LLMs perform natural language inference (NLI) within a task-adaptive reasoning framework to classify each study as "Support," "Contradict," or "Not Enough Information" with respect to the claim. Users can examine each model's reasoning process, with additional insight provided by SHAP values that highlight word-level contributions to the final result. This combination enables a more transparent and interpretable evaluation of the model's decision-making process. A summary stage allows users to consolidate the results by selecting a result together with a narrative justification generated by LLMs. As a result, a consensus-based final decision is summarized for each retrieved study, with the aim of safe and accountable AI-assisted decision-making in biomedical contexts. We aim to integrate this explainable verification system as a component within a broader evidence synthesis framework to support human-AI collaboration.
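
The summary stage described above can be illustrated with a minimal sketch. The abstract does not specify how the consensus is computed, so everything here is an assumption made for illustration: the function name `consensus_label`, the majority-vote rule, the fallback to "Not Enough Information" on a tie, and the model names in the usage example are all hypothetical. The only elements taken from the text are the three labels and the idea that a user's selection decides the final result.

```python
from collections import Counter
from typing import Optional

# The three NLI labels named in the abstract.
LABELS = ("Support", "Contradict", "Not Enough Information")


def consensus_label(votes: dict[str, str], user_choice: Optional[str] = None) -> str:
    """Consolidate per-model labels for one retrieved study.

    votes maps a (hypothetical) model name to the label it predicted.
    If the user has selected a label, that selection is final.
    Otherwise a simple majority vote is used (an assumption, not the
    authors' stated method), with ties resolved conservatively to
    "Not Enough Information".
    """
    if not votes:
        raise ValueError("at least one model vote is required")
    for model, label in votes.items():
        if label not in LABELS:
            raise ValueError(f"unknown label from {model}: {label!r}")

    # User-guided justification: an explicit user selection overrides the vote.
    if user_choice is not None:
        if user_choice not in LABELS:
            raise ValueError(f"unknown user label: {user_choice!r}")
        return user_choice

    ranked = Counter(votes.values()).most_common()
    top_label, top_count = ranked[0]
    # Assumed tie-break: no clear majority means the evidence is treated as insufficient.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return "Not Enough Information"
    return top_label


if __name__ == "__main__":
    # Hypothetical model names; the labels would come from the NLI step.
    votes = {"model_a": "Support", "model_b": "Support", "model_c": "Contradict"}
    print(consensus_label(votes))
    print(consensus_label(votes, user_choice="Contradict"))
```

Letting the user's choice override the vote keeps the human as the final decision-maker, in line with the user-guided design the abstract describes; a weighted or confidence-based aggregation would be an equally plausible alternative.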
