Explainable Biomedical Claim Verification with Large Language Models

Siting Liang, Daniel Sonntag

TL;DR

This paper tackles the challenge of verifying biomedical claims by presenting an explainable system that combines literature retrieval, natural language inference (NLI) by multiple LLMs under a task-adaptive framework, and user-guided justification. It introduces the Chain of Evidence-Based Natural Language Inference (CoENLI) framework, which generates evidence-based rationales in a structured sequence before classifying a claim as Support, Contradict, or Not Enough Information. SHAP saliency maps are integrated to reveal word-level contributions, increasing interpretability, while a user-in-the-loop design enables iterative refinement and a final narrative justification. Evaluations on NLI4CT and SciFact, along with a user study showing improved inter-annotator agreement, demonstrate enhanced transparency and reliability, with a roadmap toward integration into broader evidence synthesis for human-AI collaboration.

Abstract

Verification of biomedical claims is critical for healthcare decision-making, public health policy, and scientific research. We present an interactive biomedical claim verification system that integrates LLMs, transparent model explanations, and user-guided justification. In the system, users first retrieve relevant scientific studies from a persistent medical literature corpus and explore how different LLMs perform natural language inference (NLI) within a task-adaptive reasoning framework to classify each study as "Support," "Contradict," or "Not Enough Information" with respect to the claim. Users can examine the model's reasoning process, with additional insights provided by SHAP values that highlight word-level contributions to the final result. This combination enables a more transparent and interpretable evaluation of the model's decision-making process. A summary stage allows users to consolidate the results by selecting a result with a narrative justification generated by LLMs. As a result, a consensus-based final decision is summarized for each retrieved study, with the aim of safe and accountable AI-assisted decision-making in biomedical contexts. We aim to integrate this explainable verification system as a component within a broader evidence synthesis framework to support human-AI collaboration.
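
The summary stage described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the consensus is computed, so everything here is an assumption: the function name `consensus_label`, the majority-vote rule, the tie-breaking fallback to "Not Enough Information", and the model names are all hypothetical. The only parts taken from the abstract are the three labels and the idea that the user can select the final result.

```python
from collections import Counter

# The three NLI labels named in the abstract.
LABELS = ("Support", "Contradict", "Not Enough Information")


def consensus_label(predictions, user_choice=None):
    """Combine per-model labels for one retrieved study into a final label.

    predictions: dict mapping a (hypothetical) model name to its label.
    user_choice: optional label picked by the user; if given, it wins.

    Assumption: majority vote with ties resolved to
    "Not Enough Information". The paper may use a different rule.
    """
    if user_choice is not None:
        if user_choice not in LABELS:
            raise ValueError(f"unknown label: {user_choice!r}")
        return user_choice

    if not predictions:
        raise ValueError("at least one model prediction is required")
    for label in predictions.values():
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")

    counts = Counter(predictions.values()).most_common()
    best_count = counts[0][1]
    tied = [label for label, count in counts if count == best_count]
    # Assumed conservative fallback when the models disagree evenly.
    return tied[0] if len(tied) == 1 else "Not Enough Information"


if __name__ == "__main__":
    votes = {"model_a": "Support", "model_b": "Support", "model_c": "Contradict"}
    print(consensus_label(votes))
    print(consensus_label(votes, user_choice="Contradict"))
```

Letting `user_choice` override the vote mirrors the user-guided selection in the abstract; the automatic vote is only a default for cases where the user has not yet decided.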

Paper Structure

This paper contains 13 sections, 2 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: The biomedical claim verification system comprises several interactive components for the user study.
  • Figure 2: Components D and E provide users with dual-layer interpretability for a deeper understanding of verification results by combining evidence analysis with SHAP-based rationale highlighting.
  • Figure 3: Components F and G comprise the summary stage, where users actively engage by adjusting the classification results and prompting the model to generate a final justification for the final decision.
  • Figure 4: When prompting the LLMs with the CoENLI framework, the process begins with the Semantic Grounding and Evidence-based Evaluation steps. These steps help interpret key terms and assess each part of the claim against the relevant data points identified in the study. The highlighted words and phrases in the claim, study, and generated evaluation are intended to offer plausible insights into the claim verification process.
  • Figure 5: Words with positive SHAP values (highlighted in red) indicate phrases within the model-generated evaluation that significantly contribute to the Contradict classification, while words with negative SHAP values (highlighted in blue) indicate elements that reduce the likelihood of this classification.
  • ...and 1 more figure
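
The color convention in the Figure 5 caption can be sketched as a small mapping from signed SHAP values to highlight colors. This is an illustration only: the function name `highlight_colors`, the `threshold` parameter, and the example tokens and values are hypothetical, and the SHAP values are assumed to have been computed elsewhere with respect to the Contradict class.

```python
def highlight_colors(tokens, shap_values, threshold=0.0):
    """Pair each token with a highlight color based on its SHAP value.

    Positive values (pushing toward the target class) map to "red",
    negative values (pushing away from it) map to "blue", and values
    within +/- threshold are left unhighlighted (None).
    """
    if len(tokens) != len(shap_values):
        raise ValueError("tokens and shap_values must have the same length")

    colored = []
    for token, value in zip(tokens, shap_values):
        if value > threshold:
            color = "red"
        elif value < -threshold:
            color = "blue"
        else:
            color = None
        colored.append((token, color))
    return colored


if __name__ == "__main__":
    # Made-up tokens and values for illustration, not results from the paper.
    tokens = ["no", "significant", "improvement"]
    values = [0.4, -0.2, 0.0]
    for token, color in highlight_colors(tokens, values):
        print(f"{token}: {color}")
```

A nonzero `threshold` would suppress highlighting for words whose contribution is negligible, which keeps a rendered saliency map readable.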