Decide less, communicate more: On the construct validity of end-to-end fact-checking in medicine

Sebastian Joseph, Lily Chen, Barry Wei, Michael Mackert, Iain J. Marshall, Paul Pu Liang, Ramez Kouzy, Byron C. Wallace, Junyi Jessy Li

TL;DR

This position paper challenges the construct validity of end-to-end automated medical fact-checking by exposing fundamental difficulties in linking lay claims to clinical evidence, resolving underspecifications, and achieving consensus on veracity. Through an expert-in-the-loop study using RedHOT social-media claims and retrieved RCT abstracts, the authors reveal low inter-annotator agreement and a high incidence of unverifiable claims, arguing that end-to-end classification is insufficient for real-world medical discourse. They propose a human-centered, interactive communication model that clarifies intent, guides evidence retrieval, and presents diverse expert perspectives, supported by extending evidence beyond RCTs to richer study designs. The work emphasizes practical utility for public health communication and patient education, calling for systems that engage users in dialogue and uncertainty, rather than delivering a single verdict.

Abstract

Technological progress has led to concrete advancements in tasks that were regarded as challenging, such as automatic fact-checking. Interest in adopting these systems for public health and medicine has grown due to the high-stakes nature of medical decisions and the challenges of critically appraising a vast and diverse medical literature. Evidence-based medicine touches every individual, yet it is highly technical, leaving most users without the medical literacy needed to navigate the domain. Such problems with medical communication ripen the ground for end-to-end fact-checking agents: check a claim against current medical literature and return an evidence-backed verdict. And yet, such systems remain largely unused. In this position paper, developed with expert input, we present the first study examining how clinical experts verify real claims from social media by synthesizing medical evidence. In searching for this upper bound, we reveal fundamental challenges in end-to-end fact-checking when applied to medicine: difficulties connecting claims in the wild to scientific evidence in the form of clinical trials; ambiguities in underspecified claims mixed with mismatched intentions; and inherently subjective veracity labels. We argue that fact-checking should be approached and evaluated as an interactive communication problem, rather than an end-to-end process.

Paper Structure

This paper contains 34 sections, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Overview of our AI-in-the-loop expert study pipeline. Given a claim from a subreddit, we extract PIO elements and automatically retrieve evidence. The evidence and its context are presented to a medical expert, who provides a veracity judgment and grounded rationale.
  • Figure 2: The communication model for fact-checking, in which the system engages the patient by asking clarifying questions, filling contextual gaps, and verifying claims while addressing misconceptions.
  • Figure 3: Presentation of claims, PIO elements, and abstracts in the annotation interface.
  • Figure 4: Presentation of the tiering and synthesis annotations interface.