Decide less, communicate more: On the construct validity of end-to-end fact-checking in medicine
Sebastian Joseph, Lily Chen, Barry Wei, Michael Mackert, Iain J. Marshall, Paul Pu Liang, Ramez Kouzy, Byron C. Wallace, Junyi Jessy Li
TL;DR
This position paper challenges the construct validity of end-to-end automated medical fact-checking by exposing fundamental difficulties in linking lay claims to clinical evidence, resolving underspecified claims, and achieving consensus on veracity. Through an expert-in-the-loop study using RedHOT social-media claims and retrieved RCT abstracts, the authors reveal low inter-annotator agreement and a high incidence of unverifiable claims, arguing that end-to-end classification is insufficient for real-world medical discourse. They propose a human-centered, interactive communication model that clarifies intent, guides evidence retrieval, and presents diverse expert perspectives, supported by extending evidence beyond RCTs to richer study designs. The work emphasizes practical utility for public health communication and patient education, calling for systems that engage users in dialogue and convey uncertainty rather than delivering a single verdict.
Abstract
Technological progress has led to concrete advancements in tasks once regarded as challenging, such as automatic fact-checking. Interest in adopting these systems for public health and medicine has grown due to the high-stakes nature of medical decisions and the challenges of critically appraising a vast and diverse medical literature. Evidence-based medicine concerns every individual, yet it is highly technical, and most users lack the medical literacy needed to navigate it. Such problems with medical communication ripen the ground for end-to-end fact-checking agents: check a claim against current medical literature and return an evidence-backed verdict. And yet, such systems remain largely unused. In this position paper, developed with expert input, we present the first study examining how clinical experts verify real claims from social media by synthesizing medical evidence. In searching for this upper bound, we reveal fundamental challenges in end-to-end fact-checking when applied to medicine: difficulties connecting claims in the wild to scientific evidence in the form of clinical trials; ambiguities in underspecified claims mixed with mismatched intentions; and inherently subjective veracity labels. We argue that fact-checking should be approached and evaluated as an interactive communication problem, rather than an end-to-end process.
