Caught in the Web of Words: Do LLMs Fall for Spin in Medical Literature?

Hye Sun Yun; Karen Y. C. Zhang; Ramez Kouzy; Iain J. Marshall; Junyi Jessy Li; Byron C. Wallace

TL;DR

This study investigates whether Large Language Models (LLMs) are susceptible to spin in medical abstracts, a phenomenon known to bias clinician interpretation. The authors evaluate 22 LLMs on three tasks: spin detection, interpretation of spun versus unspun trial results, and automatic simplification to plain language. They find that LLMs are more susceptible to spin than humans and can propagate it into downstream outputs. Targeted prompting strategies, especially joint spin detection and interpretation, significantly reduce this bias, offering practical mitigation for evidence synthesis tasks. The work highlights the need for careful prompt design and caution when deploying LLMs to summarize or simplify medical literature, particularly in oncology, where spin is prevalent and impactful.

Abstract

Medical research faces well-documented challenges in translating novel treatments into clinical practice. Publishing incentives encourage researchers to present "positive" findings, even when empirical results are equivocal. Consequently, it is widely recognized that authors often spin study results, especially in article abstracts. Such spin can influence clinician interpretation of evidence and may affect patient care decisions. In this study, we ask whether the interpretation of trial results offered by Large Language Models (LLMs) is similarly affected by spin. This question matters because LLMs are increasingly used to trawl through and synthesize published medical evidence. We evaluated 22 LLMs and found that they are, across the board, more susceptible to spin than humans. They may also propagate spin into their outputs: for example, we find evidence that LLMs implicitly incorporate spin into the plain language summaries they generate. We also find, however, that LLMs are generally capable of recognizing spin and can be prompted in a way that mitigates its impact on their outputs.
