
Can Humans Tell? A Dual-Axis Study of Human Perception of LLM-Generated News

Alexander Loth, Martin Kappes, Marc-Oliver Pahl

Abstract

Can humans tell whether a news article was written by a person or a large language model (LLM)? We investigate this question using JudgeGPT, a study platform that independently measures source attribution (human vs. machine) and authenticity judgment (legitimate vs. fake) on continuous scales. From 2,318 judgments collected from 1,054 participants across content generated by six LLMs, we report five findings: (1) participants cannot reliably distinguish machine-generated from human-written text (p > .05, Welch's t-test); (2) this inability holds across all tested models, including open-weight models with as few as 7B parameters; (3) self-reported domain expertise predicts judgment accuracy (r = .35, p < .001) whereas political orientation does not (r = -.10, n.s.); (4) clustering reveals distinct response strategies ("Skeptics" vs. "Believers"); and (5) accuracy degrades after approximately 30 sequential evaluations due to cognitive fatigue. The answer, in short, is no: humans cannot reliably tell. These results indicate that user-side detection is not a viable defense and motivate system-level countermeasures such as cryptographic content provenance.
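To make the abstract's headline statistics concrete, the sketch below illustrates the style of analysis being reported: a Welch's $t$-test comparing source-attribution scores between machine-generated and human-written fragments, and a Pearson correlation between self-reported expertise and judgment accuracy. The column names, file name, and data layout are assumptions for illustration, not the authors' released schema.

```python
# Minimal sketch of the two headline tests in the abstract (assumed column names).
import pandas as pd
from scipy import stats

# Hypothetical judgment table; the real JudgeGPT schema may differ.
df = pd.read_csv("judgegpt_judgments.csv")  # assumed file name

# (1) Welch's t-test: do source-attribution scores differ between
#     machine-generated and human-written fragments?
machine = df.loc[df["is_machine"] == 1, "source_score"]
human = df.loc[df["is_machine"] == 0, "source_score"]
t, p = stats.ttest_ind(machine, human, equal_var=False)  # equal_var=False -> Welch's variant
print(f"Welch's t = {t:.2f}, p = {p:.3f}")  # p > .05 indicates no reliable separation

# (3) Pearson correlation: does self-reported expertise predict accuracy?
per_participant = df.groupby("participant_id")[["expertise", "accuracy"]].mean()
r, p_r = stats.pearsonr(per_participant["expertise"], per_participant["accuracy"])
print(f"r = {r:.2f}, p = {p_r:.3f}")
```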

Paper Structure

This paper contains 16 sections, 6 figures, and 1 table.

Figures (6)

  • Figure 1: The JudgeGPT interface. Participants rate each fragment on three continuous axes: source attribution, authenticity, and topic familiarity.
  • Figure 2: Source score (left) and authenticity score (right) distributions for machine- vs. human-written fragments. Overlapping distributions and a non-significant $t$-test indicate participants cannot distinguish the two conditions.
  • Figure 3: Mean source and authenticity scores per LLM ($\pm$ SE). The dashed line marks chance level (0.5). No model is reliably identified as machine-generated.
  • Figure 4: Relationships between participant covariates and judgment scores. Top: political view (weak slope). Bottom: fake news familiarity (positive slope). Regression lines with 95% CI shown.
  • Figure 5: Pairplot of participant-level mean scores colored by cluster. The two groups show clearly separated distributions on both axes.
  • ...and 1 more figure
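As an illustration of how the participant-level clusters shown in Figure 5 could be derived, here is a minimal sketch that groups participants by their mean source and authenticity scores. The choice of $k$-means with $k = 2$, along with the column and file names, are assumptions; the paper's figure list does not specify the clustering algorithm.

```python
# Hedged sketch of the participant clustering behind Figure 5
# (k-means with k = 2 is an assumption, not the authors' stated method).
import pandas as pd
from sklearn.cluster import KMeans

df = pd.read_csv("judgegpt_judgments.csv")  # assumed file name

# Aggregate each participant's mean source and authenticity scores.
participant_means = df.groupby("participant_id")[["source_score", "authenticity_score"]].mean()

# Two clusters, interpreted post hoc as "Skeptics" vs. "Believers".
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(participant_means)
participant_means["cluster"] = labels
print(participant_means.groupby("cluster").mean())
```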