Large language models can consistently generate high-quality content for election disinformation operations
Angus R. Williams, Liam Burke-Moore, Ryan Sze-Yin Chan, Florence E. Enock, Federico Nanni, Tvesha Sippy, Yi-Ling Chung, Evelina Gabasova, Kobi Hackenburg, Jonathan Bright
TL;DR
The paper examines the risk that large language models can be used to generate high-quality election disinformation at scale, including hyperlocal content. It reports a two-part study: (i) DisElect, a benchmark dataset of 2,200 malicious prompts and 50 benign prompts, used to assess how readily 13 LLMs comply with disinformation tasks in UK contexts, and (ii) humanness experiments with N=2,340 participants that measure whether AI-generated content passes as human-written. Key contributions include the DisElect dataset and an open evaluation pipeline, empirical evidence that most modern LLMs can produce human-like disinformation content at scale (with some models achieving above-human humanness), and an analysis of how factors such as model age, refusal behavior, and pipeline stage relate to humanness and safety. The work provides a data-driven benchmark for AI safety in information operations and informs policymakers about the capabilities and risks of current and near-future LLMs in disinformation campaigns.
Abstract
Advances in large language models have raised concerns about their potential use in generating compelling election disinformation at scale. This study presents a two-part investigation into the capabilities of LLMs to automate stages of an election disinformation operation. First, we introduce DisElect, a novel evaluation dataset designed to measure LLM compliance with instructions to generate content for an election disinformation operation in a localised UK context, containing 2,200 malicious prompts and 50 benign prompts. Using DisElect, we test 13 LLMs and find that most models broadly comply with these requests; we also find that the few models which refuse malicious prompts also refuse benign election-related prompts, and are more likely to refuse to generate content from a right-wing perspective. Second, we conduct a series of experiments (N=2,340) to assess the "humanness" of LLMs: the extent to which disinformation operation content generated by an LLM is able to pass as human-written. Our experiments suggest that almost all LLMs tested released since 2022 produce election disinformation operation content indiscernible by human evaluators over 50% of the time. Notably, we observe that multiple models achieve above-human levels of humanness. Taken together, these findings suggest that current LLMs can be used to generate high-quality content for election disinformation operations, even in hyperlocalised scenarios, at far lower cost than traditional methods, and offer researchers and policymakers an empirical benchmark for the measurement and evaluation of these capabilities in current and future models.
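To make the "humanness" measure concrete, the following is a minimal sketch that assumes humanness is summarised as the share of a source's items that evaluators judged to be human-written. The function name `humanness_rates`, the input layout, and the toy judgements are hypothetical illustrations, not the study's pipeline or data.

```python
from collections import defaultdict


def humanness_rates(judgements):
    """Return, per source, the fraction of items judged human-written.

    `judgements` is an iterable of (source, judged_human) pairs, where
    `source` names the author of the item (an LLM or "human") and
    `judged_human` is True if the evaluator labelled it human-written.
    This layout is an assumption made for illustration.
    """
    counts = defaultdict(lambda: [0, 0])  # source -> [judged human, total]
    for source, judged_human in judgements:
        counts[source][0] += int(judged_human)
        counts[source][1] += 1
    return {source: human / total for source, (human, total) in counts.items()}


# Toy, made-up judgements purely to exercise the function.
toy = [
    ("model_a", True), ("model_a", True), ("model_a", False), ("model_a", True),
    ("human", True), ("human", False),
]
rates = humanness_rates(toy)

# Under this assumed definition, a model is "above-human" when its rate
# exceeds the rate obtained by genuinely human-written items.
above_human = {s for s, r in rates.items() if s != "human" and r > rates["human"]}
```

Under this reading, the "over 50% of the time" threshold in the abstract corresponds to a per-model rate above 0.5, and "above-human" corresponds to a rate higher than the one obtained by human-written items.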
