Large language models can consistently generate high-quality content for election disinformation operations

Angus R. Williams, Liam Burke-Moore, Ryan Sze-Yin Chan, Florence E. Enock, Federico Nanni, Tvesha Sippy, Yi-Ling Chung, Evelina Gabasova, Kobi Hackenburg, Jonathan Bright

TL;DR

The paper examines the risk that large language models can generate high-quality election disinformation at scale, including hyperlocal content. It reports a two-part study: (i) DisElect, a benchmark dataset of $2{,}200$ malicious prompts and $50$ benign prompts, used to assess how far $13$ LLMs comply with disinformation tasks in UK contexts, and (ii) humanness experiments with $N=2{,}340$ participants that measure whether AI-generated content passes as human-written. Key contributions include the DisElect dataset and an open evaluation pipeline; empirical evidence that most modern LLMs can produce human-like disinformation content at scale (with some models achieving above-human humanness); and an analysis of how factors such as model age, refusal behavior, and pipeline stage affect humanness and safety. The work provides a data-driven benchmark for AI safety in information operations and informs policymakers about the capabilities and risks of current and near-future LLMs in disinformation campaigns.

Abstract

Advances in large language models have raised concerns about their potential use in generating compelling election disinformation at scale. This study presents a two-part investigation into the capabilities of LLMs to automate stages of an election disinformation operation. First, we introduce DisElect, a novel evaluation dataset designed to measure LLM compliance with instructions to generate content for an election disinformation operation in a localised UK context, containing 2,200 malicious prompts and 50 benign prompts. Using DisElect, we test 13 LLMs and find that most models broadly comply with these requests; we also find that the few models which refuse malicious prompts also refuse benign election-related prompts, and are more likely to refuse to generate content from a right-wing perspective. Secondly, we conduct a series of experiments (N=2,340) to assess the "humanness" of LLMs: the extent to which disinformation operation content generated by an LLM is able to pass as human-written. Our experiments suggest that almost all LLMs tested released since 2022 produce election disinformation operation content indiscernible by human evaluators over 50% of the time. Notably, we observe that multiple models achieve above-human levels of humanness. Taken together, these findings suggest that current LLMs can be used to generate high-quality content for election disinformation operations, even in hyperlocalised scenarios, at far lower costs than traditional methods, and offer researchers and policymakers an empirical benchmark for the measurement and evaluation of these capabilities in current and future models.

Paper Structure (2 sections, 9 figures, 9 tables)

This paper contains 2 sections, 9 figures, 9 tables.

Figures (9)

  • Figure 1: Heatmap of model response classification proportions across the 3 use cases within DisElect. Models are sorted by release date (earliest models first). $n$ refers to total responses per model within the experiment.
  • Figure 2: Refusal rates for variables shared by DisElect.VT and DisElect.MP, for 3 refusing models, plus the overall (mean) refusal rate. $n$ represents the total number of prompts corresponding with results displayed.
  • Figure 3: Refusal rates for MPs in DisElect.MP by gender and party, for 3 refusing models, plus the overall (mean) refusal rate. $n$ represents the number of MPs within a given group. Each MP is referred to in 20 individual prompts.
  • Figure 4: Box plot of the proportion of human assignments per model, aggregated across all experiments and pipelines. Models are sorted by release date. Boxes visualise the mean and confidence interval (of $\pm 2$ standard errors). The dashed line shows the mean of the human proportions across the models.
  • Figure 5: Box plots of the proportion of human assignments per model, by experiment. Models are sorted by release date. Boxes visualise the mean and confidence interval (of $\pm 2$ standard errors). The dashed lines show the means of the human proportions across the models.
  • ...and 4 more figures
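
The captions for Figures 4 and 5 describe each box as a mean with an interval of plus or minus 2 standard errors. A minimal sketch of that summary is below, assuming each entry is a single participant judgement coded 1 (assigned "human") or 0 (assigned "AI"). The function name `mean_with_2se` and the sample data are hypothetical, and the paper's own aggregation (for example per participant or per experiment) may differ.

```python
import math
from statistics import mean, stdev


def mean_with_2se(values):
    """Return (mean, lower, upper), with bounds at mean -/+ 2 standard errors."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate a standard error")
    m = mean(values)
    # Standard error of the mean: sample standard deviation over sqrt(n).
    se = stdev(values) / math.sqrt(n)
    return m, m - 2 * se, m + 2 * se


# Hypothetical judgements for one model: 1 = judged human, 0 = judged AI.
judgements = [1, 0, 1, 1, 0, 1, 0, 1]
m, lower, upper = mean_with_2se(judgements)
print(f"human proportion = {m:.3f}, interval = [{lower:.3f}, {upper:.3f}]")
```

With so few judgements the interval is wide; it narrows roughly with the square root of the number of judgements, which is why the per-model sample size matters when comparing models against the dashed mean line.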