Large Language Models can impersonate politicians and other public figures

Steffen Herbold; Alexander Trautsch; Zlata Kikteva; Annette Hautli-Janisz

TL;DR

Based on a study with a cross-section of British society, this paper shows that LLMs can generate responses to debate questions that were part of a broadcast political debate programme in the UK.

Abstract

Modern AI technology like large language models (LLMs) has the potential to pollute the public information sphere with made-up content, which poses a significant threat to the cohesion of societies at large. A wide range of research has shown that LLMs are capable of generating text of impressive quality, including persuasive political speech, text with a pre-defined style, and role-specific content. But there is a crucial gap in the literature: we lack large-scale and systematic studies of how capable LLMs are at impersonating political and societal representatives and how the general public judges these impersonations in terms of authenticity, relevance and coherence. We present the results of a study based on a cross-section of British society that shows that LLMs are able to generate responses to debate questions that were part of a broadcast political debate programme in the UK. The impersonated responses are judged to be more authentic and relevant than the original responses given by the people who were impersonated. This shows two things: (1) LLMs can be made to contribute meaningfully to the public political debate and (2) there is a dire need to inform the general public of the potential harm this can have on society.

Paper Structure (30 sections, 13 figures, 5 tables)

Abstract
Introduction
Results
Discussion
Methods
Author information
Ethics declarations
Data availability
Code availability
Extended data
Supplemental material

Figures (13)

Figure 1: Judgments when a debate question, the name of the speaker, and either the ChatGPT-generated response or the response by the actual speaker were shown. Violins show a kernel density estimation of the probability distribution, the miniature box-plots depict the median, upper and lower quartile, and the whiskers the largest/smallest value observed within 1.5 times the interquartile range of the upper/lower quartile. The statistical markers reported are the p-value of two-sided Wilcoxon signed rank tests, the effect size with Cohen's $d$, the sample sizes $n$, mean values $M$ and standard deviations $SD$.
Figure 2: Judgments when a debate question, the name of the speaker, and both the actual and ChatGPT-generated responses were shown side-by-side. The stacked bar chart reports the percentages of the ratings that we observed. The statistical markers reported are the p-value of a two-sided one-sample Wilcoxon signed rank test for a difference from zero, the effect size with Cohen's $d$, the sample sizes $n$, mean values $M$ and standard deviations $SD$.
Figure 3: Judgments when a debate question with either the response and biography from the actual speaker, the ChatGPT-generated response and the biography of the actual speaker, or the response from the actual speaker but the name and biography of a random public person were shown. Violins show a kernel density estimation of the probability distribution, the miniature box-plots depict the median, upper and lower quartile, and the whiskers the largest/smallest value observed within 1.5 times the interquartile range of the upper/lower quartile. The statistical markers reported are the p-value of the omnibus test for differences and pairwise Bonferroni-Dunn corrected two-sided post-hoc tests, the effect size with Cohen's $d$, the sample sizes $n$, mean values $M$ and standard deviations $SD$.
Figure 4: Judgments whether the content of the actual response and the ChatGPT-generated response are the same. The actual and impersonated responses were shown side-by-side. The stacked bar chart reports the percentages of the ratings that we observed. The statistical markers reported are the p-value of a two-sided one-sample Wilcoxon signed rank test for a difference from zero, the effect size with Cohen's $d$, the sample sizes $n$, mean values $M$ and standard deviations $SD$.
Figure 5: Linguistic surface of actual debate responses versus impersonated debate responses. Violins show a kernel density estimation of the probability distribution, the miniature box-plots depict the median, upper and lower quartile, and the whiskers the largest/smallest value observed within 1.5 times the interquartile range of the upper/lower quartile. The statistical markers reported are the p-value of two-sided Wilcoxon signed rank tests, the effect size with Cohen's $d$, the sample sizes $n$, mean values $M$ and standard deviations $SD$.
...and 8 more figures
