LLMs Among Us: Generative AI Participating in Digital Discourse

Kristina Radivojevic, Nicholas Clark, Paul Brenner

TL;DR

LLMs Among Us introduces a Mastodon-based framework for studying deception in digital discourse. Using prompt chaining, the study deploys 30 bot accounts across 10 personas powered by GPT-4, Llama 2 Chat, and Claude 2, and runs three rounds with a survey after each round to assess detection. Participants correctly identify whether an account is a bot or a human only about 42% of the time, with no significant difference across models, while persona design strongly shapes perception (e.g., Persona 8 reads as more bot-like; Personas 3 and 6 are harder to detect). The reported F1 scores are 34.09%, 36.47%, and 31.28% for the three models, respectively. These findings highlight the privacy and manipulation risks of AI-enabled digital discourse and motivate future extensions such as memory, more nuanced personas, and open-source deployment for safeguarding online ecosystems.

Abstract

The emergence of Large Language Models (LLMs) has great potential to reshape the landscape of many social media platforms. While this can bring promising opportunities, it also raises many threats, such as biases and privacy concerns, and may contribute to the spread of propaganda by malicious actors. We developed the "LLMs Among Us" experimental framework on top of the Mastodon social media platform for bot and human participants to communicate without knowing the ratio or nature of bot and human participants. We built 10 personas with three different LLMs: GPT-4, Llama 2 Chat, and Claude. We conducted three rounds of the experiment and surveyed participants after each round to measure the ability of LLMs to pose as human participants without being detected. We found that participants correctly identified the nature of other users in the experiment only 42% of the time, despite knowing that both bots and humans were present. We also found that the choice of persona had substantially more impact on human perception than the choice of mainstream LLM.

Paper Structure (8 sections, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Illustration of experimental framework where personified LLM bots participate in social discourse with humans.
  • Figure 2: Confusion Matrix of Predicted and Actual Bot Accounts. 0 = Human, 1 = Bot.
  • Figure 3: F1 Score for LLMs relative to personas. A higher score indicates a greater likelihood of being identified as a bot.
  • Figure 4: Confusion Matrices for Persona 3, Persona 6, and Persona 8.
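
The confusion matrices and F1 scores named in the figure captions above are related by a standard calculation, sketched below in Python. This is a minimal illustration and not the authors' analysis code. It assumes that "bot" (label 1) is treated as the positive class, which matches the caption's statement that a higher F1 score means a greater likelihood of being identified as a bot. The function names and the counts are hypothetical and are not taken from the paper.

```python
def f1_from_confusion(tp: int, fp: int, fn: int) -> float:
    """F1 score for the positive class (assumed here: bot = 1).

    tp: bots that participants labelled as bots
    fp: humans that participants labelled as bots
    fn: bots that participants labelled as humans
    """
    if tp + fp == 0 or tp + fn == 0:
        return 0.0  # precision or recall is undefined; report 0 by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all judgments (bot or human) that were correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical counts for illustration only; NOT the paper's data.
tp, fp, fn, tn = 30, 40, 60, 70

print(f"F1 (bot class): {f1_from_confusion(tp, fp, fn):.3f}")
print(f"Accuracy:       {accuracy(tp, tn, fp, fn):.3f}")
```

Accuracy counts correct judgments of both humans and bots, whereas F1 for the bot class ignores true negatives. Under these assumptions, an overall identification rate and a per-model F1 score can differ for the same set of survey responses.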