Classifying Human-Generated and AI-Generated Election Claims in Social Media

Alphaeus Dmonte, Marcos Zampieri, Kevin Lybarger, Massimiliano Albanese, Genya Coulter

TL;DR

This work tackles election-related misinformation by introducing a taxonomy for characterizing election claims and ElectAI, a benchmark dataset of 9,900 tweets labeled as human- or AI-generated. It investigates claim understanding via in-context learning with several open-source LLMs and assesses authorship attribution with traditional and transformer-based classifiers, finding that RoBERTa achieves high accuracy while LLMs show only moderate performance on attribute extraction. The study demonstrates that AI-generated election content can closely mimic human writing, making detection challenging for humans but more tractable for machine learning classifiers, with notable differences across LLMs. Overall, ElectAI provides a benchmark for studying election misinformation and motivates future work on improving claim verification and authorship detection with advanced prompting and fine-tuning techniques.

Abstract

Politics is one of the most prevalent topics discussed on social media platforms, particularly during major election cycles, where users engage in conversations about candidates and electoral processes. Malicious actors may use this opportunity to disseminate misinformation to undermine trust in the electoral process. The emergence of Large Language Models (LLMs) exacerbates this issue by enabling malicious actors to generate misinformation at an unprecedented scale. Artificial intelligence (AI)-generated content is often indistinguishable from authentic user content, raising concerns about the integrity of information on social networks. In this paper, we present a novel taxonomy for characterizing election-related claims. This taxonomy provides an instrument for analyzing election-related claims, with granular categories related to jurisdiction, equipment, processes, and the nature of claims. We introduce ElectAI, a novel benchmark dataset that consists of 9,900 tweets, each labeled as human- or AI-generated. For AI-generated tweets, the specific LLM variant that produced them is specified. We annotated a subset of 1,550 tweets using the proposed taxonomy to capture the characteristics of election-related claims. We explored the capabilities of LLMs in extracting the taxonomy attributes and trained various machine learning models using ElectAI to distinguish between human- and AI-generated posts and identify the specific LLM variant.

Paper Structure (28 sections, 3 figures, 7 tables)

This paper contains 28 sections, 3 figures, 7 tables.

Figures (3)

  • Figure 1: The election claim taxonomy.
  • Figure 2: Statistics of the annotations. These statistics are based on the two rounds of annotation.
  • Figure 3: The clusters of the embeddings used to train the machine learning models for the authorship attribution task.