Classifying Human-Generated and AI-Generated Election Claims in Social Media
Alphaeus Dmonte, Marcos Zampieri, Kevin Lybarger, Massimiliano Albanese, Genya Coulter
TL;DR
This work tackles election-related misinformation by introducing a taxonomy for human- and AI-generated claims and ElectAI, a dataset of 9,900 tweets. It investigates claim understanding via in-context learning with several open-source LLMs and assesses authorship attribution with traditional and transformer-based classifiers, finding that RoBERTa achieves high accuracy while LLMs show only moderate performance on attribute extraction. The study demonstrates that AI-generated election content can closely mimic human writing, making detection challenging for humans but increasingly tractable for machine learning models, with notable differences across LLMs. Overall, ElectAI provides a benchmark for studying election misinformation and motivates future work on improving claim verification and authorship detection with advanced prompting and fine-tuning techniques.
Abstract
Politics is one of the most prevalent topics discussed on social media platforms, particularly during major election cycles, when users engage in conversations about candidates and electoral processes. Malicious actors may use this opportunity to disseminate misinformation and undermine trust in the electoral process. The emergence of Large Language Models (LLMs) exacerbates this issue by enabling malicious actors to generate misinformation at an unprecedented scale. Content generated by artificial intelligence (AI) is often indistinguishable from authentic user content, raising concerns about the integrity of information on social networks. In this paper, we present a novel taxonomy for characterizing election-related claims. This taxonomy provides an instrument for analyzing such claims, with granular categories related to jurisdiction, equipment, processes, and the nature of claims. We introduce ElectAI, a novel benchmark dataset that consists of 9,900 tweets, each labeled as human- or AI-generated. For AI-generated tweets, the LLM variant that produced them is also recorded. We annotated a subset of 1,550 tweets using the proposed taxonomy to capture the characteristics of election-related claims. We explored the capabilities of LLMs in extracting the taxonomy attributes, and we trained various machine learning models on ElectAI to distinguish between human- and AI-generated posts and to identify the specific LLM variant.
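To make the labeling scheme described in the abstract concrete, the following is a minimal sketch of how one ElectAI record might be represented. It is an illustration only: the class names, field names, and the `attribution_label` helper are assumptions introduced here, not the dataset's actual schema, and the four taxonomy fields simply mirror the category groups named in the abstract (jurisdiction, equipment, processes, nature of claims).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClaimAttributes:
    """Taxonomy attributes for one claim (hypothetical field names)."""
    jurisdiction: Optional[str] = None
    equipment: Optional[str] = None
    process: Optional[str] = None
    claim_nature: Optional[str] = None


@dataclass
class Tweet:
    """One dataset record (hypothetical layout, not the released schema)."""
    text: str
    is_ai_generated: bool
    # LLM variant that produced the tweet; set only for AI-generated tweets.
    generator: Optional[str] = None
    # Taxonomy annotation; present only for the annotated subset.
    attributes: Optional[ClaimAttributes] = None

    def __post_init__(self) -> None:
        # Enforce the labeling rule stated in the abstract: AI-generated
        # tweets carry a generator label, human-written tweets do not.
        if self.is_ai_generated and self.generator is None:
            raise ValueError("AI-generated tweets need a generator label")
        if not self.is_ai_generated and self.generator is not None:
            raise ValueError("human-written tweets have no generator")


def attribution_label(tweet: Tweet) -> str:
    """Return the multi-class target for authorship attribution."""
    return tweet.generator if tweet.is_ai_generated else "human"
```

Under these assumptions, the binary task (human vs. AI) reads `is_ai_generated` directly, while the attribution task uses `attribution_label`, which maps every human-written tweet to a single `"human"` class and every AI-generated tweet to its generator.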
