Analyzing the Impact of Fake News on the Anticipated Outcome of the 2024 Election Ahead of Time
Shaina Raza, Mizanur Rahman, Shardul Ghuge
TL;DR
The paper addresses the need for a North American-focused fake news dataset that emphasizes racial slurs and biases in political discourse ahead of the 2024 elections. It builds a 40,000-article corpus, annotates 4,000 of those articles with a hybrid approach that combines the OpenAI API with human verification, and publicly releases the data along with a RoBERTa-based classifier benchmark. The analysis characterizes temporal trends, news outlets, keyword themes, and linguistic features. In benchmarking, RoBERTa outperforms DistilBERT, ALBERT, and BERT across standard classification metrics. The work also highlights data-drift considerations and outlines plans to expand the sources and timeframes covered and to incorporate newer language models, so that the dataset stays relevant to misinformation research and to efforts supporting democratic resilience.
Abstract
Despite increasing awareness of and research on fake news, there is still a significant need for datasets that specifically target racial slurs and biases in North American political speeches. This is particularly important in the context of the upcoming North American elections. This study introduces a comprehensive dataset that illuminates these critical aspects of misinformation. To develop it, we scraped and built a corpus of 40,000 news articles about political discourse in North America. A subset of 4,000 of these articles was then carefully annotated using a blend of advanced language models and human verification. We have made both the full corpus and the annotated subset openly available to the research community, and we have benchmarked models on the annotated data to demonstrate its utility. We release the best-performing language model along with the data. We encourage researchers and developers to use this dataset and to contribute to this ongoing initiative.
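The abstract does not spell out the annotation protocol, so the following is only a minimal sketch of one common way to combine model labeling with human verification. It assumes that a language model proposes a label for each article and that a human reviewer then either confirms or overrides it, with the human verdict always taking precedence. All names here (`Annotation`, `annotate`, `agreement_rate`, `toy_labeler`) and the two-label scheme ("fake"/"real") are hypothetical. `toy_labeler` is a keyword stand-in for a hosted model call; no real OpenAI API call is shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Annotation:
    text: str
    model_label: str             # label proposed by the (hypothetical) model labeler
    human_label: Optional[str]   # reviewer's verdict; None until reviewed

    @property
    def final_label(self) -> str:
        # A human verdict, when present, overrides the model's proposal.
        return self.human_label if self.human_label is not None else self.model_label


def annotate(texts: List[str], propose_label: Callable[[str], str]) -> List[Annotation]:
    """Attach a model-proposed label to every article; human review comes later."""
    return [Annotation(t, propose_label(t), None) for t in texts]


def agreement_rate(items: List[Annotation]) -> float:
    """Fraction of reviewed items where the human confirmed the model's label."""
    reviewed = [a for a in items if a.human_label is not None]
    if not reviewed:
        return 0.0
    agreed = sum(a.human_label == a.model_label for a in reviewed)
    return agreed / len(reviewed)


def toy_labeler(text: str) -> str:
    # Illustrative stand-in for a language-model call; NOT the authors' labeler.
    return "fake" if "shocking" in text.lower() else "real"


items = annotate(["Shocking claim about candidate", "Senate passes budget bill"], toy_labeler)
items[0].human_label = "fake"  # reviewer confirms the model's label
items[1].human_label = "fake"  # reviewer overrides the model's label
print([a.final_label for a in items])  # ['fake', 'fake']
print(agreement_rate(items))           # 0.5
```

Tracking the agreement rate alongside the final labels is one way to gauge how much human review a model-assisted annotation pass actually changes.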
