FakeWatch: A Framework for Detecting Fake News to Ensure Credible Elections
Shaina Raza, Tahniat Khan, Veronica Chatrath, Drai Paulen-Patterson, Mizanur Rahman, Oluwanifemi Bamgbose
TL;DR
FakeWatch addresses the challenge of detecting fake news in elections by introducing a four-module framework (data collection, corpus construction, model development, evaluation) and a new 2024 US Elections dataset. It combines traditional ML and transformer-based approaches, including a RoBERTa-based FakeWatch model, and demonstrates that transformer models achieve the highest accuracy (0.94) and AUC (0.91), while classic methods offer competitive performance with lower computational cost. Data labeling leverages GPT-4 with human verification, achieving high inter-annotator agreement (Cohen’s κ = 0.79) to ensure label quality. The work provides publicly available labeled data and a trained model hub, and supplements quantitative results with qualitative analyses (LIWC, LDA, SNA) to reveal linguistic and thematic patterns in election-related misinformation, fostering reproducibility and further research.
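The agreement figure above (Cohen's κ) can be made concrete with a minimal sketch. The function name `cohen_kappa` and the toy label lists are hypothetical illustrations, not the authors' annotation data or code. The sketch only shows how κ corrects raw agreement between two annotators (for example, GPT-4 and a human reviewer) for the agreement expected by chance.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each class, the product of the two annotators'
    # marginal label frequencies, summed over all classes.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    classes = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in classes) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels (not taken from the FakeWatch dataset).
gpt4_labels = ["fake", "fake", "real", "real", "fake",
               "real", "real", "fake", "real", "real"]
human_labels = ["fake", "fake", "real", "real", "real",
                "real", "real", "fake", "real", "fake"]

kappa = cohen_kappa(gpt4_labels, human_labels)
print(f"kappa = {kappa:.3f}")
```

In this toy example the annotators agree on 8 of 10 items, but κ comes out lower than 0.8 because some of that agreement would be expected by chance given each annotator's label frequencies.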
Abstract
In today's technologically driven world, the rapid spread of fake news, particularly during critical events like elections, poses a growing threat to the integrity of information. To tackle this challenge, we introduce FakeWatch, a comprehensive framework designed to detect fake news. Leveraging a newly curated dataset of North American election-related news articles, we construct robust classification models. Our framework integrates a model hub comprising both traditional machine learning (ML) techniques and state-of-the-art language models (LMs) to detect fake news effectively. Our objective is to provide the research community with adaptable and precise classification models for identifying election-related fake news. Quantitative evaluations of fake news classifiers on our dataset reveal that, while state-of-the-art LMs exhibit a slight edge over traditional ML models, classical models remain competitive due to their balance of accuracy and computational efficiency. Additionally, qualitative analyses shed light on patterns within fake news articles. We provide our labeled data at https://huggingface.co/datasets/newsmediabias/fake_news_elections_labelled_data and our model at https://huggingface.co/newsmediabias/FakeWatch for reproducibility and further research.
