FakeWatch: A Framework for Detecting Fake News to Ensure Credible Elections

Shaina Raza, Tahniat Khan, Veronica Chatrath, Drai Paulen-Patterson, Mizanur Rahman, Oluwanifemi Bamgbose

TL;DR

FakeWatch addresses the challenge of detecting fake news in elections by introducing a four-module framework (data collection, corpus construction, model development, evaluation) and a new 2024 US Elections dataset. It combines traditional ML and transformer-based approaches, including a RoBERTa-based FakeWatch model, and demonstrates that transformer models achieve the highest accuracy (0.94) and AUC (0.91), while classic methods offer competitive performance with lower computational cost. Data labeling leverages GPT-4 with human verification, achieving high inter-annotator agreement (Cohen’s κ = 0.79) to ensure label quality. The work provides publicly available labeled data and a trained model hub, and supplements quantitative results with qualitative analyses (LIWC, LDA, SNA) to reveal linguistic and thematic patterns in election-related misinformation, fostering reproducibility and further research.
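
The agreement figure quoted above (Cohen's κ = 0.79) measures how often two annotators agree beyond what chance alone would produce. The following is a minimal sketch of that calculation, not the authors' code. The function name `cohens_kappa` and the toy label lists are hypothetical and are not taken from the FakeWatch dataset.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: for each label, the product of the two annotators'
    # marginal frequencies, summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels (illustration only, not real annotations).
model_labels = ["fake", "fake", "real", "real", "fake",
                "real", "real", "fake", "real", "real"]
human_labels = ["fake", "fake", "real", "real", "real",
                "real", "real", "fake", "real", "fake"]

kappa = cohens_kappa(model_labels, human_labels)
print(f"kappa = {kappa:.2f}")
```

In this toy example the two annotators agree on 8 of 10 items, but because chance agreement is already 0.52, kappa is noticeably lower than the raw agreement rate. This is why kappa is usually reported instead of simple percent agreement.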

Abstract

In today's technologically driven world, the rapid spread of fake news, particularly during critical events like elections, poses a growing threat to the integrity of information. To tackle this challenge, we introduce FakeWatch, a comprehensive framework designed to detect fake news. Leveraging a newly curated dataset of North American election-related news articles, we construct robust classification models. Our framework integrates a model hub comprising both traditional machine learning (ML) techniques and state-of-the-art language models (LMs) to detect fake news effectively. Our objective is to provide the research community with adaptable and precise classification models for identifying election-related fake news. Quantitative evaluations of fake news classifiers on our dataset reveal that, while state-of-the-art LMs exhibit a slight edge over traditional ML models, classical models remain competitive due to their balance of accuracy and computational efficiency. Additionally, qualitative analyses shed light on patterns within fake news articles. We provide our labeled data at https://huggingface.co/datasets/newsmediabias/fake_news_elections_labelled_data and our model at https://huggingface.co/newsmediabias/FakeWatch for reproducibility and further research.

Paper Structure (27 sections, 5 equations, 7 figures, 5 tables)

Figures (7)

  • Figure 1: FakeWatch, a framework to detect biases within textual data. It is a four-module framework, where data is first gathered from diverse sources and then constructed into a quality-focused corpus. Various ML models are trained on the data and evaluated based on different evaluation metrics.
  • Figure 2: The chosen classification methods.
  • Figure 3: Important topics extracted from the corpus. Each point represents a document, and the color of the point indicates its most dominant topic, labelled according to the legend. Similar content clusters are based on dominant topics, and different topics are positioned farther apart.
  • Figure 4: A histogram of sentiment polarity comparison between real (green) and fake (red) news.
  • Figure 5: A bar chart of the frequency of each key term in the data.
  • ...and 2 more figures