Fake News Detection: Comparative Evaluation of BERT-like Models and Large Language Models with Generative AI-Annotated Data

Shaina Raza; Drai Paulen-Patterson; Chen Ding

TL;DR

This paper compares two model families for fake news detection: BERT-like encoder-only models and autoregressive decoder-only LLMs. It introduces a dataset of 10k news articles labeled with GPT-4 Turbo and validated by humans, uses these GPT-4 labels for both training and evaluation, and compares them against a distant-supervision (weak-label) baseline. BERT-like models typically achieve higher classification accuracy, whereas LLMs are more robust to text perturbations; instruction-tuned LLMs with majority voting at inference further improve label reliability. The practical takeaway is that AI-assisted annotation with human oversight, combined with careful model selection, yields strong fake news detectors, and that discriminative classifiers and generative models have complementary strengths.

Abstract

Fake news poses a significant threat to public opinion and social stability in modern society. This study presents a comparative evaluation of BERT-like encoder-only models and autoregressive decoder-only large language models (LLMs) for fake news detection. We introduce a dataset of news articles labeled with GPT-4 assistance (an AI-labeling method) and verified by human experts to ensure reliability. Both BERT-like encoder-only models and LLMs were fine-tuned on this dataset. Additionally, we developed an instruction-tuned LLM approach that uses majority voting during inference for label generation. Our analysis reveals that BERT-like models generally outperform LLMs in classification tasks, while LLMs demonstrate superior robustness against text perturbations. The results also show that models trained on AI labels with human supervision achieve better classification results than models trained on weakly labeled (distant supervision) data. This study highlights the effectiveness of combining AI-based annotation with human oversight and demonstrates the performance of different families of machine learning models for fake news detection.
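
The majority-voting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `majority_vote`, the label strings "fake" and "real", the number of sampled generations, and the tie-breaking rule are all assumptions made for illustration. The idea is simply that several labels are generated for the same article and the most frequent one is kept.

```python
from collections import Counter


def majority_vote(generations, tie_label="fake"):
    """Return the most frequent label among several sampled LLM outputs.

    `generations` is a list of raw label strings (hypothetical format).
    `tie_label` is an assumed tie-breaking rule, not taken from the paper.
    """
    # Normalise whitespace and casing so "Fake " and "fake" count together.
    labels = [g.strip().lower() for g in generations]
    if not labels:
        raise ValueError("need at least one generated label")

    ranked = Counter(labels).most_common()
    # If the two most frequent labels have equal counts, fall back to tie_label.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_label
    return ranked[0][0]


# Three hypothetical generations for one article.
print(majority_vote(["fake", "real", "Fake "]))  # prints "fake"
```

Using an odd number of generations avoids ties in the binary case, so the tie-breaking rule only matters when the sample count is even or when outputs fall outside the two expected labels.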
