Bad Actor, Good Advisor: Exploring the Role of Large Language Models in Fake News Detection

Beizhe Hu, Qiang Sheng, Juan Cao, Yuhui Shi, Yang Li, Danding Wang, Peng Qi

TL;DR

This work examines whether large language models (LLMs) can effectively detect fake news and finds that, while LLMs like GPT-3.5 provide informative rationales, they generally underperform fine-tuned small language models (SLMs) on veracity judgments. To capitalize on LLM strengths without replacing SLMs, the authors introduce the Adaptive Rationale Guidance (ARG) network, which lets SLMs selectively incorporate LLM-derived rationales through news-rationale interaction and a rationale-usefulness mechanism; they also derive ARG-D, a distilled, rationale-free variant for cost-sensitive scenarios. On two real-world datasets, ARG and ARG-D outperform the baseline methods, highlighting the value of integrating multi-perspective rationales while keeping costs practical. The findings indicate that LLMs can serve as valuable advisors for SLMs in fake news detection, and the work offers a framework for leveraging LLM insights in cost-aware deployments.

Abstract

Detecting fake news requires both a delicate sense of diverse clues and a profound understanding of the real-world background, which remains challenging for detectors based on small language models (SLMs) due to their knowledge and capability limitations. Recent advances in large language models (LLMs) have shown remarkable performance in various tasks, but whether and how LLMs could help with fake news detection remains underexplored. In this paper, we investigate the potential of LLMs in fake news detection. First, we conduct an empirical study and find that a sophisticated LLM such as GPT-3.5 could generally expose fake news and provide desirable multi-perspective rationales but still underperforms the basic SLM, fine-tuned BERT. Our subsequent analysis attributes this gap to the LLM's inability to select and integrate rationales properly to reach a conclusion. Based on these findings, we propose that current LLMs may not substitute for fine-tuned SLMs in fake news detection but can be a good advisor for SLMs by providing multi-perspective instructive rationales. To instantiate this proposal, we design an adaptive rationale guidance network for fake news detection (ARG), in which SLMs selectively acquire insights on news analysis from the LLMs' rationales. We further derive a rationale-free version of ARG by distillation, namely ARG-D, which serves cost-sensitive scenarios without querying LLMs. Experiments on two real-world datasets demonstrate that ARG and ARG-D outperform three types of baseline methods: SLM-based, LLM-based, and combinations of small and large language models.

Paper Structure (31 sections, 12 equations, 5 figures, 9 tables)

Figures (5)

  • Figure 1: Illustration of the role of large language models (LLMs) in fake news detection. In this case, (a) the LLM fails to output a correct judgment of news veracity but (b) helps the small language model (SLM) judge correctly by providing informative rationales.
  • Figure 2: Illustration of prompting approaches for LLMs.
  • Figure 3: Overall architecture of our proposed adaptive rationale guidance (ARG) network and its rationale-free version ARG-D. In the ARG, the news item and LLM rationales are (a) respectively encoded into $\mathrm{\mathbf{X}}$ and $\mathrm{\mathbf{R_{*}}}$ $(* \in \{t,c\})$. Then the small and large LMs collaborate with each other via news-rationale feature interaction, LLM judgment prediction, and rationale usefulness evaluation. The obtained interactive features $\mathrm{\mathbf{f^\prime_{*{\rightarrow}x}}}$ $(* \in \{t,c\})$ are finally aggregated with the attentively pooled news feature $\mathrm{\mathbf{x}}$ for the final judgment. In the ARG-D, the news encoder and the attention module are preserved, and the output of the rationale-aware feature simulator is supervised by the aggregated feature $\mathrm{\mathbf{f_{cls}}}$ for knowledge distillation.
  • Figure 4: Statistics of additional correctly judged samples of (a) ARG and (b) ARG-D over the BERT baseline. $\mathrm{right}(\cdot)$ denotes samples correctly judged by the method $(\cdot)$. TD/CS: Textual description/commonsense perspective.
  • Figure 5: Performance as the shifting threshold changes.
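
The data flow described in the Figure 3 caption can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the choice of the news tokens as the attention query, the scalar usefulness weights passed in by hand, and the final concatenation are all assumptions made for illustration. In the paper the usefulness scores are predicted by the model and the encoders are learned.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attentive_pool(tokens, query):
    # Pool token features (n, d) into one vector (d,) with attention weights.
    # `query` stands in for a learned vector (assumption).
    weights = softmax(tokens @ query)
    return weights @ tokens

def cross_attend(queries, keys_values):
    # Scaled dot-product cross-attention followed by mean pooling.
    # Which side acts as the query is an assumption here.
    d = queries.shape[-1]
    attn = softmax(queries @ keys_values.T / np.sqrt(d), axis=-1)
    return (attn @ keys_values).mean(axis=0)

def aggregate(X, R, usefulness, query):
    # X: news token features (n, d); R: rationale features keyed by
    # perspective, "t" (textual description) and "c" (commonsense).
    # usefulness: hypothetical scalar weights in [0, 1], supplied by hand.
    x = attentive_pool(X, query)
    parts = [x]
    for key in ("t", "c"):
        f = cross_attend(X, R[key])          # interactive feature
        parts.append(usefulness[key] * f)    # reweighted by usefulness
    # Concatenation is an illustrative choice for the aggregated feature.
    return np.concatenate(parts)

rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(5, d))
R = {"t": rng.normal(size=(4, d)), "c": rng.normal(size=(6, d))}
f_cls = aggregate(X, R, {"t": 0.9, "c": 0.1}, rng.normal(size=d))
print(f_cls.shape)  # (24,)
```

Setting a usefulness weight to zero removes that rationale's contribution entirely, which is the sense in which the SLM acquires the LLM's advice selectively. For ARG-D, the caption indicates that a feature simulator is trained to reproduce the aggregated feature so that no rationale is needed at inference time; that step is omitted here.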