Fake News in Sheep's Clothing: Robust Fake News Detection Against LLM-Empowered Style Attacks

Jiaying Wu; Jiafeng Guo; Bryan Hooi

TL;DR

This work reveals that state-of-the-art text-based fake news detectors are vulnerable to LLM-empowered style attacks that restyle content to imitate trustworthy sources. It introduces SheepDog, a style-agnostic detector that emphasizes content veracity by combining LLM-assisted style reframing, a style-alignment training objective, and content-focused veracity attributions within a multi-task framework, yielding robust performance across diverse backbones. The approach achieves superior robustness on PolitiFact, GossipCop, and LUN benchmarks, while maintaining strong performance on unperturbed data. By providing attribution-based explanations and demonstrating adaptability to multiple backbones, SheepDog offers a practical and extensible path toward reliable misinformation detection in dynamically styled online content.

Abstract

It is commonly perceived that fake news and real news exhibit distinct writing styles, such as the use of sensationalist versus objective language. However, we emphasize that style-related features can also be exploited for style-based attacks. Notably, the advent of powerful Large Language Models (LLMs) has empowered malicious actors to mimic the style of trustworthy news sources, doing so swiftly, cost-effectively, and at scale. Our analysis reveals that LLM-camouflaged fake news content significantly undermines the effectiveness of state-of-the-art text-based detectors (up to 38% decrease in F1 Score), implying a severe vulnerability to stylistic variations. To address this, we introduce SheepDog, a style-robust fake news detector that prioritizes content over style in determining news veracity. SheepDog achieves this resilience through (1) LLM-empowered news reframings that inject style diversity into the training process by customizing articles to match different styles; (2) a style-agnostic training scheme that ensures consistent veracity predictions across style-diverse reframings; and (3) content-focused veracity attributions that distill content-centric guidelines from LLMs for debunking fake news, offering supplementary cues and potential interpretability that assist veracity prediction. Extensive experiments on three real-world benchmarks demonstrate SheepDog's style robustness and adaptability to various backbones.
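As a rough illustration of component (2), the sketch below combines a standard classification loss on the original article with a penalty for prediction drift across its style reframings. The abstract does not state the exact form of the objective, so the KL-divergence penalty, the `weight` parameter, and all function names here are assumptions made for illustration, not the authors' implementation.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cross_entropy(probs, label, eps=1e-12):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label] + eps)

def style_agnostic_loss(orig_logits, reframed_logits_list, label, weight=1.0):
    """Hypothetical objective: veracity loss on the original article plus
    the average divergence between its prediction and the predictions
    on each style-reframed version (assumed form, not from the paper)."""
    p_orig = softmax(orig_logits)
    veracity = cross_entropy(p_orig, label)
    consistency = sum(
        kl_divergence(p_orig, softmax(r)) for r in reframed_logits_list
    ) / len(reframed_logits_list)
    return veracity + weight * consistency

# Example: class 0 = real, class 1 = fake; the article is fake (label=1).
stable = style_agnostic_loss([-1.0, 2.0], [[-1.1, 2.1], [-0.9, 1.9]], label=1)
fooled = style_agnostic_loss([-1.0, 2.0], [[2.0, -1.0], [-0.9, 1.9]], label=1)
print(stable < fooled)  # a reframing that flips the prediction is penalized
```

In this toy setup the second call is penalized more heavily because one reframing flips the predicted class, which is the behavior a style-consistent detector is meant to discourage.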

Paper Structure (31 sections, 7 equations, 3 figures, 14 tables)

Introduction
Related Work
Problem Definition
LLM-Empowered Style Attacks
Attack Formulation
Style-Related Detector Vulnerability
Proposed Approach
LLM-Empowered News Reframing
Style-Agnostic Training
Content-Focused Veracity Attributions
Final Objective Function of SheepDog
Experiments
Experimental Setup
Datasets
Baselines
...and 16 more sections

Figures (3)

Figure 1: A motivating example of LLM-empowered style attacks on text-based fake news detectors, where fake news is camouflaged with the style of reliable news publishers.
Figure 2: Overview of the proposed SheepDog framework for style-agnostic fake news detection.
Figure 3: Across the original fake news article and its LLM-camouflaged counterparts, SheepDog maintains consistency and accuracy in both its veracity prediction and the top-predicted veracity attribution for debunking fake news.
