Group-Adaptive Adversarial Learning for Robust Fake News Detection Against Malicious Comments

Zhao Tong, Chunlin Gong, Yimeng Gu, Haichao Shi, Qiang Liu, Shu Wu, Xiao-Yu Zhang

TL;DR

This work addresses the vulnerability of fake news detectors to adversarial comments by introducing AdComment, a three-part framework comprising LLM-driven attacking comment generation, the CommentNews Aggregate Verifier (CNAV) for joint news-comment representation, and the InfoDirichlet Adjusting Mechanism (IDA) that adaptively allocates training focus across perceptual, cognitive, and socio-emotional attack categories. CNAV uses a parallel self-attention design to fuse news and comments, while IDA leverages a mutual-information-inspired vulnerability score and Dirichlet-based allocation to maintain diverse and robust exposure during training. Across multilingual datasets (Weibo16/20, RumourEval-19), AdComment achieves superior original accuracy and significantly enhanced robustness under diverse and targeted adversarial attacks, outperforming LLM-only and SLM-only baselines. The results demonstrate the practical potential of group-adaptive adversarial learning for resilient fake news detection in real-world social media environments.

Abstract

The spread of fake news online distorts public judgment and erodes trust in social media platforms. Although recent fake news detection (FND) models perform well in standard settings, they remain vulnerable to adversarial comments, whether authored by real users or by large language models (LLMs), that subtly shift model decisions. In view of this, we first present a comprehensive evaluation of comment attacks on existing fake news detectors and then introduce a group-adaptive adversarial training strategy to improve the robustness of FND models. Specifically, our approach comprises three steps: (1) dividing adversarial comments into three psychologically grounded categories: perceptual, cognitive, and societal; (2) generating diverse, category-specific attacks via LLMs to enhance adversarial training; and (3) applying a Dirichlet-based adaptive sampling mechanism (InfoDirichlet Adjusting Mechanism) that dynamically adjusts the learning focus across different comment categories during training. Experiments on benchmark datasets show that our method maintains strong detection accuracy while substantially increasing robustness to a wide range of adversarial comment perturbations.
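
As a rough illustration of step (3), the sketch below shows one way a Dirichlet-based allocation could turn per-category vulnerability scores into sampling proportions for the next training round. It is not the paper's implementation: the function names, the `base` and `scale` parameters, and the example scores are all assumptions made for illustration, and the vulnerability score is simply taken as a given input.

```python
import random

# Category names follow the abstract; everything else here is hypothetical.
CATEGORIES = ("perceptual", "cognitive", "societal")


def dirichlet_sample(alphas, rng):
    """Draw one sample from Dirichlet(alphas) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def allocate_focus(vulnerability, base=1.0, scale=10.0, seed=0):
    """Map per-category vulnerability scores to sampling proportions.

    vulnerability: dict of category -> non-negative score, where a higher
    score is assumed to mean the detector is more easily fooled.
    base keeps every concentration parameter positive, so no category
    is ever assigned a zero share.
    """
    rng = random.Random(seed)
    alphas = [base + scale * vulnerability[c] for c in CATEGORIES]
    weights = dirichlet_sample(alphas, rng)
    return dict(zip(CATEGORIES, weights))


def split_batch(weights, batch_size):
    """Turn proportions into integer counts that sum to batch_size."""
    counts = {c: int(weights[c] * batch_size) for c in CATEGORIES}
    # Hand any slots lost to rounding down to the highest-weighted categories.
    leftover = batch_size - sum(counts.values())
    ranked = sorted(CATEGORIES, key=lambda c: weights[c], reverse=True)
    for c in ranked[:leftover]:
        counts[c] += 1
    return counts


# Example with made-up scores: the cognitive category is assumed weakest.
scores = {"perceptual": 0.1, "cognitive": 0.6, "societal": 0.3}
weights = allocate_focus(scores)
counts = split_batch(weights, batch_size=32)
print(weights)
print(counts)
```

Because the proportions are sampled rather than computed deterministically, a category with a low score still receives a nonzero share, which is one plausible reading of how such a mechanism keeps training exposure diverse.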

Paper Structure

This paper contains 24 sections, 13 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Different categories of malicious news comments, including minor language errors (perception attack [moradi2021evaluating, pruthi2019combating]), flawed reasoning (cognition attack [yeh2024cocolofa, sahai2021breaking]), and fear- or conspiracy-based narratives (socio-emotional attack [rashkin2017truth, liu2024conspemollm]), can mislead readers. This demonstrates that models without adversarial training are highly susceptible to such attacks.
  • Figure 2: Overview of the AdComment framework. Step 1 (Left): Adversarial comments are generated via hierarchical prompting along perceptual, cognitive, and socio-emotional dimensions. Step 2 (Middle): The CommentNews Aggregate Verifier (CNAV) jointly trains on original and adjusted data distributions. Step 3 (Right): The InfoDirichlet Adjusting (IDA) mechanism estimates vulnerability scores and adaptively rebalances sampling via Dirichlet-based allocation to improve robustness.
  • Figure 3: Framework of the Chain-of-Thought (CoT) prompt used for generating perception-, cognition-, or societal-level attack comments based on given news and original comments.
  • Figure 4: Evaluation of different attacks with varying quantities. The y-axis represents the Attack Success Rate (ASR). The x-axis is the number of comments of each attack type.
  • Figure 5: Evolution of validation performance during grouped adversarial training across different datasets. The y-axis represents the accuracy, while the x-axis denotes the number of training epochs.
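
The Attack Success Rate (ASR) on the y-axis of Figure 4 is not defined in this summary. A minimal sketch under one common convention, assumed here rather than taken from the paper, counts the fraction of originally correct predictions that become wrong once adversarial comments are added. The function name and the toy data are hypothetical.

```python
def attack_success_rate(labels, clean_preds, attacked_preds):
    """Fraction of originally correct predictions that an attack flips.

    Assumed convention: only samples the detector classified correctly
    before the attack are eligible, and an attack succeeds on a sample
    when the prediction after the attack no longer matches the label.
    """
    eligible = [
        i for i, (y, p) in enumerate(zip(labels, clean_preds)) if y == p
    ]
    if not eligible:
        return 0.0
    flipped = sum(1 for i in eligible if attacked_preds[i] != labels[i])
    return flipped / len(eligible)


# Toy example: 1 = fake, 0 = real.
labels = [1, 0, 1, 1]
clean_preds = [1, 0, 0, 1]     # third sample is already wrong, so it is excluded
attacked_preds = [0, 0, 0, 1]  # the attack flips only the first sample
asr = attack_success_rate(labels, clean_preds, attacked_preds)
print(asr)
```

Under this convention, three samples are eligible and one is flipped, so the toy ASR is one third.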