SINCon: Mitigate LLM-Generated Malicious Message Injection Attack for Rumor Detection

Mingqing Zhang, Qiang Liu, Xiang Tao, Shu Wu, Liang Wang

TL;DR

This work tackles the vulnerability of MPT-based rumor detection to LLM-generated malicious message injections by balancing node influence with SINCon, a self-supervised contrastive regularization that treats high- and low-influence nodes more uniformly. SINCon defines important and unimportant nodes as the top and bottom 10% by influence, uses two targeted augmentations to mask these groups, and trains with a contrastive objective alongside the standard supervised loss. Across Twitter and Weibo, SINCon yields substantial robustness gains against HMIA-LLM attacks with only a modest drop in accuracy on clean data, demonstrating its practical potential for secure rumor detection in adversarial environments. The approach offers a scalable defense that can be integrated with existing MPT-based detectors and highlights the value of balancing node influence to mitigate localized perturbations.

Abstract

In the era of rapidly evolving large language models (LLMs), state-of-the-art rumor detection systems face increasing threats from adversarial attacks that use LLMs to generate and inject malicious messages. This is particularly true of systems based on Message Propagation Trees (MPTs), which represent a conversation as a tree with the source post as its root and the replies as its descendants. Existing attack methods assume that different nodes exhibit varying degrees of influence on predictions: they define nodes with high predictive influence as important nodes and target them for attacks. If the model treats nodes' predictive influence more uniformly, attackers will find it harder to target high-influence nodes. In this paper, we propose Similarizing the predictive Influence of Nodes with Contrastive Learning (SINCon), a defense mechanism that encourages the model to learn graph representations in which nodes of varying importance have a more uniform influence on predictions. Extensive experiments on the Twitter and Weibo datasets demonstrate that SINCon not only preserves high classification accuracy on clean data but also significantly enhances resistance against LLM-driven message injection attacks.

Paper Structure

This paper contains 20 sections, 19 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: A rumor detection model under an LLM-generated malicious message injection attack. The injected messages introduce new nodes and edges, altering the topology and semantics of the MPT and causing the rumor detection model to fail to detect the rumor.
  • Figure 2: Architecture of SINCon. Given a mini-batch $\{G_i\}_{i=1}^B$ of MPTs, where $B = 2$: (1) Based on Eq. (12), the 10% of nodes with the highest influence scores in an MPT are defined as important nodes, and the 10% with the lowest influence scores as unimportant nodes. (2) To regularize the model, two data augmentation strategies are introduced: one masks the important nodes and the other masks the unimportant nodes. (3) The training objective reduces the disparity in model predictions between the two augmented MPTs, maintains similarity between each augmented MPT and the original MPT, and minimizes the agreement between the original MPT and the other, distinct MPTs in the same batch.
  • Figure 3: Sensitivity analysis of the hyperparameter $\alpha_1$. Experiments use BiGCN as both the surrogate model and the target model.
  • Figure 4: Sensitivity analysis of the hyperparameter $\alpha_2$. Experiments use BiGCN as both the surrogate model and the target model.
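
The three steps in the Figure 2 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The influence scores are taken as given, since the paper's scoring equation is not reproduced on this page. Masking is approximated by zeroing node feature rows, mean pooling stands in for the graph encoder, and the contrastive term is a generic InfoNCE-style loss with a hypothetical temperature `tau`. The names `split_by_influence`, `mask_nodes`, and `contrastive_loss` are illustrative only.

```python
import numpy as np


def split_by_influence(scores, ratio=0.1):
    """Return (important_idx, unimportant_idx): the top and bottom
    `ratio` fraction of nodes ranked by influence score."""
    scores = np.asarray(scores, dtype=float)
    k = max(1, int(round(ratio * scores.size)))  # at least one node per group
    order = np.argsort(scores, kind="stable")    # ascending by score
    return order[-k:], order[:k]


def mask_nodes(features, idx):
    """Return a copy of the node-feature matrix with rows `idx` zeroed.
    Zeroing is an assumed stand-in for the paper's masking operation."""
    masked = np.array(features, dtype=float, copy=True)
    masked[idx] = 0.0
    return masked


def cosine(a, b, eps=1e-12):
    """Cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def contrastive_loss(z_imp, z_unimp, z_orig, z_others, tau=0.5):
    """InfoNCE-style loss (an assumed form, not the paper's exact objective).

    Positive pairs: the two augmented views, and each view with the original.
    Negative pairs: the original graph with the other graphs in the batch.
    """
    pos = np.array([
        cosine(z_imp, z_unimp),
        cosine(z_imp, z_orig),
        cosine(z_unimp, z_orig),
    ])
    neg = np.array([cosine(z_orig, z) for z in z_others])
    pos_exp = np.exp(pos / tau)
    neg_sum = np.exp(neg / tau).sum()
    return float(-np.mean(np.log(pos_exp / (pos_exp + neg_sum))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(20, 8))        # 20 nodes, 8-dim features
    other = rng.normal(size=(15, 8))        # a second MPT in the batch (B = 2)
    scores = rng.random(20)                 # placeholder influence scores

    important, unimportant = split_by_influence(scores, ratio=0.1)
    view_imp = mask_nodes(feats, important)      # augmentation 1
    view_unimp = mask_nodes(feats, unimportant)  # augmentation 2

    # Mean pooling as a toy graph encoder.
    loss = contrastive_loss(
        view_imp.mean(axis=0),
        view_unimp.mean(axis=0),
        feats.mean(axis=0),
        [other.mean(axis=0)],
    )
    print("important:", important, "unimportant:", unimportant)
    print("contrastive loss:", loss)
```

In training, a term like this would be added to the standard supervised loss as a regularizer, with its weight treated as a tunable hyperparameter. A real detector would replace the mean-pooling encoder with the MPT-based model being defended.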