SINCon: Mitigate LLM-Generated Malicious Message Injection Attack for Rumor Detection
Mingqing Zhang, Qiang Liu, Xiang Tao, Shu Wu, Liang Wang
TL;DR
This work tackles the vulnerability of rumor detectors based on Message Propagation Trees (MPTs) to LLM-generated malicious message injection. Its defense, SINCon, is a self-supervised contrastive regularizer that pushes the model to treat high- and low-influence nodes more uniformly. SINCon defines important and unimportant nodes as the top and bottom 10% by influence, applies two targeted augmentations that mask each group in turn, and trains with a contrastive objective alongside the standard supervised loss. On Twitter and Weibo, SINCon yields substantial robustness gains against HMIA-LLM attacks with only a modest drop in accuracy on clean data, which suggests practical potential for rumor detection in adversarial environments. The approach can be integrated with existing MPT-based detectors and highlights the value of balancing node influence to mitigate localized perturbations.
Abstract
In the era of rapidly evolving large language models (LLMs), state-of-the-art rumor detection systems face growing threats from adversarial attacks that use LLMs to generate and inject malicious messages. This is particularly true of systems based on Message Propagation Trees (MPTs), which represent a conversation as a tree with the source post as its root and the replies as its descendants. Existing attack methods assume that different nodes exhibit varying degrees of influence on predictions: they define nodes with high predictive influence as important nodes and target them. If the model instead treats nodes' predictive influence more uniformly, attackers will find it harder to identify and target high-influence nodes. In this paper, we propose Similarizing the predictive Influence of Nodes with Contrastive Learning (SINCon), a defense mechanism that encourages the model to learn graph representations in which nodes of varying importance have a more uniform influence on predictions. Extensive experiments on the Twitter and Weibo datasets demonstrate that SINCon not only preserves high classification accuracy on clean data but also significantly enhances resistance against LLM-driven message injection attacks.
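The node-selection and masking steps summarized above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that per-node influence scores are already available, that masking means zeroing a node's feature row, that the graph encoder is a simple mean-pool stand-in, and that the contrastive objective takes an InfoNCE-style form with other graphs as negatives. All function names and the parameters `frac`, `tau`, and `lam` are hypothetical.

```python
import numpy as np

def split_by_influence(influence, frac=0.10):
    """Return (important, unimportant): indices of the top and bottom
    `frac` of nodes ranked by influence score."""
    n = len(influence)
    k = max(1, int(round(frac * n)))
    order = np.argsort(influence)          # ascending influence
    return order[-k:], order[:k]

def mask_nodes(features, idx):
    """Augmentation (assumed scheme): zero the feature rows of selected nodes."""
    out = features.copy()
    out[idx] = 0.0
    return out

def mean_pool(features):
    """Stand-in graph encoder; a real MPT detector would use a learned model."""
    return features.mean(axis=0)

def info_nce(anchor, positive, negatives, tau=0.5):
    """InfoNCE-style loss (assumed form): negative log-softmax of the
    anchor-positive cosine similarity against anchor-negative similarities."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, neg) for neg in negatives]) / tau
    logits = logits - logits.max()         # numerical stability
    return float(-(logits[0] - np.log(np.exp(logits).sum())))

def contrastive_term(features, influence, negatives, frac=0.10, tau=0.5):
    """Average the loss over two views: one with important nodes masked,
    one with unimportant nodes masked."""
    important, unimportant = split_by_influence(influence, frac)
    anchor = mean_pool(features)
    views = [mean_pool(mask_nodes(features, important)),
             mean_pool(mask_nodes(features, unimportant))]
    return sum(info_nce(anchor, v, negatives, tau) for v in views) / len(views)
```

Under these assumptions, a training step would combine the terms as `total = supervised_loss + lam * contrastive_term(features, influence, negatives)`, where `lam` is a hypothetical weighting hyperparameter. Pulling both masked views toward the unmasked representation is one way to express the idea that removing either group of nodes should change the prediction by a similar, small amount.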
