CACL: Community-Aware Heterogeneous Graph Contrastive Learning for Social Media Bot Detection
Sirry Chen, Shuo Feng, Songsong Liang, Chen-Chen Zong, Jing Li, Piji Li
TL;DR
The paper addresses social media bot detection by exploiting the community structure of heterogeneous social graphs to improve generalization and mitigate over-smoothing. It introduces CACL, a framework that combines a Community-Aware Module for hard-sample mining with supervised graph contrastive learning, supported by adaptive graph augmentations and a modular loss $\mathcal{L}_{CA} = \mathcal{L}_{G} + \mathcal{L}_{M}$ that sums the two terms $\mathcal{L}_{G}$ and $\mathcal{L}_{M}$. It applies three augmentations (Node Feature Shifting, Link Prediction, and Synonymy Substitution) within a heterogeneous GNN setup, together with a novel contrastive loss that pulls hard positives together across communities while pushing hard negatives apart within communities. Experiments on Cresci-15, Twibot-20, and Twibot-22 show that CACL yields robust improvements across backbones, highlighting its potential to improve bot detection in practice by better exploiting community structure and cross-modal information.
Abstract
Social media bot detection is increasingly crucial with the rise of social media platforms. Existing methods predominantly model social networks as graphs and use graph neural networks (GNNs) for bot detection. However, most of these methods focus on improving GNN performance while neglecting the community structure within social networks. Moreover, GNN-based methods still suffer from poor generalization, due to the relatively small scale of the datasets, and from over-smoothing, caused by the information propagation mechanism. To address these problems, we propose a Community-Aware Heterogeneous Graph Contrastive Learning framework (CACL), which models the social network as a heterogeneous graph with multiple node types and edge types, and then uses a community-aware module to dynamically mine both hard positive samples and hard negative samples for supervised graph contrastive learning with adaptive graph enhancement algorithms. Extensive experiments demonstrate that our framework addresses the aforementioned challenges and outperforms competitive baselines on three social media bot benchmarks.
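The community-aware hard-sample mining described above can be illustrated with a small sketch. This is not the authors' implementation. It assumes one reading of the summary: a hard positive is a node with the same label but in a different community, and a hard negative is a node with a different label in the same community. The function names, the InfoNCE-style form of the loss, and the `temperature` value are all hypothetical choices made for illustration, and community assignments are taken as given rather than detected.

```python
import numpy as np


def hard_sample_masks(labels, communities):
    """Boolean masks of hard positives and hard negatives for each anchor node.

    Assumed definitions (an interpretation, not taken from the paper's code):
      hard positive: same label, different community
      hard negative: different label, same community
    """
    labels = np.asarray(labels)
    communities = np.asarray(communities)
    same_label = labels[:, None] == labels[None, :]
    same_comm = communities[:, None] == communities[None, :]
    hard_pos = same_label & ~same_comm   # the diagonal is excluded by ~same_comm
    hard_neg = ~same_label & same_comm   # the diagonal is excluded by ~same_label
    return hard_pos, hard_neg


def community_contrastive_loss(z, labels, communities, temperature=0.5):
    """InfoNCE-style loss over mined hard samples (illustrative form only).

    z: (n, d) array of non-zero node embeddings.
    Anchors without at least one hard positive and one hard negative are skipped.
    """
    z = np.asarray(z, dtype=float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    exp_sim = np.exp(z @ z.T / temperature)
    hard_pos, hard_neg = hard_sample_masks(labels, communities)

    per_anchor = []
    for i in range(len(z)):
        pos = exp_sim[i, hard_pos[i]]
        neg = exp_sim[i, hard_neg[i]]
        if pos.size == 0 or neg.size == 0:
            continue
        # Each hard positive is contrasted against all hard negatives of the anchor.
        per_anchor.append(-np.log(pos / (pos + neg.sum())).mean())
    return float(np.mean(per_anchor)) if per_anchor else 0.0


# Toy example: 4 accounts, 2 labels (0 = human, 1 = bot), 2 communities.
labels = [0, 0, 1, 1]
communities = [0, 1, 0, 1]
label_aligned = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
community_aligned = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])

loss_good = community_contrastive_loss(label_aligned, labels, communities)
loss_bad = community_contrastive_loss(community_aligned, labels, communities)
```

In this toy setup, embeddings that cluster by label give a lower loss than embeddings that cluster by community, which is the behavior such an objective is meant to encourage: representations should separate bots from humans rather than merely reflect community membership.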
