CACL: Community-Aware Heterogeneous Graph Contrastive Learning for Social Media Bot Detection

Sirry Chen, Shuo Feng, Songsong Liang, Chen-Chen Zong, Jing Li, Piji Li

TL;DR

The paper addresses social media bot detection by exploiting community structure within heterogeneous social graphs to improve generalization and mitigate over-smoothing. It introduces CACL, a framework that combines a Community-Aware Module for hard-sample mining with supervised graph contrastive learning and adaptive graph augmentations, and that uses a modular loss ${\mathcal{L}_{CA} = \mathcal{L}_{G} + \mathcal{L}_{M}}$ composed of the two terms ${\mathcal{L}_{G}}$ and ${\mathcal{L}_{M}}$. Within a heterogeneous GNN setup, it applies three augmentations (Node Feature Shifting, Link Prediction, and Synonymy Substitution) and a novel contrastive loss that pulls hard positives from different communities closer while pushing hard negatives within the same community apart. Experiments on Cresci-15, Twibot-20, and Twibot-22 show that CACL yields robust improvements across backbones, highlighting its potential to enhance bot detection in practice by better exploiting community structure and cross-modal information.

Abstract

Social media bot detection is increasingly crucial with the rise of social media platforms. Existing methods predominantly model social networks as graphs and use graph neural networks (GNNs) for bot detection. However, most of these methods focus on improving the performance of GNNs while neglecting the community structure within social networks. Moreover, GNN-based methods still face problems such as poor model generalization, due to the relatively small scale of the datasets, and over-smoothing, caused by the information propagation mechanism. To address these problems, we propose a Community-Aware Heterogeneous Graph Contrastive Learning framework (CACL), which models the social network as a heterogeneous graph with multiple node types and edge types, and then uses a community-aware module to dynamically mine both hard positive samples and hard negative samples for supervised graph contrastive learning with adaptive graph enhancement algorithms. Extensive experiments demonstrate that our framework addresses the previously mentioned challenges and outperforms competitive baselines on three social media bot benchmarks.
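The hard-sample idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes node labels and community assignments are already available as integer arrays, treats same-class nodes in different communities as hard positives and different-class nodes in the same community as hard negatives, and uses a generic InfoNCE-style loss as a stand-in for the paper's contrastive loss. The function names `hard_sample_masks` and `hard_contrastive_loss` and the `temperature` value are hypothetical.

```python
import numpy as np

def hard_sample_masks(labels, communities):
    """Build boolean (n, n) masks of hard positive and hard negative pairs.

    Assumption: hard positives are same-class nodes in different communities,
    hard negatives are different-class nodes in the same community.
    """
    labels = np.asarray(labels)
    communities = np.asarray(communities)
    same_class = labels[:, None] == labels[None, :]
    same_comm = communities[:, None] == communities[None, :]
    hard_pos = same_class & ~same_comm   # never includes a node paired with itself
    hard_neg = ~same_class & same_comm
    return hard_pos, hard_neg

def hard_contrastive_loss(z, hard_pos, hard_neg, temperature=0.5):
    """Generic InfoNCE-style loss over hard pairs (a stand-in, not the paper's loss)."""
    z = np.asarray(z, dtype=float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # project onto the unit hypersphere
    sim = np.exp(z @ z.T / temperature)                # exponentiated cosine similarities
    pos = (sim * hard_pos).sum(axis=1)
    neg = (sim * hard_neg).sum(axis=1)
    # Only anchors that have at least one hard positive and one hard negative contribute.
    valid = hard_pos.any(axis=1) & hard_neg.any(axis=1)
    if not valid.any():
        return 0.0
    return float(np.mean(-np.log(pos[valid] / (pos[valid] + neg[valid]))))

# Toy example: two classes spread over two communities.
labels = [0, 0, 1, 1]
communities = [0, 1, 0, 1]
hard_pos, hard_neg = hard_sample_masks(labels, communities)

aligned = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])   # clustered by class
confused = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])  # clustered by community
loss_aligned = hard_contrastive_loss(aligned, hard_pos, hard_neg)
loss_confused = hard_contrastive_loss(confused, hard_pos, hard_neg)
```

In this toy setup the loss is lower when embeddings cluster by class than when they cluster by community, which is the behavior the hard-sample contrastive objective is meant to encourage.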

Paper Structure (26 sections, 11 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Illustration of treating different-class nodes within one community as hard negative samples and same-class nodes in different communities as hard positive samples. Graph contrastive learning then pushes hard negative samples apart and pulls hard positive samples closer on the hypersphere.
  • Figure 2: Overview of the proposed CACL framework. At the graph level, a community-aware module splits the social network into communities and mines hard negative and positive samples, creating a subgraph pool from which two matched subgraphs are repeatedly selected. At the node level, three graph augmentation methods generate augmented graphs, and graph contrastive learning is applied to handle the hard samples.
  • Figure 3: Comparison of community entropy trends for the static and dynamic community-aware modules. In the static setting the community-aware module is frozen, so it cannot dynamically mine hard samples during training. Average entropy measures the imbalance of categories.
  • Figure 4: Trends in cosine similarity. Positive and negative denote cosine similarities between positive samples and between negative samples, respectively. Within and between denote cosine similarities within one community and between different communities, respectively.
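The quantities tracked in Figures 3 and 4 can be approximated with a short sketch. The exact definitions are not given in the captions, so this assumes that average entropy is the Shannon entropy of the class distribution inside each community, averaged over communities, and that the cosine-similarity curves are means over node pairs grouped by class agreement (positive or negative) and community membership (within or between). The function names are hypothetical.

```python
import numpy as np

def average_community_entropy(labels, communities):
    """Mean Shannon entropy (in bits) of the class distribution per community.

    Assumption: this is one plausible reading of the "average entropy" metric.
    """
    labels = np.asarray(labels)
    communities = np.asarray(communities)
    entropies = []
    for c in np.unique(communities):
        _, counts = np.unique(labels[communities == c], return_counts=True)
        p = counts / counts.sum()
        entropies.append(-(p * np.log2(p)).sum())
    return float(np.mean(entropies))

def grouped_cosine_similarity(z, labels, communities):
    """Mean cosine similarity over four pair groups (an assumed grouping)."""
    z = np.asarray(z, dtype=float)
    labels = np.asarray(labels)
    communities = np.asarray(communities)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T
    same_class = labels[:, None] == labels[None, :]
    same_comm = communities[:, None] == communities[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)   # exclude self-pairs
    groups = {
        "positive_within": same_class & same_comm & off_diag,
        "positive_between": same_class & ~same_comm,
        "negative_within": ~same_class & same_comm,
        "negative_between": ~same_class & ~same_comm,
    }
    # Groups with no pairs are reported as NaN rather than raising.
    return {name: float(sim[mask].mean()) if mask.any() else float("nan")
            for name, mask in groups.items()}

labels = [0, 0, 1, 1]
pure_entropy = average_community_entropy(labels, [0, 0, 1, 1])    # one class per community
mixed_entropy = average_community_entropy(labels, [0, 1, 0, 1])   # both classes in each community

z = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
sims = grouped_cosine_similarity(z, labels, [0, 1, 0, 1])
```

Under these assumptions, a community containing a single class contributes zero entropy, while an evenly mixed two-class community contributes one bit.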