BotSSCL: Social Bot Detection with Self-Supervised Contrastive Learning

Mohammad Majid Akhtar, Navid Shadman Bhuiyan, Rahat Masood, Muhammad Ikram, Salil S. Kanhere

TL;DR

BotSSCL introduces a self-supervised contrastive learning framework tailored for tabular OSN data to detect sophisticated social bots. By constructing a multi-stream Twitter user representation from Tier-1 features and training a twin encoder with InfoNCE loss, it learns task-relevant embeddings that improve linear separability between bots and humans. The approach achieves state-of-the-art performance on Varol and Gilani datasets, demonstrates generalizability across datasets with LOBO evaluations, and shows robustness against adversarial feature manipulations, all while reducing labeling requirements. The work offers practical implications for OSN providers seeking scalable, generalizable, and adversarially robust bot-detection systems.

Abstract

The detection of automated accounts, also known as "social bots", has become an increasingly important concern for online social networks (OSNs). While several methods have been proposed for detecting social bots, significant research gaps remain. First, current models exhibit limitations in detecting sophisticated bots that aim to mimic genuine OSN users. Second, these methods often rely on simplistic profile features, which are susceptible to manipulation. In addition to their vulnerability to adversarial manipulations, these models lack generalizability, resulting in subpar performance when trained on one dataset and tested on another. To address these challenges, we propose a novel framework for social Bot detection with Self-Supervised Contrastive Learning (BotSSCL). Our framework leverages contrastive learning to distinguish between social bots and humans in the embedding space to improve linear separability. The high-level representations derived by BotSSCL enhance its resilience to variations in data distribution and ensure generalizability. We evaluate BotSSCL's robustness against adversarial attempts to manipulate bot accounts to evade detection. Experiments on two datasets featuring sophisticated bots demonstrate that BotSSCL outperforms other supervised, unsupervised, and self-supervised baseline methods. We achieve approximately 6% and 8% higher F1 than the state of the art (SOTA) on the two datasets. In addition, BotSSCL achieves 67% F1 when trained on one dataset and tested on another, demonstrating its generalizability. Lastly, BotSSCL increases adversarial complexity and allows the adversary only a 4% success rate in evading detection.
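
The TL;DR names InfoNCE as the training loss for the twin encoder. The following is a minimal NumPy sketch of a standard InfoNCE objective, not the authors' implementation: it assumes that row i of `z_a` and row i of `z_b` are embeddings of two views of the same account, that the other rows in the batch act as negatives, and that similarity is cosine similarity scaled by a temperature. The function name `info_nce_loss` and the default `temperature=0.5` are illustrative choices that do not come from the source.

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.5):
    """InfoNCE loss for two batches of embeddings of shape (N, D).

    Assumption: row i of z_a and row i of z_b form a positive pair;
    every other row of z_b is treated as a negative for row i of z_a.
    """
    # L2-normalise so that dot products are cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)

    # Pairwise similarities between all views, scaled by the temperature.
    logits = z_a @ z_b.T / temperature  # shape (N, N)

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positive pairs sit on the diagonal; average their negative log-probability.
    return -np.mean(np.diag(log_prob))
```

Under these assumptions the loss is small when each embedding is closer to its own second view than to the views of other accounts in the batch, which is the property that encourages linearly separable embeddings.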

Paper Structure

This paper contains 32 sections, 9 equations, 5 figures, 15 tables.

Figures (5)

  • Figure 1: Overview of the BotSSCL framework. The first and second blocks (from left to right) represent the feature engineering and user representation processes. The third block denotes contrastive learning. In the fourth block, we evaluate the representations.
  • Figure 2: Twitter User Representation
  • Figure 3: a) Different augmentation types, b) One- or two-view level augmentation
  • Figure 4: Mutual information variation with different corruption rates influences bot detection.
  • Figure 5: Visualization of human and bot accounts in different training datasets. Figure source: Yang_Varol_Hui_Menczer_2020. The last three datasets (especially Varol and Gilani) show the minimum homogeneity and separability problems between the bot and human classes.
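
The captions for Figures 3 and 4 refer to augmentation types and corruption rates, but this listing does not define the corruption scheme. The sketch below shows one common way to build an augmented view of tabular data, offered as an assumption for illustration only: each feature value is replaced, with probability `corruption_rate`, by a value drawn from the same column of a randomly chosen row. The function name `corrupt_features` and its parameters are hypothetical.

```python
import numpy as np

def corrupt_features(x, corruption_rate, rng):
    """Return a corrupted view of a tabular batch x of shape (n_samples, n_features).

    Assumption: a cell is corrupted with probability corruption_rate, and a
    corrupted cell takes the value of the same feature from a random row.
    """
    n, d = x.shape

    # Boolean mask selecting the cells to corrupt.
    mask = rng.random((n, d)) < corruption_rate

    # For every cell, pick a donor row and read that row's value in the same column.
    donor_rows = rng.integers(0, n, size=(n, d))
    replacements = x[donor_rows, np.arange(d)]

    # Keep original values where the mask is False.
    return np.where(mask, replacements, x)
```

A one-view setup would pair the original batch with `corrupt_features(x, rate, rng)`, while a two-view setup would corrupt the batch twice independently; which of these the paper uses, and at what rate, is not stated in this listing.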