RoGBot: Relationship-Oblivious Graph-based Neural Network with Contextual Knowledge for Bot Detection

Ashutosh Anshul, Mohammad Zia Ur Rehman, Sri Akash Kadali, Nagendra Kumar

TL;DR

RoGBot tackles bot detection when follower–following data is unavailable by constructing a similarity-based user graph from joint semantic and metadata features and applying inductive GraphSAGE for relational reasoning. The approach combines BERT-derived tweet embeddings with lightweight user metadata, fuses them into node features, and performs graph-based propagation without relying on follower–following relationships. Empirical results on Cresci-15, Cresci-17, and PAN 2019 demonstrate state-of-the-art accuracy (99.8%, 99.1%, and 96.8%, respectively) and robustness, with ablation confirming the importance of GraphSAGE and auxiliary features. The method offers practical value by enabling scalable bot detection in environments where explicit social links are unavailable or incomplete.

Abstract

Detecting automated accounts (bots) among genuine users on platforms like Twitter remains a challenging task due to the evolving behaviors and adaptive strategies of such accounts. While recent methods have achieved strong detection performance by combining text, metadata, and user relationship information within graph-based frameworks, many of these models heavily depend on explicit user-user relationship data. This reliance limits their applicability in scenarios where such information is unavailable. To address this limitation, we propose a novel multimodal framework that integrates detailed textual features with enriched user metadata while employing graph-based reasoning without requiring follower-following data. Our method uses transformer-based models (e.g., BERT) to extract deep semantic embeddings from tweets, which are aggregated using max pooling to form comprehensive user-level representations. These are further combined with auxiliary behavioral features and passed through a GraphSAGE model to capture both local and global patterns in user behavior. Experimental results on the Cresci-15, Cresci-17, and PAN 2019 datasets demonstrate the robustness of our approach, achieving accuracies of 99.8%, 99.1%, and 96.8%, respectively, and highlighting its effectiveness against increasingly sophisticated bot strategies.
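The user-level representation described in the abstract can be illustrated with a minimal sketch. It assumes the per-tweet embeddings have already been computed (for example by BERT) and that "combined" means plain concatenation, which the text above does not confirm. The function names `max_pool` and `fuse` and the two metadata values are hypothetical and serve only as illustration.

```python
from typing import List, Sequence


def max_pool(tweet_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise maximum over a user's tweet embeddings (one vector per tweet)."""
    if not tweet_embeddings:
        raise ValueError("need at least one tweet embedding")
    dim = len(tweet_embeddings[0])
    if any(len(e) != dim for e in tweet_embeddings):
        raise ValueError("all tweet embeddings must share one dimension")
    return [max(e[i] for e in tweet_embeddings) for i in range(dim)]


def fuse(user_embedding: Sequence[float], metadata: Sequence[float]) -> List[float]:
    """Assumed fusion step: concatenate the pooled text vector with metadata features."""
    return list(user_embedding) + list(metadata)


# Toy 3-dimensional vectors stand in for real tweet embeddings
# (bert-base produces 768-dimensional vectors).
tweets = [[0.1, -0.4, 0.7], [0.3, 0.2, -0.1]]
metadata = [120.0, 0.5]  # hypothetical metadata features, not taken from the paper

node_features = fuse(max_pool(tweets), metadata)
```

Max pooling keeps, for each embedding dimension, the strongest activation seen in any of the user's tweets, so the resulting vector has a fixed size regardless of how many tweets the user posted.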

Paper Structure

This paper contains 24 sections, 16 equations, 7 figures, 4 tables, 1 algorithm.

Figures (7)

  • Figure 1: Difference between tweets by (a) a bot (red) and (b) a human (green). Notice how the bot's tweets resemble one another in content, length, and use of entities such as emojis and hashtags. In contrast, the human's tweets vary in content, length, and use of emojis and hashtags.
  • Figure 2: System Architecture: We integrate tweet-level semantic embeddings (via BERT) with user metadata to form a unified representation, which we then use to construct a user–user graph based on feature similarity. We apply GraphSAGE to refine node representations through neighborhood aggregation, and finally, we use a classification layer to predict whether a user is a bot.
  • Figure 3: Graph Construction: We construct a user graph by representing each user as a node with fused textual and metadata features. We compute pairwise cosine similarity and add edges only when the similarity exceeds a threshold, ensuring balanced connectivity and sparsity. This lets us capture meaningful semantic relations without relying on follower-following links.
  • Figure 4: t-SNE visualization of learned embeddings for bot and human users
  • Figure 5: Accuracy trend as a function of similarity threshold
  • ...and 2 more figures
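
The graph construction and neighborhood aggregation described in the captions for Figures 2 and 3 can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the threshold value of 0.8 is hypothetical, the pairwise loop is quadratic in the number of users, and the aggregation step omits the learned weight matrix and nonlinearity that a real GraphSAGE layer applies after concatenation.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(features: Sequence[Sequence[float]], threshold: float) -> Dict[int, Set[int]]:
    """Add an undirected edge between two users when their similarity exceeds the threshold."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(features))}
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if cosine(features[i], features[j]) > threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def mean_aggregate(features: Sequence[Sequence[float]], adj: Dict[int, Set[int]]) -> List[List[float]]:
    """GraphSAGE-style mean aggregation without learned weights:
    concatenate each node's own features with the mean of its neighbors' features."""
    out = []
    for i, x in enumerate(features):
        nbrs = adj[i]
        if nbrs:
            mean = [sum(features[j][k] for j in nbrs) / len(nbrs) for k in range(len(x))]
        else:
            mean = [0.0] * len(x)  # isolated node: no neighborhood signal
        out.append(list(x) + mean)
    return out


# Three toy users; the first two have nearly parallel feature vectors.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
graph = build_graph(features, threshold=0.8)  # hypothetical threshold
aggregated = mean_aggregate(features, graph)
```

In this toy example only the first two users are connected, and the third stays isolated. Raising the threshold makes the graph sparser and lowering it makes the graph denser, which is the trade-off that the Figure 5 caption refers to.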