RoGBot: Relationship-Oblivious Graph-based Neural Network with Contextual Knowledge for Bot Detection
Ashutosh Anshul, Mohammad Zia Ur Rehman, Sri Akash Kadali, Nagendra Kumar
TL;DR
RoGBot detects bots without follower–following data: it constructs a similarity-based user graph from joint semantic and metadata features and applies inductive GraphSAGE for relational reasoning over that graph. Node features fuse BERT-derived tweet embeddings with lightweight user metadata. Empirical results on Cresci-15, Cresci-17, and PAN 2019 show state-of-the-art accuracy (99.8%, 99.1%, and 96.8%, respectively) and robustness, with ablations confirming the importance of GraphSAGE and the auxiliary features. The method offers practical value by enabling scalable bot detection in environments where explicit social links are unavailable or incomplete.
Abstract
Detecting automated accounts (bots) among genuine users on platforms such as Twitter remains challenging because these accounts keep evolving their behaviors and adapting their strategies. Recent methods achieve strong detection performance by combining text, metadata, and user relationship information within graph-based frameworks, but many of them depend heavily on explicit user-user relationship data. This reliance limits their applicability in scenarios where such information is unavailable. To address this limitation, we propose a novel multimodal framework that integrates detailed textual features with enriched user metadata while employing graph-based reasoning without requiring follower-following data. Our method uses transformer-based models (e.g., BERT) to extract deep semantic embeddings from tweets, which are aggregated using max pooling to form comprehensive user-level representations. These representations are combined with auxiliary behavioral features and passed through a GraphSAGE model to capture both local and global patterns in user behavior. Experimental results on the Cresci-15, Cresci-17, and PAN 2019 datasets demonstrate the robustness of our approach, which achieves accuracies of 99.8%, 99.1%, and 96.8%, respectively, and remains effective against increasingly sophisticated bot strategies.
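The pipeline described above (max pooling of tweet embeddings, fusion with metadata, a similarity-based user graph, and GraphSAGE propagation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Several details are assumptions made only for illustration: the graph is built with cosine k-nearest neighbors, the GraphSAGE layer uses a mean aggregator, the weights are random rather than trained, and random vectors stand in for BERT embeddings and user metadata. All function names are hypothetical.

```python
import numpy as np

def pool_user_embedding(tweet_embeddings):
    """Element-wise max pooling: (n_tweets, dim) -> (dim,)."""
    return np.max(tweet_embeddings, axis=0)

def build_node_features(pooled_text, metadata):
    """Concatenate pooled text embeddings with per-user metadata features."""
    return np.concatenate([pooled_text, metadata], axis=1)

def knn_similarity_graph(features, k):
    """Assumed construction: link each user to its k most cosine-similar
    users, then symmetrize. No follower-following data is used."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a user is never its own neighbor
    neighbors = np.argsort(-sim, axis=1)[:, :k]
    n = features.shape[0]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), neighbors.ravel()] = True
    return adj | adj.T

def sage_mean_layer(features, adj, w_self, w_neigh):
    """One GraphSAGE-style layer with a mean aggregator (an assumption):
    ReLU(h @ W_self + mean(neighbor h) @ W_neigh)."""
    deg = adj.sum(axis=1, keepdims=True)
    neigh_mean = (adj.astype(features.dtype) @ features) / np.clip(deg, 1, None)
    return np.maximum(features @ w_self + neigh_mean @ w_neigh, 0.0)

rng = np.random.default_rng(0)
# Random stand-ins for BERT tweet embeddings: 6 users, 2-4 tweets each, dim 8.
tweets_per_user = [rng.normal(size=(int(rng.integers(2, 5)), 8)) for _ in range(6)]
pooled = np.stack([pool_user_embedding(t) for t in tweets_per_user])
metadata = rng.normal(size=(6, 3))  # stand-in for user metadata features
x = build_node_features(pooled, metadata)
adj = knn_similarity_graph(x, k=2)
# Untrained random weights; a real model would learn these end to end.
h = sage_mean_layer(x, adj, rng.normal(size=(11, 4)), rng.normal(size=(11, 4)))
print(h.shape)  # (6, 4)
```

Because the layer only needs a node's own features and those of its neighbors, a previously unseen user can be embedded by computing its similarity to existing users and applying the same weights, which is the inductive property the TL;DR attributes to GraphSAGE.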
