Enhancing Twitter Bot Detection via Multimodal Invariant Representations
Jibing Gong, Jiquan Peng, Jin Qu, ShuYing Du, Kaiyu Wang
TL;DR
This work tackles the challenge of detecting increasingly camouflaged Twitter bots by proposing BotSAI, a multimodal framework that learns invariant representations across metadata, text, and heterogeneous network topology. It jointly trains modality-specific and invariant subspaces, applies oversampling to balance data, and fuses information through a cross-modal, multi-head attention mechanism, guided by losses that align the invariant representations, separate the modality-specific ones, and preserve modality details. Key contributions include a graph-augmented, transformer-based encoder with Local Relational Transformers, a multichannel representation strategy, and a composite loss incorporating CMD-based similarity, orthogonality constraints, and reconstruction. Empirical results on TwiBot-20 and MGTAB show BotSAI achieving state-of-the-art accuracy and F1-scores, with ablations revealing the importance of diverse social relationships and of combining invariant and specific subspaces for robust detection.
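To make the CMD-based similarity term concrete, the following is a minimal sketch of Central Moment Discrepancy between two batches of feature vectors. It is not the authors' implementation: the function names, the default `max_order=5`, and the omission of the interval scaling factor (which assumes features lie in an interval of unit length) are all assumptions made for illustration.

```python
import math

def column_means(rows):
    """Per-dimension mean of a batch given as a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def central_moments(rows, means, k):
    """Per-dimension k-th central moment of the batch."""
    n = len(rows)
    return [sum((x - m) ** k for x in col) / n
            for col, m in zip(zip(*rows), means)]

def l2_distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cmd(x, y, max_order=5):
    """Sketch of Central Moment Discrepancy between batches x and y.

    Sums the distance between the means and the distances between the
    central moments of orders 2..max_order. The interval scaling factor
    is omitted (assumption: features lie in an interval of unit length).
    """
    mx, my = column_means(x), column_means(y)
    total = l2_distance(mx, my)
    for k in range(2, max_order + 1):
        total += l2_distance(central_moments(x, mx, k),
                             central_moments(y, my, k))
    return total
```

In a setup like the one summarized above, such a term would be evaluated between the invariant representations of each pair of modalities and minimized, so that the invariant subspaces are pulled toward a common distribution.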
Abstract
Detecting Twitter bots is crucial for maintaining the integrity of online discourse, safeguarding democratic processes, and preventing the spread of malicious propaganda. However, advanced bots now employ sophisticated feature manipulation and account farming techniques to blend in with genuine user interactions, posing significant challenges to existing detection models. In response, this paper proposes BotSAI, a novel Twitter bot detection framework that enhances the consistency of multimodal user features and characterizes each modality accurately to distinguish real users from bots. Specifically, the architecture integrates information from user metadata, textual content, and heterogeneous network topology, using customized encoders to obtain comprehensive user feature representations. The heterogeneous network encoder aggregates information from neighboring nodes through oversampling techniques and local relational transformers. A multi-channel representation mechanism then maps user representations into invariant and specific subspaces, enhancing the feature vectors. Finally, a self-attention mechanism integrates and refines the enhanced user representations, enabling efficient information interaction. Extensive experiments demonstrate that BotSAI outperforms existing state-of-the-art methods on two major Twitter bot detection benchmarks. Additionally, systematic experiments reveal the impact of different social relationships on detection accuracy, providing novel insights for the identification of social bots.
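As a rough illustration of the final fusion step, the sketch below applies single-head scaled dot-product self-attention to a small set of per-user representation vectors (for example, one vector per modality and subspace). It is a simplification under stated assumptions: there are no learned query, key, or value projections, only one head is used, and the function names are hypothetical rather than taken from BotSAI.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def self_attention(vectors):
    """Single-head scaled dot-product self-attention.

    Each input vector serves as query, key, and value (assumption:
    identity projections instead of learned weight matrices).
    Returns one refined vector per input vector.
    """
    d = len(vectors[0])
    refined = []
    for q in vectors:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in vectors]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        refined.append([sum(w * v[j] for w, v in zip(weights, vectors))
                        for j in range(d)])
    return refined
```

Each output vector is a convex combination of the inputs, so every representation is refined with information from the others before a downstream classifier consumes the result.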
