MSM-BD: Multimodal Social Media Bot Detection Using Heterogeneous Information
Tingxuan Wu, Zhaorui Ma, Yanjun Cui, Ziyi Zhou, Eric Wang
TL;DR
This work tackles social media bot detection by exploiting heterogeneous information streams (images, text, and user statistics) through MSM-BD, an end-to-end multimodal detector. It introduces three modality-specific encoders (Visual, User Feature Textual, and Tweets Textual) and a Cross-Modal Residual Cross-Attention (CMRCA) fusion module that integrates their embeddings. On TwiBot-22, MSM-BD achieves state-of-the-art performance (e.g., accuracy $0.8002$, $F_1=0.6105$), and an ablation study confirms CMRCA's pivotal role. The approach shows how explicit, role-aware cross-modal fusion can identify bots in heterogeneous social media data, offering a practical path toward more reliable detection systems.
Abstract
Although social bots can be engineered for constructive applications, their potential for misuse in manipulative schemes and malware distribution cannot be overlooked. This dichotomy underscores the critical need to detect social bots on social media platforms. Advances in artificial intelligence have improved the abilities of social bots, allowing them to generate content that is almost indistinguishable from human-created content. Accurately identifying these automated accounts therefore requires more capable detection techniques. Given the heterogeneous information available on social media, spanning images, text, and user statistical features, we propose MSM-BD, a Multimodal Social Media Bot Detection approach using heterogeneous information. MSM-BD incorporates specialized encoders for this heterogeneous information and introduces a cross-modal fusion technique, Cross-Modal Residual Cross-Attention (CMRCA), to enhance detection accuracy. We validate the effectiveness of our model through extensive experiments on the TwiBot-22 dataset.
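
The abstract names CMRCA but does not spell out its computation, so the following is only a generic illustration of the idea the name suggests: one modality's embedding attends over another's, and the attended result is added back to the original embedding through a residual connection. Everything here is an assumption made for illustration, not the authors' design: the single attention head, the random projection matrices `w_q`, `w_k`, `w_v`, the embedding width `d = 8`, and the choice of a user-feature embedding as the query with tweet embeddings as context.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def residual_cross_attention(query, context, w_q, w_k, w_v):
    """Single-head cross-attention with a residual connection.

    query:   (n_q, d) embeddings from one modality (hypothetical: user features)
    context: (n_c, d) embeddings from another modality (hypothetical: tweets)
    w_q, w_k, w_v: (d, d) projection matrices (random here, learned in practice)
    Returns the fused embeddings (n_q, d) and the attention weights (n_q, n_c).
    """
    q = query @ w_q
    k = context @ w_k
    v = context @ w_v
    # Scaled dot-product scores between every query row and every context row.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    attended = weights @ v
    # Residual connection: keep the original query signal and add the
    # information gathered from the other modality.
    return query + attended, weights


rng = np.random.default_rng(0)
d = 8  # hypothetical embedding width
user_emb = rng.normal(size=(1, d))   # stand-in for a user-feature embedding
tweet_emb = rng.normal(size=(5, d))  # stand-in for five tweet embeddings
w_q, w_k, w_v = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))

fused, weights = residual_cross_attention(user_emb, tweet_emb, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

Because of the residual term, the fused output falls back to the original query embedding whenever the attended contribution is zero, which is one common motivation for residual fusion: a weak or missing modality cannot erase the signal from the others.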
