MSM-BD: Multimodal Social Media Bot Detection Using Heterogeneous Information

Tingxuan Wu, Zhaorui Ma, Yanjun Cui, Ziyi Zhou, Eric Wang

TL;DR

This work tackles the detection of social media bots by exploiting heterogeneous information streams—images, text, and user statistics—through MSM-BD, an end-to-end multimodal detector. It introduces modality-specific encoders (Visual, User Feature Textual, Tweets Textual) and a Cross-Modal Residual Cross-Attention (CMRCA) fusion module to integrate embeddings from diverse modalities. On TwiBot-22, MSM-BD achieves state-of-the-art performance (e.g., accuracy $0.8002$, $F_1=0.6105$), with ablation confirming CMRCA’s pivotal role. The approach demonstrates how explicit, role-aware cross-modal fusion can robustly identify bots in heterogeneous social-media data, offering a practical path toward more reliable detection systems.

Abstract

Although social bots can be engineered for constructive applications, their potential for misuse in manipulative schemes and malware distribution cannot be overlooked. This dichotomy underscores the critical need to detect social bots on social media platforms. Advances in artificial intelligence have improved the abilities of social bots, allowing them to generate content that is almost indistinguishable from human-created content. These advancements require the development of more advanced detection techniques to accurately identify these automated entities. Given the heterogeneous information landscape on social media, spanning images, texts, and user statistical features, we propose MSM-BD, a Multimodal Social Media Bot Detection approach using heterogeneous information. MSM-BD incorporates specialized encoders for heterogeneous information and introduces a cross-modal fusion technology, Cross-Modal Residual Cross-Attention (CMRCA), to enhance detection accuracy. We validate the effectiveness of our model through extensive experiments using the TwiBot-22 dataset.
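
The abstract names the fusion step (CMRCA) but does not give its equations here. As a rough illustration of the general idea of residual cross-attention, the sketch below lets one modality's embeddings attend over another's and then adds the result back to the original embeddings. It is not the authors' implementation: the function names, the absence of learned projection matrices, and the toy vectors are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key/value rows.

    Assumption: no learned Q/K/V projections, to keep the sketch short.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


def residual_cross_attention(target, context):
    """Hypothetical fusion step: target attends to context, then a residual add.

    Both inputs are lists of equal-width embedding rows, so the attended
    output can be summed element-wise with the target.
    """
    attended = cross_attention(target, context, context)
    return [
        [t + a for t, a in zip(t_row, a_row)]
        for t_row, a_row in zip(target, attended)
    ]


# Toy embeddings (made up): one user-feature vector, two tweet vectors.
user_features = [[1.0, 0.0]]
tweets = [[0.0, 1.0], [0.0, 1.0]]
fused = residual_cross_attention(user_features, tweets)
print(fused)
```

The residual connection means the fused vector still carries the original modality's signal even when the attended context contributes little, which is one common motivation for this pattern.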

Paper Structure

This paper contains 15 sections, 9 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Examples of the negative impact of social media bots on healthcare, such as spreading misinformation, and of applications of bot detection, such as detecting public health crises.
  • Figure 2: The structure of the MSM-BD pipeline. MSM-BD takes profile images, user features, and tweets as input, processes each through a specialized encoder, and uses the CMRCA module to fuse the extracted features for bot detection.
  • Figure 3: Demonstration of bot detection on the TwiBot-22 dataset (Feng et al., 2022). MSM-BD correctly classifies real users and bots in complicated scenarios.