LGB: Language Model and Graph Neural Network-Driven Social Bot Detection

Ming Zhou, Dan Zhang, Yuandong Wang, Yangli-ao Geng, Yuxiao Dong, Jie Tang

TL;DR

LGB is a novel social bot detection framework built from two main components: a language model (LM) and a graph neural network (GNN).

Abstract

Malicious social bots pursue their goals by spreading misinformation and manipulating public opinion, which seriously endangers social security and makes their detection a critical concern. Recently, graph-based bot detection methods have achieved state-of-the-art (SOTA) performance. However, our research finds many isolated and poorly linked nodes in social networks, as shown in Fig. 1, which graph-based methods cannot effectively detect. To address this problem, our research focuses on using node semantics and network structure jointly to detect sparsely linked nodes. Given the excellent performance of language models (LMs) in natural language understanding (NLU), we propose a novel social bot detection framework, LGB, which consists of two main components: a language model (LM) and a graph neural network (GNN). Specifically, social account information is first extracted into unified user textual sequences, which are then used for supervised fine-tuning (SFT) of the language model to improve its ability to understand social account semantics. Next, the semantically enriched node representations are fed into the pre-trained GNN, which further enhances them by aggregating information from neighbors. Finally, LGB fuses the information from both modalities to improve detection performance on sparsely linked nodes. Extensive experiments on two real-world datasets demonstrate that LGB consistently outperforms state-of-the-art baseline models by up to 10.95%. LGB is already online: https://botdetection.aminer.cn/robotmain.
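The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the account field names, the mean-of-neighbors aggregation (a stand-in for the GNN), and the concatenation step (a stand-in for the fusion module) are all assumptions chosen for illustration, and the embeddings are hand-written vectors rather than language model outputs.

```python
from statistics import fmean


def user_to_text(account: dict) -> str:
    """Serialize account fields into one text sequence.

    The field names are hypothetical; the paper's actual template may differ.
    """
    return " ".join(f"{key}: {value}" for key, value in account.items())


def aggregate_neighbors(node: str, embeddings: dict, edges: dict) -> list:
    """One mean-aggregation step, used here as a stand-in for a GNN layer.

    An isolated node has no neighbors to aggregate, so it keeps its own
    (semantic) embedding.
    """
    neighbors = edges.get(node, [])
    if not neighbors:
        return list(embeddings[node])
    dim = len(embeddings[node])
    return [fmean(embeddings[n][i] for n in neighbors) for i in range(dim)]


def fuse(semantic: list, structural: list) -> list:
    """Concatenate both views; an assumed, simplest-possible fusion."""
    return list(semantic) + list(structural)


# Toy example: hand-written 2-d vectors stand in for LM embeddings.
embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
edges = {"a": ["b", "c"]}  # "c" has no outgoing entry and is treated as isolated

text_a = user_to_text({"name": "alice", "description": "cat lover"})
fused_a = fuse(embeddings["a"], aggregate_neighbors("a", embeddings, edges))
fused_c = fuse(embeddings["c"], aggregate_neighbors("c", embeddings, edges))
```

For the isolated node `c`, the structural half of the fused vector carries no neighbor information, which is the case where the semantic half has to do the work.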

Paper Structure (24 sections, 12 equations, 9 figures, 3 tables)

Figures (9)

  • Figure 1: We conduct a Pareto analysis of the distribution of social relationships on TwiBot-22 (Feng et al., 2022), a real-world social network dataset (left), and find a large number of isolated and poorly linked nodes in the social network (right). Specifically, isolated nodes account for $30.62\%$ of all nodes and nodes with only one neighbor make up about $24.71\%$, whereas nodes with more than ten neighbors constitute only $8.2\%$.
  • Figure 2: LM vs. GNN for nodes with different numbers of neighbors. X-axis: the number of neighbors; Y-axis: the detection accuracy of models.
  • Figure 3: Social relationship structure analysis. The Y-axis represents bot probability, and the X-axis indicates the number of connected components (NumCC) of (a) human neighbors, (b) bot neighbors, or (c) total neighbors.
  • Figure 4: The overall system architecture of LGB primarily comprises two subsystems: offline training and online detection. These subsystems work collaboratively through data interaction.
  • Figure 5: The unified user textual sequence.
  • ...and 4 more figures
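The node shares reported in the Figure 1 caption are simple degree statistics. The sketch below shows one way such shares could be computed from an edge list. It assumes integer node ids, undirected edges, and ignores self-loops; the function name and the toy graph are hypothetical and are not taken from TwiBot-22.

```python
def degree_shares(num_nodes: int, edges: list) -> dict:
    """Return the share of isolated, single-neighbor, and >10-neighbor nodes.

    Edges are treated as undirected and self-loops are ignored (assumptions).
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")

    # Sets deduplicate repeated edges between the same pair of nodes.
    neighbors = {node: set() for node in range(num_nodes)}
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)

    degrees = [len(adjacent) for adjacent in neighbors.values()]
    return {
        "isolated": sum(d == 0 for d in degrees) / num_nodes,
        "one_neighbor": sum(d == 1 for d in degrees) / num_nodes,
        "more_than_ten": sum(d > 10 for d in degrees) / num_nodes,
    }


# Toy graph: 0 - 1 - 2, with node 3 isolated.
shares = degree_shares(4, [(0, 1), (1, 2)])
```

In the toy graph, node 3 is isolated and nodes 0 and 2 each have one neighbor, so the first two shares are 0.25 and 0.5.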