Assembling a Multi-Platform Ensemble Social Bot Detector with Applications to US 2020 Elections

Lynnette Hui Xian Ng, Kathleen M. Carley

TL;DR

The paper introduces BotBuster For Everyone, a multi-platform ensemble detector that combines per-field, tree-based classifiers (with Platt scaling) into a threshold-free, aggregated prediction framework capable of handling incomplete data across Twitter, Reddit, and Instagram. It demonstrates improved cross-platform performance over BotHunter and Botometer, processing partial data and enabling analysis of bot activity without requiring complete feature sets. The authors show feature importance centering on username entropy and post engagement, and apply the method to US 2020 election discourse to reveal platform-specific bot prevalence and narrative themes. The approach offers scalable, interpretable bot detection suitable for cross-platform studies, with implications for real-time moderation and sociotechnical research across diverse social media ecosystems.

Abstract

Bots have been in the spotlight of many social media studies because they have been observed participating in the manipulation of information and opinions on social media. These studies have analyzed the activity and influence of bots in a variety of contexts: elections, protests, health communication and so forth. Before any such analysis, bot accounts must be identified so that social media users can be separated by class. In this work, we propose an ensemble method for bot detection, designing a multi-platform bot detection architecture that addresses several problems along the bot detection pipeline: incomplete data input, minimal feature engineering, optimized classifiers for each data field, and the elimination of a threshold value for determining classification. With these design decisions, we generalize our bot detection framework across Twitter, Reddit and Instagram. We also perform feature importance analysis, observing that the entropy of names and the number of interactions (retweets/shares) are important factors in bot determination. Finally, we apply our multi-platform bot detector to the US 2020 presidential elections to identify and analyze bot activity across multiple social media platforms, showcasing how the online discourse of bots differs between platforms.
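
The abstract names the entropy of names as an important feature but does not say here how it is computed. As a minimal sketch, assuming a character-level Shannon entropy in bits (the function name `shannon_entropy` and the choice of base 2 are illustrative assumptions, not taken from the paper), the feature could look like this:

```python
import math
from collections import Counter


def shannon_entropy(name: str) -> float:
    """Character-level Shannon entropy of a string, in bits.

    Assumption: the paper's 'entropy of names' is approximated here by the
    entropy of the character frequency distribution of the name.
    """
    if not name:
        return 0.0  # an empty name carries no information
    counts = Counter(name)
    total = len(name)
    # H = -sum(p * log2(p)) over the observed characters
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Under this definition a name made of one repeated character scores 0 bits, while a name whose characters are all distinct scores the maximum for its length, so random-looking names score higher than repetitive ones.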

Paper Structure

This paper contains 25 sections, 4 figures, and 8 tables.

Figures (4)

  • Figure 1: Diagram of multi-platform bot detection ensemble. The ensemble is made up of six classifiers which extract and train/test on specialized features, providing a probability of bot/human. The probabilities are then aggregated together before the account's classification is determined by the higher of the two bot/human values.
  • Figure 2: Feature Importances. The most indicative feature of bot classification is the number of retweets/shares a post receives, followed by the number of likes and the number of replies.
  • Figure 3: Proportion of user types present in the US 2020 presidential elections. The proportion of bot users is higher on Reddit than on Twitter.
  • Figure 4: Proportion of Narrative Themes present per user type in the US 2020 presidential elections. Each user class has a different focus: bots disseminate information, while human users advocate for action.
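
The aggregation step described in the Figure 1 caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the caption says the per-classifier probabilities are "aggregated" without giving the rule, so a plain mean over the available fields is assumed here, and the field names, the function name `aggregate`, and the tie-breaking toward "human" are all hypothetical.

```python
from typing import Dict, Optional, Tuple


def aggregate(field_probs: Dict[str, Optional[float]]) -> Tuple[str, float, float]:
    """Combine per-field bot probabilities into one threshold-free label.

    field_probs maps a (hypothetical) field name to that classifier's
    P(bot), or None when the field is missing from the input data.
    """
    # Skip classifiers whose input field was not available.
    available = [p for p in field_probs.values() if p is not None]
    if not available:
        raise ValueError("at least one field probability is required")

    # Assumption: aggregate by simple averaging.
    p_bot = sum(available) / len(available)
    p_human = 1.0 - p_bot

    # No tuned threshold: the label is whichever probability is higher.
    # Ties fall to "human" in this sketch.
    label = "bot" if p_bot > p_human else "human"
    return label, p_bot, p_human


# Usage: the description field is missing, so only two classifiers vote.
label, p_bot, p_human = aggregate(
    {"username": 0.9, "posts": 0.7, "description": None}
)
```

Because missing fields are simply skipped, an account with only a username and a few posts can still be scored, which is the incomplete-data behavior the abstract describes.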